Archive for Random

Thoughts on Linux on the desktop

You may disagree with this statement I made earlier today:

Linux on the desktop will never be as polished as OS X, or even Windows, until some company pours hundreds of thousands of dollars (if not more) into building a usable UI on top of it. However, any company that dumps that much into it isn't likely to make those contributions open source.

It is very possible to make Linux desktop-ready. OS X is fundamentally a BSD-clone with a lot of money and time poured into the user interface. So given some vast amount of time and money, it should be possible to do the same thing with Linux.

I could be wrong but: Most organizations that focus on the open source aspects of Linux (well, properly, GNU/Linux and other stuff) tend to not have a lot of money. So providing the required amount of money will probably fall to a company. And a company that is making such a significant investment is going to want to capitalize on that. Pouring vast amounts of time and money into something and then giving it away is a losing strategy.

It is possible for the software foundations to fund the development but it will take significantly longer. The added time will ensure that the desktop Linux UI always trails the corporate offerings (Windows, OS X, etc.) because it will never catch up (assuming the corporate offerings do not stagnate).

An open question on email

I have been comparing email statistics between two years ago and this week. The percentage of emails rejected as viruses seems to have changed dramatically. Two years ago, the virus emails flagged by ClamAV made up at least 10% (and as much as 30%) of the email volume. This week, I'm seeing less than 1%.

I have no reason to doubt that ClamAV is performing normally. However, such a change is still unexpected. Has anyone else noticed a decrease (at least percentage-wise) of virus emails?

On discipline

I threw out a couple days worth of code a couple weeks ago.

I read somewhere that knowing that you can throw out code and start over should be uplifting and exhilarating. It's not. When I threw out the topic branch in my local git repository, I was annoyed, I was disgusted, I was unhappy. Perhaps that has to do with why I threw it out.

I have this project I'm working on in my spare time. As I've mentioned before, I'm trying out BDD with RSpec and cucumber. I came to a point where I realized that most of what I was doing was not behavior-driven. It wasn't even adhering to the YAGNI principle. I had a possible case, one that may not arise, and I was writing code to try to meet it without a test case. I had realized I was doing this early on and instead of stopping then, I just said that I would run it through rcov and add tests later.

Sometimes I am very, very stupid.

The problem with BDD or TDD or any new way of doing things is that it requires discipline to properly follow them until you get in the habit of doing them. Once you're in the habit, it's not so hard but you have to get there first. Until then, you have to face your urges to use the old method, to take the quick way you've already learned, and deny them. It requires discipline.

And discipline, at least for me, is hard.

I will be trying again on this particular feature tonight or tomorrow night. Hopefully I will have the discipline this time.

On Choosing a License

I have a web application I want to build. Other sites have an application like this built already but they don't quite do everything I want to do. I have started doing some exploratory work on it. (And I've found it's not quite as easy as it looks. This post by Benjamin Pollack comes to mind.)

I want to release the end result as open source so other people can use it. This way, those who wish to use it for themselves have the option to do so. However, before doing so, I have to pick a license.

Why Not the (A/L)GPL By Default?

The GPL is perhaps the most ubiquitous of the open source licenses. Various projects, from Linux to MySQL to Drupal, have benefited from the license and from the openness it fosters (and enforces). It also has its detractors who complain about the openness required, usually from business entities but sometimes from individual developers.

Zed Shaw sparked a new set of exchanges in the BSD-vs.-GPL holy war two weeks ago with his post "Why I (A/L)GPL". This caused a lot of discussion. For example, Kumar McMillan responded with a post detailing reasons not to license code with the GPL.

Separately, Jacob Kaplan-Moss posted a set of twenty questions for the GPL. James Bennett has other questions and concerns in his post "When Licenses Attack". These posts point out a distinct lack of clarity with regard to what the GPL allows or disallows with dynamic languages. For example, the GPL explicitly mentions linking, but does a Python import or a Ruby require constitute linking?

The GPL has a known loophole for web applications or other network services. The loophole is not a horrible idea, as mentioned by Dries Buytaert and Ted Haeger.

The presence of the loophole, however, concerns some people. The solution is the AGPL which forces service providers et al to provide a means to get the source code for a hosted network service, web application, etc. Some people like this. For example, Alberto García Hierro, formerly of byNotes, chose AGPL for his code.

However, even the AGPL has issues. The AGPL is technically incompatible with the GPL. There was some objection to some of the wording within the Debian community. On the forums for the Frog CMS, a web developer stated an issue with Frog's use of the AGPL and how he felt it would impact his client sites. Ted Haeger asks if the AGPL is too radioactive. And even Alberto García Hierro mentions issues with the AGPL. Both of them wonder if there's a need for an LGPL-like version of the AGPL.

What About the Apache License?

Kumar McMillan suggests the Apache License as an alternative to the GPL. It is, indeed, an attractive alternative. Based on the license itself and part of chapter 10 of Van Lindberg's Intellectual Property and Open Source, it looks like a license well suited to a lot of projects that want to avoid the GPL. The clauses about patents and trademarks may not be useful for a small-time developer but the clause about contributions could certainly help to avoid headaches.

While there has been some discussion on how the BSD and MIT licenses interact with the GPL, I have found little documentation on how Apache-licensed code could be integrated into MIT- or BSD-licensed projects. It looks like preserving the license and attributions is necessary. I could see this being messy. The only concrete information I've found is that OpenBSD specifically forbids inclusion of source code licensed under version 2 of the Apache License.

So Choosing a License

For my particular application, the factors that impact a license decision are:

  • Platform: The language or framework the application is built on plays a part in determining the license. Any development using either that would be distributed must be done with a license that is compatible with that language or framework.

    An issue arises when dealing with frameworks that include parts of themselves in the final application. For example, some of the generators in Ruby on Rails could be claimed to work this way. This means that the output is considered a derivative work since it includes part of the original. (This is part of why GNU bison has a license exemption in its output files.) This further requires the use of a compatible license. (I am also not sure if it is possible to have two sections of a file under different licenses.)

    My current reasoning: The current draft of code is built on Rails which is licensed under the MIT/X11 license. Since the MIT license is one of the most permissive licenses, this does not restrict the license choice.

    The copyright status of output files from the Rails generators concerns me. Including parts of Rails within the application obviously makes it a derivative work of Rails. However, I do not know at what point, if any, the copyright for those sections of Rails would transfer to me or if those sections would always be copyrighted by the Rails development team and therefore would always fall under the MIT license and, therefore, always need the MIT license included.

    This concern alone makes the MIT license a strong candidate.

    Were I using Django, the BSD license would likely be a strong candidate for the same reasons.

  • Reusing code: As mentioned, this is not the first time someone has tried to do what I'm doing. There exist open source projects that at least somewhat overlap with what I'm doing. For the purpose of this discussion, we'll say that one is released under the three paragraph BSD license and one is released under the GPL.

    The BSD license has few restrictions on what can be done with the source code. As long as attribution is given and the terms of the license are mentioned, source code can be copied outright. Any derivative works, e.g. translations, modifications, etc., can be used or even relicensed as long as the original attribution and licensing is given.

    The GPL has significant restrictions on what can be done with the source code. While I can do almost anything I want with source code licensed under the BSD license (aside from stripping attributions and the original license), I can only include GPL source code in other GPL'd works. Derivative works also have to be licensed under the GPL.

    So if I use a BSD or MIT license, I can only use the BSD-licensed project for a reference. This is also true for the AGPL since, as mentioned earlier, it is technically incompatible with the GPL. I cannot use the GPL'd project as a reference. Only if I use the GPL can I use that project as a reference.

    (This is technically not completely true. The GPL only applies to copyrighted material. According to section 102b of the US Copyright Law, copyright protection does not apply to ideas, procedures, or processes. It would therefore be theoretically possible to use the GPL'd project as a reference to find out how it does something. However, due to the high risk of cross-contamination, i.e. the likelihood that the reimplementation of a process or procedure would resemble a derivative work rather than a separate one, it is probably safer to not look at all.)

    Down the line, it is also possible that someone might want to use my code. If I release the code under the GPL, they cannot use it unless they themselves are using the GPL. The same is true of the AGPL. If I use a permissive license, there are no limitations.

    My current reasoning: Losing access to the GPL'd project is not a significant concern. It would probably speed development, at least some, but I would probably learn better if I implemented it myself from the beginning.

    I don't expect to do anything significant in coding this application. Anything I come up with could be easily developed by someone else given enough time. Requiring the use of a reciprocal license then just gets in the way.

  • Business model: This is often a sticky point for choosing licenses. A lot of the time, it comes down to two questions: "Do I want to have the option to make money off of this?" and "Do I care if other people make money off of this?"

    If there is a strong desire to prevent other people from making money off of the project, the GPL is a strong candidate. Since the source for the software must always be distributed with the binaries, it is unlikely that someone else could build a business model around direct sales of the software. There is no way to prevent another person from building a business model around offering support or other services based on the software. For example, since Drupal is released under the GPL, it is exceedingly difficult to build a business around selling the software. However, Acquia has a business model built around providing services for Drupal.

    Releasing software under the GPL does not prevent the copyright holder from making money off of it. While the value of paying for the software is lessened since a free version is available, there is nothing that prevents the copyright holder from providing the software under a commercial license. (As far as I know, no copyright license can prevent the copyright holder from relicensing the software.) MySQL AB saw some success with releasing a commercial version of MySQL.

    The BSD and MIT licenses place few restrictions on what someone else can do with the software. While they do not prevent the copyright holder from making money on direct sales of the software, there is nothing to prevent another person from doing the same.

    I believe that it is unrealistic to make money with direct sales of a web application, especially one built on an open source framework. (There is a market for it obviously, given the existence of the ionCube PHP Encoder and Zend Guard.) Most of the money to be made with a web application is going to be found with services built around a specific application, e.g. hosting or local customization.

    The only way to escape the local customization and hosting loophole if you want to avoid others making money from the application would be to use the AGPL. This forces anyone who modifies the source code and deploys the resulting work to provide a download link (or other means of distribution). Using any other license does not allow the developer to get access to any downstream modifications unless volunteered by the people who make them.

    My current reasoning: Since this is a web application and built on Ruby on Rails, I don't think there's any concern about the source code being used for monetary gain. (Were I using Java for this, where the application could be distributed solely in binary format, I would probably prefer a license that enforced source code distribution.)

    I have no concerns about someone building a service around hosting the application. (I would consider doing this myself but I simply do not have the time.) My main concern is that someone would build a business model around a modified copy of the application and not send those changes upstream. This could only be mitigated through using the AGPL and the person doing this being honest enough to follow the terms of the license. However, given the issues surrounding the AGPL, I doubt that this would be a positive tradeoff.

So, given the above, the MIT license sounds like a strong candidate. I am still thinking it over but this is how I'm currently leaning.

Project Euler as a Means of Learning

About four months ago, I wrote about Project Euler. Back then, I posted that I would do the problems in Ruby to try to hone my skills there. Since then, I've mostly done Ruby but I have also done some solutions in Haskell for speed reasons, namely prime number generation. (I should really revisit how I do that in Ruby...)

Project Euler is a good way to introduce some basic concepts. Each problem is best solved using a particular subset of the language's features. The problems are probably better exercises for those who like puzzles than the exercises normally found in beginner books or in first semester programming courses. However, I see two problems with using Project Euler as a long-term means of learning a programming language.

First, Project Euler's focus is not on learning a given language or teaching about a given set of language features. Project Euler's focus is on the mathematical problems. More time in solutions, especially in later problems, is spent on figuring out the algorithm or method for solving the problem rather than on how to write that algorithm in a given programming language. Some language features or methodologies are never addressed because they never come up in the process of solving the problems.

Second, the scope of Project Euler problems is relatively small. The only focus within a given problem is answering that problem. Each solution amounts to a one-time script with components you might reuse later. As a result, Project Euler is insufficient for learning how to develop applications in a given language. (It may, however, have use in learning how to develop a library or module since some algorithms or components are used repeatedly.) There is not sufficient scope to investigate using it to develop an interactive application.

So I think that Project Euler works out well when first starting. However, once familiar with the basic concepts, supplementing or replacing Project Euler with another method of learning, e.g. building an application, is needed to ensure that there is further learning.

This is not to say that Project Euler should be completely abandoned at that point. It just ceases to be useful for learning about applying the programming language by itself. If you happen to enjoy the puzzles (I know I do), feel free to continue to do them.

GnuPG keys on USB

This is a reasonably simple process. Most of the process can be found in this Enigmail forum discussion.

  1. Move the GnuPG keys to a USB drive. (For the purpose of this discussion, I will assume that the USB drive is X: and the directory on the drive is .gnupg.)
  2. On the computer (not on the USB drive), change gpg.conf to include these directives:
    keyring X:\.gnupg\pubring.gpg
    primary-keyring X:\.gnupg\pubring.gpg
    secret-keyring X:\.gnupg\secring.gpg
    trustdb-name X:\.gnupg\trustdb.gpg

    Under Mac OS X, assuming a volume name of USB drive, you would add:

    keyring /Volumes/USB drive/.gnupg/pubring.gpg
    primary-keyring /Volumes/USB drive/.gnupg/pubring.gpg
    secret-keyring /Volumes/USB drive/.gnupg/secring.gpg
    trustdb-name /Volumes/USB drive/.gnupg/trustdb.gpg

    For Linux, it should be the same as for OS X but /Volumes/USB drive would be replaced by the mount point used for the drive.

  3. And that's it.

If you want to use an encrypted partition or filestore, e.g. through TrueCrypt, the above instructions are still valid. However, you would point it to wherever you have TrueCrypt mount the encrypted partition or filestore.
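
As a rough illustration of the step above, the four directives can be generated for whatever location the key files end up in. This is only a sketch: the helper name gpg_conf_lines is hypothetical, and the option names and file names are simply the ones listed in step 2. It prints the lines for you to paste into gpg.conf and does not touch any files.

```python
from pathlib import PurePosixPath, PureWindowsPath

# (option, file name) pairs, copied from the directives in step 2.
KEY_FILES = [
    ("keyring", "pubring.gpg"),
    ("primary-keyring", "pubring.gpg"),
    ("secret-keyring", "secring.gpg"),
    ("trustdb-name", "trustdb.gpg"),
]


def gpg_conf_lines(gnupg_dir):
    """Return the gpg.conf directives pointing at files inside gnupg_dir.

    gnupg_dir is a pathlib pure path, so the right separator is used
    for the platform the path belongs to.
    """
    return [f"{option} {gnupg_dir / filename}" for option, filename in KEY_FILES]


# Mac OS X volume from the example above; for Linux or a TrueCrypt
# volume, substitute the mount point in use.
for line in gpg_conf_lines(PurePosixPath("/Volumes/USB drive/.gnupg")):
    print(line)

# Windows drive X: from step 1.
for line in gpg_conf_lines(PureWindowsPath("X:\\.gnupg")):
    print(line)
```

Using pure paths keeps the sketch runnable on any operating system, since nothing on disk is ever read or written.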

My PGP key

I finally went through the process of setting up a PGP key. The fingerprint is:

9A86 1FA4 DADE 9C93 F2B0  7C23 38E9 ECDE D61A 0437

You can retrieve the key from a public keyserver or you can download it here.

(There is another PGP key with a fingerprint ending in FCD4 761B but I prefer that it is not used.)

The PGP key was created following the steps used by Ana Guerrero in her blog post.

To make use of PGP, I have set up pinepgp for PINE and Enigmail for Thunderbird. (I'm trying to move away from PINE because it's not working out particularly well but I still have a lot of mail and other such stored in it.) I haven't set up PGP on my iBook yet but that's one of my projects for the next few days. I also want to move the PGP keys to a USB drive for security but I haven't started on that process yet either. I will include more information on both when I have them set up.

Infix, Prefix, Postfix, Oh My

Any professionally taught programmer eventually has to learn binary trees. Some high school students are exposed early as part of the AP Computer Science AB exam but most have to learn it as part of a data structures course in college.

There are three ways to read a binary tree:

  • Prefix: Root node, then left child, then right child
  • Infix: Left child, then root node, then right child
  • Postfix: Left child, then right child, then root node

Take, for example, this really simple binary tree, with + as the root node, 2 as the left child, and 3 as the right child:

The ways to read this are:

  • Prefix: + 2 3
  • Infix: 2 + 3
  • Postfix: 2 3 +
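
The three readings can be sketched in a few lines of Python. This is a minimal illustration rather than a canonical implementation; the Node class and the function names are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A binary tree node; a leaf simply has no children."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prefix(node):
    """Root node, then left child, then right child."""
    if node is None:
        return []
    return [node.value] + prefix(node.left) + prefix(node.right)


def infix(node):
    """Left child, then root node, then right child."""
    if node is None:
        return []
    return infix(node.left) + [node.value] + infix(node.right)


def postfix(node):
    """Left child, then right child, then root node."""
    if node is None:
        return []
    return postfix(node.left) + postfix(node.right) + [node.value]


# The example tree: + at the root, 2 on the left, 3 on the right.
tree = Node("+", Node("2"), Node("3"))

print(" ".join(prefix(tree)))   # + 2 3
print(" ".join(infix(tree)))    # 2 + 3
print(" ".join(postfix(tree)))  # 2 3 +
```

The only difference between the three functions is where the root's value is placed relative to the two recursive calls.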

The infix reading of this tree resembles (and, in fact, is) the standard way we write and interpret simple mathematical equations. "Two plus three equals..." (As an aside, all simple mathematical equations can be expressed as a binary tree. I'm not happy with the tools I have available to render trees right now so I will leave this as an exercise for you, the reader.)

The postfix reading should be familiar to anyone who owns a Hewlett-Packard graphing calculator. This form of representing mathematical equations is most commonly referred to as Reverse Polish notation. Postfix ordering of mathematical expressions is commonly used for implementing stack-based calculators, usually in assignments for a programming class.
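
A stack-based calculator of the kind given in those assignments might look something like the sketch below. It assumes space-separated tokens and only the four basic operators; the name evaluate_postfix is invented for the example.

```python
import operator

# Supported binary operators; anything else is treated as a number.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate_postfix(expression):
    """Evaluate a space-separated postfix expression using a stack."""
    stack = []
    for token in expression.split():
        if token in OPERATORS:
            # The right operand was pushed last, so it is popped first.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]


print(evaluate_postfix("2 3 +"))      # 5.0
print(evaluate_postfix("2 3 + 4 *"))  # 20.0
```

Postfix needs no parentheses or precedence rules: by the time an operator is read, both of its operands are already sitting on the stack.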

The prefix reading resembles the standard way we use constructs in programming languages. If we had to represent "2 + 3" using a function, we would write something like plus( 2, 3 ). This is most clearly shown with LISP's construct ( + 2 3 ). Haskell's backtick syntax for using a function as an infix operator, e.g. `div`, has a side effect of reminding programmers that most functions are prefix-oriented.

So why discuss reading binary trees anyway? In a classroom, teaching the student how to read a binary tree leads to the student being able to program a way to read a binary tree which will then lead to other things. Here, I discuss reading them as background for the next post which may itself lead to other things.

Edit: (11 May 2009) Unfortunately, in the process of writing the next post, I realized that the entire premise of the post was invalid. So the next post will probably not have anything to do with trees.

Excerpts From My To-do List

Like many sysadmins, I have a lot of things going on. This is an excerpt of some of the current entries and an explanation.

  • Look at planners.

    In Time Management for System Administrators, Thomas Limoncelli suggests getting some sort of personal assistant, either analog or digital. Since my current smartphone leaves a lot to be desired and I have an old-fashioned penchant for fountain pens, going the analog route seems ideal. So, to make an informed decision about what to get, I need to know what's available and that entails a trip to a local office store.

    I may use such a trip to look at furniture as well. I'm planning to work on an "office" in the spare bedroom to separate play from sleep. Having read about Mitch Haile's office, I am somewhat inspired. (Although proper furniture is expensive.) And, having just read this, a shredder may not be a bad idea either.

    The hard part about this is actually going out and doing it.

  • Look at shelving.

    As mentioned above, I want to work on assembling a "home office." One thing I want to avoid is having the computers on the floor since it's not good for them and it's not really good for me. I've also damaged cables because of where the computers are and their proximity to the chair. (Not "major" cables fortunately.)

    I don't need expensive shelving, just durable shelving. "Nice" is a bonus. Plastic is out due to static concerns. Lowe's Home Improvement has several options, like this one.

    Like with the office store above, the hard part is actually going out and doing it. I could probably even do them both in the same trip since there's a Staples near the local Lowe's. (Although I prefer Office Depot out of our local office stores.)

  • Install a personal wiki.

    At first, it sounds silly to install a wiki for use as what's little more than an online notepad. However, it seems like it would be a great way to write down things and flesh out ideas with little overhead. I could keep notes as documents on my laptop but that requires having access to the laptop. I'm fine with the requirement of needing to be online to modify it. I have a small Moleskine notebook in my coat pocket in case I need to make notes when I'm not near a computer. (I have also considered getting a portable voice recorder for taking notes as well.)

  • Play with RT.

    In Time Management for System Administrators, Limoncelli suggests using RT as a tracking system. While this won't work for anything on the immediate to-do list, it would help make sure that nothing falls through the cracks.

    I already have RT set up (although I'm not sure the email functionality is working correctly, I'll have to check that) but I haven't done a lot of playing with it yet. I expect that I will end up rereading RT Essentials.

  • Read the books I got this month.

    For some reason, February is usually a big book-buying month for me. Highlights include The Algorithm Design Manual, Code Complete, Programming Pearls, The Practice of System and Network Administration, and Pragmatic Thinking and Learning. This will probably keep me busy until, oh, May.

    There's other books I have which I'm sure I haven't read or don't remember reading. Most of them are still in boxes. I hope to unbox most of them when I'm done setting up the "office."

  • Write a Rails app.

    I actually have a specific Rails app in mind. I have mentioned before that I like books. However, I've taken to buying the print book + PDF bundles when I buy from The Pragmatic Programmers. Having a PDF of a book has saved me a few times when the print copy was not available.

    Since I can't have all of my print books while traveling but I can have all of my PDF books, I want to have an application I can use to search the PDFs for given content. (For various reasons, it should only be accessible locally from the laptop and only run when I want it to run. However, this isn't an application issue as much as a deployment one.) Ferret, paired with either pdftotext or pdftohtml, should work for the search component. It should be reasonably easy to write. I just, you know, have to do it.
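
To get a feel for the search component, here is a very small sketch. It is not Ferret and it is not Rails: it swaps in a plain in-memory inverted index written in Python, purely to show the shape of the idea. The function names and the sample file names are hypothetical. The extract_text helper assumes the pdftotext command is installed and on the PATH; passing - as the output file makes it write the text to standard output.

```python
import re
import subprocess
from collections import defaultdict


def extract_text(pdf_path):
    """Return the text of a PDF by running pdftotext (assumed installed)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up sample text standing in for extract_text() output.
documents = {
    "book_a.pdf": "Binary trees and heaps",
    "book_b.pdf": "Heaps of advice on time management",
}
index = build_index(documents)
print(sorted(search(index, "heaps")))         # ['book_a.pdf', 'book_b.pdf']
print(sorted(search(index, "binary heaps")))  # ['book_a.pdf']
```

In a real run, the documents dictionary would be filled by calling extract_text on each PDF. A proper search library adds ranking, stemming, and an on-disk index, none of which this sketch attempts.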

It's Harder Than It Looks

The problem with practice is remembering to practice. This is especially true if you are trying to practice as a way to break a bad habit.

It's been a bad weekend for me. I forgot to practice touch-typing. And while I keep starting with tests for my Project Euler problems, I keep slipping to implementing large parts of the algorithm without further testing.

At least according to Thomas Limoncelli, "[P]sychologists tell us that it takes 21 days of doing a new behavior to develop it into a habit." Twenty-one days doesn't seem that long, does it? It's only three weeks, right?

Let's face it: It's hard.

It's not just three weeks of doing it. It's three weeks of making yourself do it. It's three weeks of making sure you don't miss a single day for whatever reason.

This is a common thing for people who decide to start exercise regimens. They start and keep going for a week and then stop. Something comes up and they put it off. And then they put it off again. And... I haven't found a statistic about the number of gym memberships that go unused but it apparently ranks as one of the top ten money drains.

It's easy to say that you need to force yourself to do things day after day, force yourself to think about doing them until you're doing them without thinking about them. It's easy to say this because talk is cheap. Action is not.