Open Source or Dirty Laundry?
By user12610707 on Sep 22, 2006
As we get ready to dive into the open source world, one of the many activities under way is preparing the code to be open sourced. There are some obvious things that need to be done. For instance, our source code includes a mixture of code that we've written and code that we've licensed from others. We'll need to separate out the latter and open source only the appropriate pieces of code.
Another preparation activity is "scrubbing" the code of proprietary information and of mentions of particular customers, developers, technologies, and so on. This is a little less obvious, but consider the following example:
/*
 * HACK - insert a time delay here because the stupid Intertrode
 * Technologies framebuffer driver will hang the system if we
 * don't. Those guys over there must really be idiots.
 */
While all of the above might be true, we probably have a relationship of some sort with Intertrode Tech, and having comments like this in the code could hurt our business. So the insult should be removed. Arguably it shouldn't have been there in the first place, but now's the time to take it out.
Another part of the "scrubbing" activity is to remove profanity and other "undesirable" words such as "hack" and "kludge" from comments. The list of undesirable words includes common tags such as FIXME, TODO, BUG, and so forth. These are usually replaced with something generic like "IMPL NOTE".
This is the point at which the scrubbing stops making sense to me. Comments that express strongly held opinions are quite useful and help to convey the thinking and the point of view of the author of the code. The goal of writing code in a high-level language is in fact to convey intent and semantics to other programmers. The comments are part of the code, and -- warts and all -- they help to convey the same information. Scrubbing them can remove this information, decreasing the maintainability of the code.
Consider, for example, how much less information the above comment would contain after it has been scrubbed:
/*
 * IMPL NOTE - insert a time delay here because the IMPL NOTE Intertrode
 * Technologies framebuffer driver will hang the system if we
 * don't. Those guys over there must really be IMPL NOTEs.
 */
Seriously, there is a real danger of removing vital information in this process. It seems likely that the comment might actually be scrubbed down to something like this:
/* IMPL NOTE - insert time delay */
which says just about nothing. It says that a time delay was introduced, but this is probably redundant since there will be something like a call to sleep() immediately below. It fails to say why the time delay was introduced, which is the most important piece of information. Without this information, the delay might remain in the code base long after it has stopped being necessary (after Intertrode has corrected its driver) because developers are too afraid to take it out. Or they'll go ahead and take it out, leading to an all-night debugging session for a hapless developer who happens to be porting the system to a board with a backrev Intertrode framebuffer.
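For comparison, here is a minimal sketch in C of what a more careful scrub might look like: the insult is gone, but the reason for the delay survives. Everything in it is an assumption made for illustration and is not from our code base: the names fb_init and delay_ms, the 10 ms value, the WORKAROUND tag, and the use of POSIX nanosleep() in place of whatever delay call the real code makes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* ask for the nanosleep() declaration */
#endif
#include <time.h>

/* Hypothetical delay length; the real value would come from testing. */
#define FB_SETTLE_DELAY_MS 10L

/*
 * Sleep for roughly ms milliseconds.
 * Returns 0 on success, -1 if the sleep was interrupted or failed.
 */
static int delay_ms(long ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for the code that talks to the framebuffer. */
int fb_init(void)
{
    /*
     * WORKAROUND - insert a time delay here because the Intertrode
     * Technologies framebuffer driver will hang the system if we
     * don't. Remove the delay once the driver no longer needs it.
     */
    return delay_ms(FB_SETTLE_DELAY_MS);
}
```

The comment still names the vendor and the failure mode, so a later maintainer can tell when the delay is safe to remove. Whether the vendor's name can stay is exactly the kind of call the scrubbing process has to make.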
The question of profanity in code is a somewhat different matter. Personally, I don't swear very much, but sometimes a well-placed expletive is exactly what's necessary to convey the information in the right way. Removing profanity reduces the information content of the code just the same as removal of other "undesirable" words. Consider the rich texture of the comments and names in the following code, and how this texture resonates vibrantly against the vigorous nesting of the Lisp code:
This masterpiece of coding profanity sets a standard to which we should all aspire.