OSCON 2006

O'Reilly publishing hosts OSCON, which is a convention dedicated to open source. OSCON 2006 was my second OSCON. My first OSCON was in 2004, just after I started working on Sun's OpenSolaris team. Apologies for the delay in posting the trip report--life's been a bit hectic since July.

General impressions

At OSCON 2004 I tried to hit as many "experiences" and "how-to" talks as I could. This year I have a better understanding of the tools, so I skipped the open source how-to talks. I did go to a few "experiences" talks, in the hopes that I'd learn something that I was overlooking in my work with OpenSolaris. While there was good information in those talks, they weren't the learning experience I was hoping for. I did have good luck with other talks that I went to just because they seemed interesting. More on that below.

I also spent a few hours helping staff Sun's booth at OSCON. This was quite a contrast from JavaOne. At the JavaOne booth, I spent a lot of time talking about OpenSolaris and why Sun is doing it. At OSCON, pretty much everyone knew about OpenSolaris. I did get a question about the status of the ksh93 integration work. There was also someone asking about ZFS and what's so cool about it (he went away suitably impressed). But a lot of the questions were either about support for specific devices--which I couldn't answer--or about things unrelated to Solaris.

Wednesday talks

The first talk I went to was about the use of open source by the US government, particularly the Department of Defense (DoD). Open source software is already used in government systems, including the military. Despite that, some people in government find "open source" to be scary[1]. Also, the DoD is interested in more than just software. So they tend to talk about "open technology development", rather than "open source". The emphasis is on open standards and interfaces, not implementations.

The benefits that the DoD hopes to get from open technology include support for dispersed teams, technological agility (e.g., avoid vendor lock-in), and efficient use of money (avoid duplicate work).

The DoD has several interesting issues that it has to deal with. One issue is how to handle security concerns, e.g., how to participate without revealing classified information. Another issue is that the US government is not allowed to hold copyright on anything, so what happens when someone in the DoD wants to contribute code back to a project? A third issue is regulatory requirements. For example, there are regulations that bound the profit that a company can make on a government contract. So suppose there are two bids, one based on open source and one based on proprietary software that was developed from scratch. It's conceivable that the open source bid would cost less but would be ruled out because it gives the vendor too much of a profit.

The second talk was an experiences talk about open sourcing the MySQL Cluster code, which had been developed at Ericsson and then sold to MySQL. The talk was structured as a series of "shocks" that the development team had to deal with.

Shock 1 was that the code needed to install in less than 15 minutes. Prior to this, the team was proud of the fact that they had gotten the install time down from 1-2 days to 3-4 hours. But people can be impatient--if it doesn't install quickly enough, they'll give up and move on to something else that looks cool. And the database that gets included in a final product is often the database that was used for the prototype. Ease of installation means increased likelihood of being used for the prototype, which means increased likelihood of being used in someone's final product.

Shock 2 was what "easy to understand" means. At Ericsson the documentation could assume that the reader understood the basic concepts, because there were people whose job was to help the customer understand those issues. As an open source project, the documentation had to stand by itself. Also, the documentation (and code) got a lot more exposure as open source, so the weak spots showed up more clearly. Since going open source, they've put more documentation in the code and have less design documentation. In the future, they'd like to have more design documentation, which they plan to publish for early community feedback.

Shock 5 was that all their bug reports must be published on the web. Even security bugs. The reason they can get away with this for security bugs is that they don't have many, and they're usually fixed quickly.[2]

Shock 6 was adapting to distributed teams. One change was that they had to write more things down than they used to. They also use plain text more than they used to. They do have annual meetings for the whole developer organization, plus individual teams can get together more frequently if it seems necessary.

Shock 7 was the increase in email load. They also use IRC, but they're starting to move towards more use of the telephone. The advantage of asynchronous communication is that it encourages self-sufficiency, but it also makes it easier for people to proceed along the wrong track. They have been talking about using distributed whiteboards, but that hasn't happened yet, though they do sometimes use screen.

Shock 9 was the use of agile development techniques, such as monthly sprints. That is, they pick the goals for the month and then focus on them. They take fewer interruptions than they used to; those issues are instead deferred to the next month's sprint.

Shock 10 was the constant stream of feedback from the community.

The third talk was another experiences talk about opening closed code that BEA had acquired. This talk focused more on business issues. For example, the speaker (Neelan Choksi) talked about how guerrilla marketing does not mean there is no place for more traditional marketing. He mentioned that BEA is out-sourcing their professional and training services. He said BEA isn't really set up to do it themselves, and that out-sourcing these services helps grow the community.

The fourth talk was about the best and worst of open source tactics. This talk was a grab-bag of things that Cliff Schmidt had found to work well, plus a few things that don't work so well.

Phased delivery seems to be useful. One slide was about the "maturity sweet spot": the code works well enough that people can play with it, but it could be even better with some help. Another slide talked about a "series of film shorts" model; he used OpenSolaris, and how Sun is delivering it in phases, as an example of this.

Modularity is important, of course ("modularity or death!"). It's what lets random people go off and hack on things and be able to easily integrate their changes later.

Some things to think about when implementing to a standard:

  • How is the standard licensed? For example, what is the patent clause, if any?
  • How mature is the standard? If the standard is not "done", make sure you'll continue to have access to the standard as it evolves.
  • Does the standards body encourage open participation?

Related to that was a caution about how hard it is to create a de facto standard yourself (the "ubiquity play" model of open source). If there are competing standards, consider jumping on your competitor's bandwagon. I suppose this could include some sort of migration functionality, as well as finding ways to interoperate.

If you're trying to establish a standard platform, it's important that the platform be able to evolve gracefully. Focus on interfaces, and lay down the backward compatibility rules early on.

Marketing mistakes to avoid include marketing vaporware, tunnel vision, promoting your company over the community, ignorance of the "live web", and shooting yourself in the foot when selling your support services.

There was a Solaris BOF Wednesday evening, but I missed it. There was a reception that I had been planning to graze at before the BOF, but it turned out to have mostly food I couldn't eat. So I went off in search of a restaurant. By the time I got back, the BOF was pretty much over.[3]

Thursday talks

The first talk that I attended on Thursday was a fascinating talk about the history of copyright by Karl Fogel. Briefly, the introduction of the printing press made it easier for people to produce anti-government leaflets. The English government responded by granting monopoly powers over printing presses and distribution of printed works to a "stationers guild". In return, the guild had to run everything past government censors. Eventually the makeup of the government changed, and in the late 1600s Parliament decided to revoke this monopoly. The guild proposed copyright in response. It helped them retain some of the control and income that they had as a monopoly. Also, by basing copyright in property law, they made it harder for the government to take away, compared to how easily Parliament had dissolved the original monopoly setup.

Karl's point is that copyright is designed primarily to benefit the distributor, not the artist or author. So now that digital technology has made copying and distribution even easier than before, what should we do with current copyright law?

The second talk was Guido van Rossum's talk about Python 3000, especially about how he is approaching it and what some of the changes are likely to be. The actual release will probably be called Python 3.0. The "3000" name was a dig at "Windows 2000".

One theme for 3.0 is to take the opportunity to fix some bugs from the early design of Python. But it's not a redesign from the ground up. No major changes to the look-and-feel are on the table (e.g., no macros). Nor will the changes be decided by a community vote. Guido will make the final decision(s), with lots of community input.

Some of the things that will go away in Python 3000 are classic classes, string exceptions, differences between int and long, and at least some MT-unsafe APIs. Other incompatible changes include additional keywords, incompatible changes to various methods, and making strings Unicode (actual encoding has not been decided yet).
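For readers who want to see how a few of these removals look in practice, here is a small sketch. It assumes a released Python 3 interpreter, so it shows the language as it eventually shipped, not the plans as they stood at the time of the talk. The helper name `raising_a_string_fails` is made up for this example.

```python
# Sketch of a few of the removals, as they appear in released Python 3.
# (Assumption: run on Python 3; the talk described plans, not a release.)

# int and long are unified: very large values are still plain int.
big = 2 ** 100
assert type(big) is int

# Classic classes are gone: every class inherits from object,
# even when the class statement doesn't say so.
class Plain:
    pass

assert issubclass(Plain, object)
assert type(Plain()) is Plain


def raising_a_string_fails():
    """Return True if raising a bare string is rejected (hypothetical helper)."""
    try:
        raise "something went wrong"   # an old-style "string exception"
    except TypeError:
        # Python 3 insists that exceptions derive from BaseException.
        return True
    return False


print(raising_a_string_fails())
```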

To go along with the strings changes there will be a new "bytes" data type for byte arrays, which will have some string-like methods, e.g., find(), but not things like case conversion. All data will be either binary or text, and you won't be able to mix them. Guido mentioned learning from the Java streams API for things like stackable components and how to handle Unicode issues.
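The text/binary split is easier to appreciate with a concrete case. The sketch below assumes a released Python 3 interpreter, which differs in some details from the plan described in the talk (for example, the released bytes type did end up with ASCII-only case methods such as upper()). The helper `can_mix` is hypothetical and exists only for this illustration.

```python
# Text and binary data are separate types in released Python 3.
text = "caf\u00e9"              # str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: a sequence of 8-bit values

# bytes has some string-like methods, such as find().
assert data.find(b"f") == 2


def can_mix(a, b):
    """Return True if a + b works (hypothetical helper for this sketch)."""
    try:
        a + b
    except TypeError:
        return False
    return True


assert can_mix(text, "!")        # text + text is fine
assert can_mix(data, b"!")       # bytes + bytes is fine
assert not can_mix(text, data)   # text + bytes is rejected, not converted

# Crossing the boundary means naming an encoding explicitly.
assert data.decode("utf-8") == text
assert len(text) == 4 and len(data) == 5   # the accented "e" is two bytes in UTF-8
```

The design choice shows in the last two lines: the same value has different lengths as text and as bytes, which is why silently mixing the two was a source of bugs.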

The time frame for Python 3.0 is still unclear. Guido was thinking of maybe doing an alpha release in early 2007, with the final release around a year later.

The migration from 2.x to 3.0 still needs to be worked out. Issues include the time frame, what 3.0 features to back-port to 2.x, and what migration tools to provide.

The challenge for migration tools is that there's a lot of information that's only available at runtime. The current plan is to have static analysis tools that will do around 80% of the job, and to provide an instrumented 2.x runtime that will warn about doomed code.
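To make the "only available at runtime" problem concrete, here is a minimal sketch of a static check. It is not the tool Guido described. It is a toy built on the standard `ast` module, and the names `SOURCE` and `find_has_key_calls` are made up for the example. It flags calls to has_key(), a dict method that was removed in released Python 3.

```python
import ast

# A snippet written in Python 2 style (hypothetical input for the check).
SOURCE = '''
def lookup(table, key):
    if table.has_key(key):
        return table[key]
    return None
'''


def find_has_key_calls(source):
    """Return the line numbers of every `something.has_key(...)` call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "has_key"):
            hits.append(node.lineno)
    return sorted(hits)


# The check can say *where* has_key() is called, but not whether `table`
# is a dict (doomed) or a user-defined class with its own has_key()
# method (fine). That is only known when the code runs.
print(find_has_key_calls(SOURCE))
```

A static tool has to either rewrite every such call and risk breaking the user-defined case, or leave them all alone. That gap is where an instrumented runtime that warns as the code executes would help.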

People who would like to keep current on Guido's plans for Python 3000 can follow his blog at artima.com/weblogs/.

The third talk I went to on Thursday was Simon Phipps' talk on Sun's Open Source Strategy. Part of the talk was explaining why Sun has not open sourced Java until now. Another part of the talk was about recent work, such as making the JDK redistributable. Someone asked if the compatibility test suite (TCK) will be open sourced. The answer was that folks were still trying to work that out.

The fourth Thursday talk that I went to was Jeff Waugh's talk on Building the Ubuntu Community. Some of the things that Jeff said are important for a community are shared values, shared vision, and governance. He broke down governance into 3 areas: code of conduct[4], technical policies, and governance policies. He also said that it's crucial to have people who help build the community and who keep it healthy.

Jeff talked a bit about authority and responsibility of community members. He said that communities that lack a "benevolent dictator" don't have a central person for making decisions and resolving conflicts, so it's easier for gridlock to set in. Jeff went on to say that if you give someone responsibility, they'll usually step up to it. But it's important to be clear who has the responsibility and authority for something. First, it helps other people figure out who to talk to. Second, it encourages the person to step up to the role.

Someone asked for the justification of including NVidia drivers in what's otherwise 100% free software. Jeff answered that the end-user visual experience is very important. Ubuntu has a limited number of non-free modules, all of which are drivers. Of course, they are pressuring the relevant hardware vendors to do what's needed to support open drivers.

I went to two BOFs on Thursday evening. One was the ZFS and Zones BOF; the other was the BOF on Sun's Open Source Strategy. I didn't take any notes from the ZFS and Zones BOF. I do remember that it was mostly attended by Sun employees.

The second BOF was run by Simon Phipps. He kicked off the BOF by asking the non-Sun employees to say what Sun is doing wrong. Most of the responses were familiar:

  • Sun is disjointed, non-unified, with no clear business strategy
  • Sun is not transparent enough
  • Sun is not getting its open source story out there and visible enough
  • Solaris needs broader hardware support and a better out-of-the-box experience for desktop users
  • Sun is not explaining how OpenSolaris and Solaris Express are well-suited to hobbyists
  • patches are not readily available
  • Sun is using the CDDL instead of the GPL

I was surprised by a remark that Sun has an "asymmetric" relationship with the community. The copyright assignment requirement in the contributor agreement was pointed to as an example of this. So perhaps one of the things Sun is doing wrong is not explaining the contributor agreement well enough. Later in the BOF, Simon mentioned that all Sun open source projects (JDK 6, OpenOffice, OpenSolaris, etc.) use the same joint copyright assignment.

At one point in the BOF there was a description of where Sun expects to find new customers: companies who want to put together a solution from Sun products, perhaps in combination with others' products. Sun's value-add would be the ability to put the solution together more cheaply than the customer could. Someone pointed out that even with this business plan, Sun still has to provide things like a good desktop, in order to attract developers.

Friday talks

The Friday talks were fun. In the first one, Jonathan Oxer talked about using scripting languages to control hardware. He started by talking about the different ports that are available on a typical computer, with parallel ports being the easiest to work with, and IR ports not being as useful as the others. The reason that parallel ports are easy is that you can set or read bits directly--there are no protocols that you have to deal with. Most scripting languages do require a helper program to access the port. With Linux the parallel port is available to C programs (using <sys/io.h>) as an I/O port address. The helper program could also map the port to a network socket.
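Setting or reading bits directly comes down to a little mask arithmetic on a single byte. The sketch below is not code from the talk. `FakePort` is an in-memory stand-in that I made up so the example runs without hardware; a real helper program would write the byte to the port's data register instead.

```python
# Sketch: driving output pins by setting and clearing bits in one byte.
# FakePort is a stand-in (an assumption for this example), not a real
# hardware interface.

class FakePort:
    """In-memory substitute for an 8-bit parallel port data register."""

    def __init__(self):
        self.value = 0

    def write(self, byte):
        self.value = byte & 0xFF   # the register holds exactly 8 bits

    def read(self):
        return self.value


def set_pin(port, pin, on):
    """Turn data pin `pin` (0-7) on or off, leaving the other pins alone."""
    mask = 1 << pin
    current = port.read()
    port.write(current | mask if on else current & ~mask)


def pin_is_on(port, pin):
    """Report whether data pin `pin` (0-7) is currently set."""
    return bool(port.read() & (1 << pin))


port = FakePort()
set_pin(port, 0, True)    # e.g. switch on whatever is wired to pin 0
set_pin(port, 3, True)
set_pin(port, 0, False)   # switch pin 0 back off; pin 3 stays on
print(format(port.read(), "08b"))
```

Each pin is one bit of the byte, so turning a device on or off is a single read, a mask, and a write.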

Jon also talked a bit about safety. Parallel ports are safe in that the signal is only 5 volts. On the other hand, if your application controls power to appliances or other things that you might plug into a wall outlet, Jon recommended using switchable power boxes, rather than messing with 110V (or higher) directly.

Jonathan then demoed several applications. One application would send his cell phone a text message when his mailbox at home was opened and closed (i.e., when he had mail). Another application was a magnetic lock that uses RFID tags as keys.

The last talk I went to was by Michael Sparks, who works in a research group at the BBC. The BBC generates a lot of audio and video data[5], and they want easy ways to manage and manipulate it. Michael talked about Kamaelia, which is a Python application that lets them do that. Kamaelia provides a toolbox of simple components that can be pipelined together using Python generators. Developers don't have to deal with concurrency issues thanks to the pipeline structure. Nor do they have to deal with low-level details related to multimedia data, because that's all managed by the components in the toolbox.
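The general idea of pipelining components with generators can be shown in plain Python. To be clear, this is not Kamaelia's API. The stage names (`source`, `drop_empty`, `label`) and the `pipeline` helper are made up for illustration, and the "frames" are just strings.

```python
# Sketch: a pipeline of small components built from plain Python generators.
# This is NOT Kamaelia's API; all names here are hypothetical.

def source(frames):
    """Produce items one at a time, like a capture component."""
    for frame in frames:
        yield frame


def drop_empty(items):
    """Pass along only the non-empty items."""
    for item in items:
        if item:
            yield item


def label(items, prefix):
    """Tag each item with a running number."""
    for n, item in enumerate(items, start=1):
        yield f"{prefix}{n}:{item}"


def pipeline(data, *stages):
    """Feed `data` through each stage in order and return the final stream."""
    stream = data
    for stage in stages:
        stream = stage(stream)
    return stream


frames = ["news", "", "weather", "sport", ""]
result = list(pipeline(source(frames), drop_empty, lambda s: label(s, "clip")))
print(result)
```

Each stage sees only the stream handed to it and pulls one item at a time, so the stages share no state and the person writing a stage has no locking to think about.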

Kamaelia currently only runs on Linux because drivers for some of their hardware are not available on other flavors of Unix.


All in all, it was a good week: lots of people doing interesting stuff, and Portland is always a fun city to visit. Next time I hope I finally make it to Powell's.

[1] This appears to be generational, with most of the concern coming from people who are older than 45.

[2] And they presumably don't have to coordinate the announcement of the fix with other vendors.

[3] It turns out that there's a perfectly fine sandwich shop a couple blocks from the convention center. But I didn't find out about it until Thursday.

[4] One of the rules in the code of conduct is that the code of conduct is not to be used as a weapon.

[5] One channel of video for a month is around 200 GB.



What is your opinion about the ksh93 integration progress? Do you think it is a good example for an Open Solaris project or are there points where future improvements are planned to improve cooperation between Sun and external contributors?

Posted by Knut Reinert on October 23, 2006 at 06:19 AM PDT #

Those are good questions, and you're not the only person to ask them.

To answer your first question, I'm disappointed that the initial ksh93 integration is taking as long as it is. But it's bigger than any other piece of work that an external contributor has put into OpenSolaris so far, so I'm only somewhat disappointed. Most first-time efforts take longer than we expect.

To answer your second question, there are certainly some improvements that can be made, both in terms of infrastructure and in terms of "process". We're working on the infrastructure improvements. We now have production access to Subversion repositories on opensolaris.org. Beta access to Mercurial repositories should begin soon.

The process issues are less straightforward. The questions that the ARC asked for the two ARC cases were fairly good examples of what things the ARC is interested in. But the questions weren't always asked in a way that made it clear why they were worth asking. The ARC has started looking at ways to do better for future cases.

Another "process" improvement has to do with things the project team can do to make reviews go more smoothly. I think the main lesson for the ksh93 project (and for me, as a project mentor) was to submit the ARC review sooner. Having an early review means that there's more time to discuss and consider changes, which is less stressful than making changes at the last minute. Also, it encourages having the main discussions with the people who will actually approve the project, rather than having the same argument multiple times with different groups of people (which is what happened with the libcmd changes).

Finally, I think it's important to keep in mind that the ksh93 integration is about more than just putting another shell into OpenSolaris. There's a lot of infrastructure that ksh93 comes with, and the project team has been thinking (long-term) about other pieces of OpenSolaris that could take advantage of that infrastructure. And of course there's also the long-term goal of replacing /usr/bin/ksh with ksh93. Setting up the foundation for this long-term work does require some additional planning, work, and review.

Posted by Mike Kupfer on October 23, 2006 at 10:24 AM PDT #

