Saturday Aug 04, 2012

Recovering a Totally Full ZFS Filesystem

If your ZFS filesystem is completely full, it can be difficult to free up space.  Most people's first impulse is to delete files, but that often fails because ZFS can require free space to record the deletion.  A colleague of mine ran into this a few weeks ago.  I advised him to try truncating a large file using shell redirection (e.g., "cat /dev/null > my_large_file"), but that didn't work.  He was able to free up space using the truncate command, which is available in Solaris 11.

 truncate -s 0 my_large_file
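
Once the file's blocks are freed, normal cleanup works again. A hypothetical session (the pool and file names are made up) might look like this:

  $ rm /tank/data/my_large_file             # fails with ENOSPC on a full pool
  $ truncate -s 0 /tank/data/my_large_file  # releases the file's data blocks
  $ rm /tank/data/my_large_file             # now succeeds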

Tuesday Nov 29, 2011

IPS Facets and Info files

One of the unusual things about IPS is its "facet" feature. For example, if you're a developer using the foo library, you don't install a libfoo-dev package to get the header files. Instead, you install the libfoo package, and your facet.devel setting controls whether you get header files.
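
For example, here's a quick sketch of turning on development files system-wide (exactly which files appear depends on which packages tag their content with facet.devel):

  $ pkg facet                           # show current facet settings
  # pkg change-facet facet.devel=True   # header files et al. now get installed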

I was reminded of this recently when I tried to look at some documentation for Emacs Org mode. I was surprised when Emacs's Info browser said it couldn't find the top-level Info directory. I poked around in /usr/share but couldn't find any info files.

  $ ls -l /usr/share/info
  ls: cannot access /usr/share/info: No such file or directory

Was I missing a package?

  $ pkg list -a | egrep "info|emacs"
  editor/gnu-emacs                                  23.1-0.175.0.0.0.2.537     i--
  editor/gnu-emacs/gnu-emacs-gtk                    23.1-0.175.0.0.0.2.537     i--
  editor/gnu-emacs/gnu-emacs-lisp                   23.1-0.175.0.0.0.2.537     ---
  editor/gnu-emacs/gnu-emacs-no-x11                 23.1-0.175.0.0.0.2.537     ---
  editor/gnu-emacs/gnu-emacs-x11                    23.1-0.175.0.0.0.2.537     i--
  system/data/terminfo                              0.5.11-0.175.0.0.0.2.1     i--
  system/data/terminfo/terminfo-core                0.5.11-0.175.0.0.0.2.1     i--
  text/texinfo                                      4.7-0.175.0.0.0.2.537      i--
  x11/diagnostic/x11-info-clients                   7.6-0.175.0.0.0.0.1215     i--
  $
  

Hmm. I didn't have the gnu-emacs-lisp package. That seemed an unlikely place to stick the Info files, and pkg(1) confirmed that the Info files were not there:

  $ pkg contents -r gnu-emacs-lisp | grep info
  usr/share/emacs/23.1/lisp/info-look.el.gz
  usr/share/emacs/23.1/lisp/info-xref.el.gz
  usr/share/emacs/23.1/lisp/info.el.gz
  usr/share/emacs/23.1/lisp/informat.el.gz
  usr/share/emacs/23.1/lisp/org/org-info.el.gz
  usr/share/emacs/23.1/lisp/org/org-jsinfo.el.gz
  usr/share/emacs/23.1/lisp/pcvs-info.el.gz
  usr/share/emacs/23.1/lisp/textmodes/makeinfo.el.gz
  usr/share/emacs/23.1/lisp/textmodes/texinfo.el.gz
  $ 
  

Well, if I have what look like the right packages but don't have the right files, the next thing to check is the facet settings.

The first check is whether there is a facet associated with the Info files:

  $ pkg contents -m gnu-emacs | grep usr/share/info
  dir facet.doc.info=true group=bin mode=0755 owner=root path=usr/share/info
  file [...] chash=[...] facet.doc.info=true group=bin mode=0444 owner=root path=usr/share/info/mh-e-1 [...] 
  file [...] chash=[...] facet.doc.info=true group=bin mode=0444 owner=root path=usr/share/info/mh-e-2 [...]
  [...]

Yes, they're associated with facet.doc.info.

Now let's look at the facet settings on my desktop:

  $ pkg facet
  FACETS           VALUE
  facet.locale.en* True
  facet.locale*    False
  facet.doc.man    True
  facet.doc*       False
  $ 
  

Oops. I've got man pages and various English documentation files, but not the Info files. Let's fix that:

  # pkg change-facet facet.doc.info=True
  Packages to update: 970
  Variants/Facets to change:   1
  Create boot environment:  No
  Create backup boot environment: Yes
  Services to change:   1
  
  DOWNLOAD                                  PKGS       FILES    XFER (MB)
  Completed                              970/970     181/181      9.2/9.2
  
  PHASE                                        ACTIONS
  Install Phase                                226/226
  
  PHASE                                          ITEMS
  Image State Update Phase                         2/2
  
  PHASE                                          ITEMS
  Reading Existing Index                           8/8
  Indexing Packages                            970/970
  # 

Now we have the Info files:

  $ ls -F /usr/share/info
  a2ps.info	  dir@	      flex.info      groff-2	     regex.info
  aalib.info	  dired-x     flex.info-1    groff-3	     remember
  ...
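
(And if I ever want to go back to the default behavior, my recollection is that setting the facet to None drops the local setting: "pkg change-facet facet.doc.info=None".)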

Sunday Jun 24, 2007

GNOME Disk Analyzer

I upgraded my desktop to snv_66 (build 66 of Solaris Nevada) earlier in the week and played around some with the new GNOME bits (2.18). The new Disk Analyzer GUI has a much-improved format for showing where you're using disk space. In the example below, about 25% of my home directory is email, and about 23 MB of that is NFS-related email.

[Screenshot: Mike's home directory (GNOME)]

I still prefer the equivalent view in Konqueror, because it can identify individual files, whereas the GNOME tool only tells you about directories. But the radial format in the GNOME tool is pretty cool.

[Screenshot: Mike's home directory (KDE)]

(In case anyone is wondering, the KDE screenshot is from back in March, so this picture is not directly comparable with the one above.)

Wednesday May 30, 2007

What I Learned From Ubuntu

Mark Shuttleworth and a few Ubuntu developers stopped by the Sun Menlo Park campus on Friday May 4th. I'm not working with Ubuntu, but since I'm involved with the Solaris Companion and with general OpenSolaris issues, I wanted to see what they had to say about third-party packages and about how they do their releases.

You can organize Ubuntu packages along two dimensions. The first dimension is whether the package is free (libre). The second dimension is whether Canonical (Ubuntu's corporate sponsor) provides support (e.g., security fixes). This gives us the following table:

              supported by Canonical      not supported by Canonical
  free        main (2,000 packages)       universe (18,000 packages)
  not free    restricted (5 packages)     multiverse (200 packages)

Notice that Canonical only supports 10% of the packages in the distro.

There are two levels of access to the third-party packages. The first level is an engineering repository which bypasses Canonical. That is, people can update the repository at any time, without regard to the Ubuntu release schedule. The second level is the actual distro, which has tighter controls.

Some of the packages are available on the Ubuntu CD, but many are only available via network download. Canonical does not track the downloads. This would be heresy inside Sun, where there's a big emphasis on measuring things. But Mark said that Canonical doesn't really care about the download numbers, and it would be difficult to get accurate numbers anyway (e.g., because of mirroring).

Someone asked Mark how they deal with packages that potentially infringe on a patent. Mark said that there's no such thing as a global patent, so those packages are allowed in the distro, but they're only available via network download. The user self-certifies that it's okay for him or her to use the package.

Another issue that comes up with third-party packages is how to track bugs. Mark talked about this a bit, and it's something we're facing with OpenSolaris, too. The basic problem is that for a given package, there may be two bug databases: one deployed by the upstream project and one deployed by the distro. So far, the industry best practice seems to be to push distro-independent information to the upstream database, leaving distro-specific details in the distro's database. This approach is less than ideal, because it requires a fair amount of manual effort to track the bug status and to keep the right information in the right database. Canonical developed a tracking application called Launchpad to help deal with this, but Mark mentioned that it's still not quite what they want, and that Canonical might be revisiting the issue in a couple years. It'd be nice if the Ubuntu and OpenSolaris communities could somehow work together on that.

Mark spent a little time describing Launchpad, and it does have some nice bug-tracking features. For example, you can create hyperlinks to the upstream database entry, and Launchpad can automatically query the upstream database to get the bug's status.

Launchpad also has more general collaboration support, such as mailing lists, project web space, and a code repository. Launchpad includes features that would be useful on opensolaris.org, like a translation tracker and an application for proposing and tracking project ideas.

The other major topic that I was interested in was how Ubuntu releases are done. Ubuntu follows a train model, with releases appearing every 6 months. There is support for 18 months, except for Long Term Support (LTS) releases, where servers are supported for 5 years. For those who are not familiar with the train model, the basic idea is that if your code is not ready in time, it is bumped to the next release, rather than delaying the current release.

Sun tried a train model for Solaris in the 1990s, with releases every 6 months[1]. It didn't work for us, and we eventually gave it up. I wasn't involved with Solaris release management, so I probably have a limited perspective on what all the issues were. But as a developer I could see a couple things that contributed to abandoning 6-month trains.

The first problem that I saw was that we didn't stick to the cutoff dates. There was often some new feature that just couldn't wait for the next train, so we would bend the rules and let changes integrate after the nominal cutoff[2]. I suppose that having a late binding mechanism makes sense for exceptional circumstances, but I think it got overused. These days, it seems like late binding isn't just a safety net to keep the release from falling apart, it's a regular phase in the release cycle. I suppose the net effect isn't too horrible--it's effectively a gradual freezing of the code, rather than a hard freeze. But it does push back the real, final freeze date, which then reduces the time that is available for later parts of the release cycle.

This ties in to the other problem that I saw, which was that the Beta Test period was too short. I forget how long the Beta periods were, but they were short enough that by the time customers had actually deployed the code, identified and reported issues, and we had worked out a fix, it was too late to get the fix into that release.

Of course, this raises the question of why Canonical doesn't have the same problems with Ubuntu.

One explanation is that much of what goes into Ubuntu comes from an upstream source and is already (more or less) stable. There is some original work done for Ubuntu, but it's not the "deep R&D"[3] of things like SMF, DTrace, or ZFS. It's hard to predict the schedule for cutting-edge projects, particularly ones that affect large parts of the system. That's not an entirely satisfactory answer, though, because according to the train model, if a project is late, you just bump it to the next release. So there must be more going on than that.

One thing that could mess up a train model is technical dependencies. Suppose Project A depends on Project B. If you integrate parts of A under the assumption that B will integrate later in the release, there will be a strong temptation to delay the release if B is late. The Ubuntu folks try to avoid this problem by avoiding dependencies on upstream code that's scheduled to be released near the feature freeze. How strict they are about this depends in part on how much they trust the upstream provider to meet its schedule. And in a pinch, they might take beta code if it's deemed to be stable enough. I don't know if technical dependencies were a factor in moving away from the train model for Solaris releases. It shouldn't have been an issue for the OS/Net consolidation ("FCS Quality All the Time"), but I don't know about Solaris as a whole.

I suppose there could have also been a sort of "marketing and PR" dependency problem, where we feared a loss of face if Feature X didn't make its target release. I don't know if this was actually an issue, but Sun does seem to like big, flashy announcements, and there are quite a few analyst briefings that happen under embargo[4] prior to these events.

Another explanation for why Canonical can make 6-month trains work is that the 6-month releases serve a different target market than the one Solaris has been in. A noticeable chunk of the Solaris user base would go nuts with a 6-month release cycle and 18-month support tail. As soon as they got one release qualified and deployed, they'd have to do it all over again.

So one thing we might want to look at for Solaris is to have two release vehicles, similar to the 6-month and LTS releases that Canonical is doing with Ubuntu. But there are still some issues with that model that we'd want to figure out. For example, the Ubuntu folks said that most of the Ubuntu LTS customers just want security fixes, whereas Solaris customers often demand patches for non-security bugs.

Another thing that distinguishes Ubuntu releases from the 6-month Solaris trains is when customers actually get the bits to play with. There are only 3 weeks between the Beta release and final release for Gutsy, but there will be six snapshots that are available sooner, with the first (fairly unstable) one appearing 16 weeks before the Beta release. This gives users a larger window than we had with the 6-month Solaris trains in which to try out the release and give feedback.

So, to sum it all up: I learned that distros can successfully deal with issues that OpenSolaris and Sun are facing, like how to provide the many third-party packages that users want, and how to keep them current. What we need to do now is figure out how to make it work for OpenSolaris, without sacrificing the stability that attracted many Solaris users in the first place.


[1] The internal code names for SunOS 5.2, 5.3, and 5.4 were on493, on1093, and on494, respectively.

[2] At some point we came up with a formalized "late binding" process, but I don't remember just when that was introduced.

[3] That's the term Mark used.

[4] That is, the analyst isn't allowed to publish anything about it before a certain date and time.

Tuesday Oct 17, 2006

OSCON 2006

O'Reilly, the publishing company, hosts OSCON, a convention dedicated to open source. OSCON 2006 was my second; my first was in 2004, just after I started working on Sun's OpenSolaris team. Apologies for the delay in posting the trip report--life's been a bit hectic since July.

general impressions

At OSCON 2004 I tried to hit as many "experiences" and "how-to" talks as I could. This year I had a better understanding of the tools, so I skipped the open source how-to talks. I did go to a few "experiences" talks, in the hopes that I'd learn something that I was overlooking in my work with OpenSolaris. While there was good information in those talks, they weren't the learning experience I was hoping for. I did have good luck with other talks that I went to just because they seemed interesting. More on that below.

I also spent a few hours helping staff Sun's booth at OSCON. This was quite a contrast with JavaOne. At the JavaOne booth, I spent a lot of time talking about OpenSolaris and why Sun is doing it. At OSCON, pretty much everyone knew about OpenSolaris. I did get a question about the status of the ksh93 integration work. There was also someone asking about ZFS and what's so cool about it (he went away suitably impressed). But a lot of the questions were either about support for specific devices--which I couldn't answer--or about things unrelated to Solaris.

Wednesday talks

The first talk I went to was about the use of open source by the US government, particularly the Department of Defense (DoD). Open source software is already used in government systems, including the military. Despite that, some people in government find "open source" to be scary[1]. Also, the DoD is interested in more than just software. So they tend to talk about "open technology development", rather than "open source". The emphasis is on open standards and interfaces, not implementations.

The benefits that the DoD hopes to get from open technology include support for dispersed teams, technological agility (e.g., avoid vendor lock-in), and efficient use of money (avoid duplicate work).

The DoD has several interesting issues that it has to deal with. One issue is how to handle security concerns, e.g., how to participate without revealing classified information. Another issue is that the US government is not allowed to hold copyright on anything, so what happens when someone in the DoD wants to contribute code back to a project? A third issue is regulatory requirements. For example, there are regulations that bound the profit that a company can make on a government contract. So suppose there are two bids, one based on open source and one based on proprietary software that was developed from scratch. It's conceivable that the open source bid would cost less but would be ruled out because it gives the vendor too much of a profit.

The second talk was an experience talk about open sourcing the MySQL Clusters code, which had been developed at Ericsson and then sold to MySQL. The talk was structured as a series of "shocks" that the development team had to deal with.

Shock 1 was that the code needed to install in less than 15 minutes. Prior to this, the team was proud of the fact that they had gotten the install time down from 1-2 days to 3-4 hours. But people can be impatient--if it doesn't install quickly enough, they'll give up and move on to something else that looks cool. And the database that gets included in a final product is often the database that was used for the prototype. Ease of installation means increased likelihood of being used for the prototype, which means increased likelihood of being used in someone's final product.

Shock 2 was what "easy to understand" means. At Ericsson the documentation could assume that the reader understood the basic concepts, because there were people whose job was to help the customer understand those issues. As an open source project, the documentation had to stand by itself. Also, the documentation (and code) got a lot more exposure as open source, so the weak spots showed up more clearly. Since going open source, they've put more documentation in the code and have less design documentation. In the future, they'd like to have more design documentation, which they plan to publish for early community feedback.

Shock 5 was that all their bug reports must be published on the web. Even security bugs. The reason they can get away with this for security bugs is that they don't have many, and they're usually fixed quickly.[2]

Shock 6 was adapting to distributed teams. One change was that they had to write more things down than they used to. They also use plain text more than they used to. They do have annual meetings for the whole developer organization, plus individual teams can get together more frequently if it seems necessary.

Shock 7 was the increase in email load. They also use IRC, but they're starting to move towards more use of the telephone. The advantage of asynchronous communication is that it encourages self-sufficiency, but it also makes it easier for people to proceed along the wrong track. They have been talking about using distributed whiteboards, but that hasn't happened yet, though they do sometimes use screen.

Shock 9 was the use of agile development techniques, such as monthly sprints. That is, they pick the goals for the month and then focus on them. They accept fewer interruptions than they used to; those issues are instead deferred to the next month's sprint.

Shock 10 was the constant stream of feedback from the community.

The third talk was another experiences talk about opening closed code that BEA had acquired. This talk focused more on business issues. For example, the speaker (Neelan Choksi) talked about how guerrilla marketing does not mean there is no place for more traditional marketing. He mentioned that BEA is out-sourcing its professional and training services. He said BEA isn't really set up to do it itself, and that out-sourcing these services helps grow the community.

The fourth talk was about the best and worst of open source tactics. This talk was a grab-bag of things that Cliff Schmidt had found to work well, plus a few things that don't work so well.

Phased delivery seems to be useful. One slide was about the "maturity sweet spot": the code works well enough that people can play with it, but it could be even better with some help. Another slide talked about a "series of film shorts" model; he used OpenSolaris, and how Sun is delivering it in phases, as an example of this.

Modularity is important, of course ("modularity or death!"). It's what lets random people go off and hack on things and be able to easily integrate their changes later.

Some things to think about when implementing to a standard:

  • How is the standard licensed? For example, what is the patent clause, if any?
  • How mature is the standard? If the standard is not "done", make sure you'll continue to have access to the standard as it evolves.
  • Does the standards body encourage open participation?

Related to that was a caution about how hard it is to create a de-facto standard yourself (the "ubiquity play" model of open source). If there are competing standards, consider jumping on your competitor's bandwagon. I suppose this could include some sort of migration functionality, as well as finding ways to interoperate.

If you're trying to establish a standard platform, it's important that the platform be able to evolve gracefully. Focus on interfaces, and lay down the backward compatibility rules early on.

Marketing mistakes to avoid include marketing vaporware, tunnel vision, promoting your company over the community, ignorance of the "live web", and shooting yourself in the foot when selling your support services.

There was a Solaris BOF Wednesday evening, but I missed it. There was a reception that I had been planning to graze at before the BOF, but it turned out to have mostly food I couldn't eat. So I went off in search of a restaurant. By the time I got back, the BOF was pretty much over.[3]

Thursday talks

The first talk that I attended on Thursday was a fascinating talk about the history of copyright by Karl Fogel. Briefly, the introduction of the printing press made it easier for people to produce anti-government leaflets. The English government responded by granting monopoly powers over printing presses and distribution of printed works to a "stationers guild". In return, the guild had to run everything past government censors. Eventually the makeup of the government changed, and in the late 1600s Parliament decided to revoke this monopoly. The guild proposed copyright in response. It helped them retain some of the control and income that they had as a monopoly. Also, by basing copyright in property law, they made it harder for the government to take away, compared to how easily Parliament had dissolved the original monopoly setup.

Karl's point is that copyright is designed primarily to benefit the distributor, not the artist or author. So now that digital technology has made copying and distribution even easier than before, what should we do with current copyright law?

The second talk was Guido van Rossum's talk about Python 3000, especially about how he is approaching it and what some of the changes are likely to be. The actual release will probably be called Python 3.0. The "3000" name was a dig at "Windows 2000".

One theme for 3.0 is to take the opportunity to fix some bugs from the early design of Python. But it's not a redesign from the ground up. No major changes to the look-and-feel are on the table (e.g., no macros). Nor will the changes be decided by a community vote. Guido will make the final decision(s), with lots of community input.

Some of the things that will go away in Python 3000 are classic classes, string exceptions, differences between int and long, and at least some MT-unsafe APIs. Other incompatible changes include additional keywords, incompatible changes to various methods, and making strings Unicode (actual encoding has not been decided yet).

To go along with the strings changes there will be a new "bytes" data type for byte arrays, which will have some string-like methods, e.g., find(), but not things like case conversion. All data will either be binary or text, and you won't be able to mix them. Guido mentioned learning from the Java streams API for things like stackable components and how to handle Unicode issues.

The time frame for Python 3.0 is still unclear. Guido was thinking of maybe doing an alpha release in early 2007, with the final release around a year later.

The migration from 2.x to 3.0 still needs to be worked out. Issues include the time frame, what 3.0 features to back-port to 2.x, and what migration tools to provide.

The challenge for migration tools is that there's a lot of information that's only available at runtime. The current plan is to have static analysis tools that will do around 80% of the job, and to provide an instrumented 2.x runtime that will warn about doomed code.

People who would like to keep current on Guido's plans for Python 3000 can follow his blog at artima.com/weblogs/.

The third talk I went to on Thursday was Simon Phipps' talk on Sun's Open Source Strategy. Part of the talk was explaining why Sun has not open sourced Java until now. Another part of the talk was about recent work, such as making the JDK redistributable. Someone asked if the compatibility test suite (TCK) will be open sourced. The answer was that folks were still trying to work that out.

The fourth Thursday talk that I went to was Jeff Waugh's talk on Building the Ubuntu Community. Some of the things that Jeff said are important for a community are shared values, shared vision, and governance. He broke down governance into 3 areas: code of conduct[4], technical policies, and governance policies. He also said that it's crucial to have people who help build the community and who keep it healthy.

Jeff talked a bit about authority and responsibility of community members. He said that communities who lack a "benevolent dictator" don't have a central person for making decisions and resolving conflicts, so it's easier for gridlock to set in. Jeff went on to say that if you give someone responsibility, they'll usually step up to it. But it's important to be clear who has the responsibility and authority for something. First, it helps other people figure out who to talk to. Second, it encourages the person to step up to the role.

Someone asked for the justification of including NVidia drivers in what's otherwise 100% free software. Jeff answered that the end-user visual experience is very important. Ubuntu has a limited number of non-free modules, all of which are drivers. Of course, they are pressuring the relevant hardware vendors to do what's needed to support open drivers.

I went to two BOFs on Thursday evening. One was the ZFS and Zones BOF; the other was the BOF on Sun's Open Source Strategy. I didn't take any notes from the ZFS and Zones BOF. I do remember that it was mostly attended by Sun employees.

The second BOF was run by Simon Phipps. He kicked off the BOF by asking the non-Sun employees to say what Sun is doing wrong. Most of the responses were familiar:

  • Sun is disjointed, non-unified, with no clear business strategy
  • Sun is not transparent enough
  • Sun is not getting its open source story out there and visible enough
  • Solaris needs broader hardware support and a better out-of-the-box experience for desktop users
  • Sun is not explaining how OpenSolaris and Solaris Express are well-suited to hobbyists
  • patches are not readily available
  • Sun is using the CDDL instead of the GPL

I was surprised by a remark that Sun has an "asymmetric" relationship with the community. The copyright assignment requirement in the contributor agreement was pointed at as an example of this. So perhaps one of the things Sun is doing wrong is not explaining the contributor agreement well enough. Later in the BOF, Simon mentioned that all Sun open source projects (JDK 6, OpenOffice, OpenSolaris, etc.) use the same joint copyright assignment.

At one point in the BOF there was a description of where Sun expects to find new customers: companies who want to put together a solution from Sun products, perhaps in combination with others' products. Sun's value-add would be the ability to put the solution together more cheaply than the customer could. Someone pointed out that even with this business plan, Sun still has to provide things like a good desktop, in order to attract developers.

Friday talks

The Friday talks were fun. In the first one, Jonathan Oxer talked about using scripting languages to control hardware. He started by talking about the different ports that are available on a typical computer, with parallel ports being the easiest to work with, and IR ports not being as useful as the others. The reason that parallel ports are easy is that you can set or read bits directly--there are no protocols that you have to deal with. Most scripting languages do require a helper program to access the port. With Linux the parallel port is available to C programs (using <sys/io.h>) as a memory-mapped address. The helper program could also map the port to a network socket.
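
As a purely illustrative example of how little protocol is involved (the address below assumes the legacy first parallel port at 0x378, i.e., decimal 888), a root shell on Linux can poke the port's data register through /dev/port without any helper program:

  # drive all eight data pins high by writing 0xFF to I/O port 0x378
  printf '\377' | dd of=/dev/port bs=1 seek=888 count=1 conv=notrunc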

Jon also talked a bit about safety. Parallel ports are safe in that the signal is only 5 volts. On the other hand, if your application controls power to appliances or other things that you might plug into a wall outlet, Jon recommended using switchable power boxes, rather than messing with 110V (or higher) directly.

Jonathan then demoed several applications. One application would send his cell phone a text message when his mailbox at home was opened and closed (i.e., when he had mail). Another application was a magnetic lock that used RFID tags as keys.

The last talk I went to was by Michael Sparks, who works in a research group at the BBC. The BBC generates a lot of audio and video data[5], and they want easy ways to manage and manipulate it. Michael talked about Kamaelia, which is a Python application that lets them do that. Kamaelia provides a toolbox of simple components that can be pipelined together using Python generators. Developers don't have to deal with concurrency issues thanks to the pipeline structure. Nor do they have to deal with low-level details related to multimedia data, because that's all managed by the components in the toolbox.

Kamaelia currently only runs on Linux because drivers for some of their hardware are not available on other flavors of Unix.

Wrap-up

All in all, it was a good week: lots of people doing interesting stuff, and Portland is always a fun city to visit. Next time I hope I finally make it to Powell's.


[1] This appears to be generational, with most of the concern coming from people who are older than 45.

[2] And they presumably don't have to coordinate the announcement of the fix with other vendors.

[3] It turns out that there's a perfectly fine sandwich shop a couple blocks from the convention center. But I didn't find out about it until Thursday.

[4] One of the rules in the code of conduct is that the code of conduct is not to be used as a weapon.

[5] One channel of video for a month is around 200 GB.


Technorati tags: OSCON OSCON06

Thursday Aug 10, 2006

FCS Quality All the Time

As one might expect, working with the external community on OpenSolaris is a bit of a learning experience, for both the people who work for Sun and the people who don't. Differences in goals, assumptions, and so on lead to different ways of doing things. One difference that I've seen pop up recently has to do with ON's policy of "FCS Quality All the Time" (aka Production Ready All the Time).

Ideally, we'd like the ON master gate to be good enough that if someone told us to cut a release tomorrow, we'd be happy to do it. In reality, there are usually showstopper bugs that need to be fixed before we could cut a release.

So what does "FCS Quality All the Time" mean for developers?

One thing that it means is that bugs that break the build, or that make the code sufficiently unusable, must either be fixed quickly (within hours), or the gatekeeper will back out the change that introduced the problem.

Another thing that it means--and this is what I want to focus on here--is that it's not acceptable to put something into the gate with known showstopper bugs. You can't say "yeah I know it's broken, but I promise to fix it before the release closes".

There are a couple reasons for this. One reason is to avoid a Quality Death Spiral. The other reason is to avoid a fire drill at the end of the release, where you have a bunch of deferred stoppers to fix. What happens in this situation, almost always, is that everyone has to put in extra time and the release is delayed. It's stressful, and because a lot of fixes are going in at the last minute, the final quality is probably worse than it would have been if the fixes had gone in earlier[1].

Notice I said a "bunch" of bugs. If the policy is that it's okay to putback with known stoppers, then it's okay for everyone, not just your project. And with 100 putbacks going into each build, that just doesn't scale.

A common reason for wanting to putback early is to give the code more exposure. While more exposure will help quality, it's better for project teams to make binaries available on their web site. These can be packages, BFU archives, or tarballs. It's potentially a little more work for the project team, but doing it this way is a win for the community as a whole.
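
As a hypothetical sketch (the names and paths are invented), a project team might publish the BFU archives from its latest nightly build like so:

  $ cd $CODEMGR_WS                      # the project's gate
  $ tar cf - archives/i386/nightly | gzip > myproject-bfu.i386.tar.gz
  $ scp myproject-bfu.i386.tar.gz webhost:/export/myproject/downloads/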

So if you catch yourself thinking "it's okay, I'll just putback the fix later", be sure to ask yourself: is it okay enough to ship it as it is? If it is, great. If it's not, fix it before you putback. Everyone will thank you for it.


Notes:

[1] This is also why we require architectural approval prior to putback.


Technorati tags: OpenSolaris Solaris

Friday Apr 01, 2005

v20z and PXE boot

I've got a v20z system that I use for testing, and I want to do a fresh install of Solaris on it. There's a PXE netinstall server on the same subnet, so I figured I'd just do a PXE boot. But then I had to figure out how to PXE boot a v20z whose console is a serial line connected to a tip server. Pressing F12 (repeatedly) during boot didn't work, using either xterm or gnome-terminal.

With some systems, there's a magic key sequence you have to enter instead of F12. With other systems, you press F2 to get into the BIOS, and you reconfigure the BIOS to boot off the network. The v20z is one of the latter, but there are a couple gotchas.

  1. Not only do you have to put the network at the top of the boot device list (use the "+" key), but you also have to put the hard disk at the bottom of the boot device list (use the "-" key).
  2. On a Solaris color xterm, the selected BIOS menu item is invisible.
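
(The server side was already set up for me, but on a Solaris netinstall server you'd register an x86 client for DHCP/PXE boot with something like the following, where the MAC address is made up.)

  # cd /export/install/s10/Solaris_10/Tools
  # ./add_install_client -d -e 00:09:3d:00:12:34 i86pc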

Tuesday Nov 09, 2004

driver_aliases Magic

I just learned a trick for use with /etc/driver_aliases that I wanted to share.

Last Friday I did a net install of Solaris 10 on an internal test system. It's some sort of x86 box; I don't have access to the lab to verify the type. When I rebooted after the net install, the system wasn't recognizing its network. ``ifconfig -a'' would only list lo0. I would have suspected that the NIC wasn't plugged in, except that I had just installed the system over the network.

Alan DuBoff explained to me that a likely cause of the problem was that the card was not listed in /etc/driver_aliases. The fix procedure was pretty simple:

  1. do ``prtconf -pv'' and look for a line for the NIC that appears to contain a driver_aliases entry. In this case it was
    model: 'PCI: 14e4,1645.1028.121.15 - Broadcom 5701 Gigabit Ethernet'
  2. Add a corresponding line to /etc/driver_aliases. In this case:
    bge "pci14e4,1645.1028.121"
  3. Reboot.
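
A possibly simpler variant of steps 2 and 3: if your build's update_drv(1M) supports adding aliases, it will edit driver_aliases and load the driver for you:

  # update_drv -a -i '"pci14e4,1645.1028.121"' bge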
