Saturday Aug 04, 2012

Recovering a Totally Full ZFS Filesystem

If your ZFS filesystem is completely full, it can be difficult to free up space.  Most people's first impulse is to delete files, but that often fails because ZFS can require free space to record the deletion.  A colleague of mine ran into this a few weeks ago.  I advised him to try truncating a large file using shell redirection (e.g., "cat /dev/null > my_large_file"), but that didn't work.  He was able to free up space using the truncate command, which is available in Solaris 11.

 truncate -s 0 my_large_file
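To see what the command does before trying it on a full pool, here is a minimal sketch that runs it against a scratch file. The scratch file (created with mktemp) and its 64 KB size are illustrative assumptions; on a real system you would point truncate at your own large file.

```shell
# Scratch file standing in for my_large_file (hypothetical; created just for this demo).
large_file=$(mktemp)

# Fill it with 64 KB of zeros so there is something to truncate.
dd if=/dev/zero of="$large_file" bs=1024 count=64 2>/dev/null
size_before=$(wc -c < "$large_file" | tr -d ' ')

# Cut the file down to zero bytes in place, as in the post.
truncate -s 0 "$large_file"
size_after=$(wc -c < "$large_file" | tr -d ' ')

echo "before=$size_before after=$size_after"

# Once the file is empty, remove it.
rm -f "$large_file"
```

The file keeps its name, owner, and permissions; only its length changes.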

Tuesday Nov 29, 2011

IPS Facets and Info files

One of the unusual things about IPS is its "facet" feature. For example, if you're a developer using the foo library, you don't install a libfoo-dev package to get the header files. Instead, you install the libfoo package, and your facet.devel setting controls whether you get header files.

I was reminded of this recently when I tried to look at some documentation for Emacs Org mode. I was surprised when Emacs's Info browser said it couldn't find the top-level Info directory. I poked around in /usr/share but couldn't find any info files.

  $ ls -l /usr/share/info
  ls: cannot access /usr/share/info: No such file or directory

Was I missing a package?

  $ pkg list -a | egrep "info|emacs"
  editor/gnu-emacs                                  23.1-     i--
  editor/gnu-emacs/gnu-emacs-gtk                    23.1-     i--
  editor/gnu-emacs/gnu-emacs-lisp                   23.1-     ---
  editor/gnu-emacs/gnu-emacs-no-x11                 23.1-     ---
  editor/gnu-emacs/gnu-emacs-x11                    23.1-     i--
  system/data/terminfo                              0.5.11-     i--
  system/data/terminfo/terminfo-core                0.5.11-     i--
  text/texinfo                                      4.7-      i--
  x11/diagnostic/x11-info-clients                   7.6-     i--

Hmm. I didn't have the gnu-emacs-lisp package. That seemed an unlikely place to stick the Info files, and pkg(1) confirmed that the info files were not there:

  $ pkg contents -r gnu-emacs-lisp | grep info

Well, if I have what look like the right packages but don't have the right files, the next thing to check is the facets.

The first check is whether there is a facet associated with the Info files:

  $ pkg contents -m gnu-emacs | grep usr/share/info
  dir group=bin mode=0755 owner=root path=usr/share/info
  file [...] chash=[...] group=bin mode=0444 owner=root path=usr/share/info/mh-e-1 [...] 
  file [...] chash=[...] group=bin mode=0444 owner=root path=usr/share/info/mh-e-2 [...]

Yes, they're associated with a facet.
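To pull just the facet tags out of manifest output like the listing above, a filter along these lines might help. This is only a sketch: the sample lines and the name facet.example are hypothetical stand-ins (the real attributes are elided in the listing), and list_facets is a helper invented for this example. In practice you would pipe the output of "pkg contents -m gnu-emacs" into it instead of the sample text.

```shell
# Hypothetical manifest lines shaped like `pkg contents -m` output;
# facet.example is a placeholder, not a real facet name.
sample_manifest='dir group=bin mode=0755 owner=root path=usr/share/info
file facet.example=true group=bin mode=0444 owner=root path=usr/share/info/mh-e-1
file facet.example=true group=bin mode=0444 owner=root path=usr/share/info/mh-e-2'

# Print each distinct facet tag found on actions under usr/share/info.
list_facets() {
  grep 'usr/share/info' |
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^facet\./) print $i }' |
    sort -u
}

facets=$(printf '%s\n' "$sample_manifest" | list_facets)
echo "$facets"
```

An action with no facet tag (like the dir line in the sample) is always installed, so it contributes nothing to the output.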

Now let's look at the facet settings on my desktop:

  $ pkg facet
  FACETS           VALUE
  facet.locale.en* True
  facet.locale*    False
  [...]            True
  facet.doc*       False

Oops. I've got man pages and various English documentation files, but not the Info files. Let's fix that:

  # pkg change-facet
  Packages to update: 970
  Variants/Facets to change:   1
  Create boot environment:  No
  Create backup boot environment: Yes
  Services to change:   1
  DOWNLOAD                                  PKGS       FILES    XFER (MB)
  Completed                              970/970     181/181      9.2/9.2
  PHASE                                        ACTIONS
  Install Phase                                226/226
  PHASE                                          ITEMS
  Image State Update Phase                         2/2
  PHASE                                          ITEMS
  Reading Existing Index                           8/8
  Indexing Packages                            970/970

Now we have the info files:

  $ ls -F /usr/share/info
  dir@      groff-2   dired-x   groff-3   remember

Wednesday May 30, 2007

What I Learned From Ubuntu

Mark Shuttleworth and a few Ubuntu developers stopped by the Sun Menlo Park campus on Friday May 4th. I'm not working with Ubuntu, but since I'm involved with the Solaris Companion and with general OpenSolaris issues, I wanted to see what they had to say about third-party packages and about how they do their releases.

You can organize Ubuntu packages along two dimensions. The first dimension is whether the package is free (libre). The second dimension is whether Canonical (Ubuntu's corporate sponsor) provides support (e.g., security fixes). This gives us the following table:

                supported by Canonical       not supported by Canonical
  free          main (2,000 packages)        universe (18,000 packages)
  not free      restricted (5 packages)      multiverse (200 packages)

Notice that Canonical only supports 10% of the packages in the distro.

There are two levels of access to the third-party packages. The first level is an engineering repository which bypasses Canonical. That is, people can update the repository at any time, without regard to the Ubuntu release schedule. The second level is the actual distro, which has tighter controls.

Some of the packages are available on the Ubuntu CD, but many are only available via network download. Canonical does not track the downloads. This would be heresy inside Sun, where there's a big emphasis on measuring things. But Mark said that Canonical doesn't really care about the download numbers, and it would be difficult to get accurate numbers anyway (e.g., because of mirroring).

Someone asked Mark how they deal with packages that potentially infringe on a patent. Mark said that there's no such thing as a global patent, so those packages are allowed in the distro, but they're only available via network download. The user self-certifies that it's okay for him or her to use the package.

Another issue that comes up with third-party packages is how to track bugs. Mark talked about this a bit, and it's something we're facing with OpenSolaris, too. The basic problem is that for a given package, there may be two bug databases: one deployed by the upstream project and one deployed by the distro. So far, the industry best practice seems to be to push distro-independent information to the upstream database, leaving distro-specific details in the distro's database. This approach is less than ideal, because it requires a fair amount of manual effort to track the bug status and to keep the right information in the right database. Canonical developed a tracking application called Launchpad to help deal with this, but Mark mentioned that it's still not quite what they want, and that Canonical might be revisiting the issue in a couple years. It'd be nice if the Ubuntu and OpenSolaris communities could somehow work together on that.

Mark spent a little time describing Launchpad, and it does have some nice bug-tracking features. For example, you can create hyperlinks to the upstream database entry, and Launchpad can automatically query the upstream database to get the bug's status.

Launchpad also has more general collaboration support, such as mailing lists, project web space, and a code repository. It includes features that would be useful for OpenSolaris, like a translation tracker and an application for proposing and tracking project ideas.

The other major topic that I was interested in was how Ubuntu releases are done. Ubuntu releases follow a train model, with releases appearing every 6 months. There is support for 18 months, except for Long Term Support (LTS) releases, where servers are supported for 5 years. For those who are not familiar with the train model, the basic idea is that if your code is not ready in time, it is bumped to the next release, rather than delaying the current release.
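As a rough illustration of the bumping rule, here is a small sketch. The helper train_for and the sample months are hypothetical; the only thing taken from the text is the 6-month interval between trains.

```shell
# Months between trains, per the 6-month cadence described above.
TRAIN_INTERVAL=6

# Given the month at which a feature is ready (counted from the start of the
# cycle), print the departure month of the first train it can ride.
# train_for is a hypothetical helper for this illustration.
train_for() {
  ready_month=$1
  echo $(( (ready_month + TRAIN_INTERVAL - 1) / TRAIN_INTERVAL * TRAIN_INTERVAL ))
}

train_for 5   # ready before the month-6 cutoff: rides the month-6 train
train_for 6   # ready exactly at the cutoff: still rides the month-6 train
train_for 7   # one month late: bumped to the month-12 train
```

The point of the model is in the last case: the month-6 train leaves on time, and the late feature waits for the next one.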

Sun tried a train model for Solaris in the 1990s, with releases every 6 months[1]. It didn't work for us, and we eventually gave it up. I wasn't involved with Solaris release management, so I probably have a limited perspective on what all the issues were. But as a developer I could see a couple things that contributed to abandoning 6-month trains.

The first problem that I saw was that we didn't stick to the cutoff dates. There was often some new feature that just couldn't wait for the next train, so we would bend the rules and let changes integrate after the nominal cutoff[2]. I suppose that having a late binding mechanism makes sense for exceptional circumstances, but I think it got overused. These days, it seems like late binding isn't just a safety net to keep the release from falling apart, it's a regular phase in the release cycle. I suppose the net effect isn't too horrible--it's effectively a gradual freezing of the code, rather than a hard freeze. But it does push back the real, final freeze date, which then reduces the time that is available for later parts of the release cycle.

This ties in to the other problem that I saw, which was that the Beta Test period was too short. I forget how long the Beta periods were, but they were short enough that by the time customers had actually deployed the code, identified and reported issues, and we had worked out a fix, it was too late to get the fix into that release.

Of course, this raises the question of why Canonical doesn't have the same problems with Ubuntu.

One explanation is that much of what goes into Ubuntu comes from an upstream source and is already (more or less) stable. There is some original work done for Ubuntu, but it's not the "deep R&D"[3] of things like SMF, DTrace, or ZFS. It's hard to predict the schedule for cutting-edge projects, particularly ones that affect large parts of the system. That's not an entirely satisfactory answer, though, because according to the train model, if a project is late, you just bump it to the next release. So there must be more going on than that.

One thing that could mess up a train model is technical dependencies. Suppose Project A depends on Project B. If you integrate parts of A under the assumption that B will integrate later in the release, there will be a strong temptation to delay the release if B is late. The Ubuntu folks try to avoid this problem by avoiding dependencies on upstream code that's scheduled to be released near the feature freeze. How strict they are about this depends in part on how much they trust the upstream provider to meet its schedule. And in a pinch, they might take beta code if it's deemed to be stable enough. I don't know if technical dependencies were a factor in moving away from the train model for Solaris releases. It shouldn't have been an issue for the OS/Net consolidation ("FCS Quality All the Time"), but I don't know about Solaris as a whole.

I suppose there could have also been a sort of "marketing and PR" dependency problem, where we feared a loss of face if Feature X didn't make its target release. I don't know if this was actually an issue, but Sun does seem to like big, flashy announcements, and there are quite a few analyst briefings that happen under embargo[4] prior to these events.

Another explanation for why Canonical can make 6-month trains work is that the 6-month releases serve a different target market than the one Solaris has been in. A noticeable chunk of the Solaris user base would go nuts with a 6-month release cycle and 18-month support tail. As soon as they got one release qualified and deployed, they'd have to do it all over again.

So one thing we might want to look at for Solaris is to have two release vehicles, similar to the 6-month and LTS releases that Canonical is doing with Ubuntu. But there are still some issues with that model that we'd want to figure out. For example, the Ubuntu folks said that most of the Ubuntu LTS customers just want security fixes, whereas Solaris customers often demand patches for non-security bugs.

Another thing that distinguishes Ubuntu releases from the 6-month Solaris trains is when customers actually get the bits to play with. There are only 3 weeks between the Beta release and final release for Gutsy, but there will be six snapshots that are available sooner, with the first (fairly unstable) one appearing 16 weeks before the Beta release. This gives users a larger window than we had with the 6-month Solaris trains in which to try out the release and give feedback.

So, to sum it all up: I learned that distros can successfully deal with issues that OpenSolaris and Sun are facing, like how to provide the many third-party packages that users want, and how to keep them current. What we need to do now is figure out how to make it work for OpenSolaris, without sacrificing the stability that attracted many Solaris users in the first place.

[1] The internal code names for SunOS 5.2, 5.3, and 5.4 were on493, on1093, and on494, respectively.

[2] At some point we came up with a formalized "late binding" process, but I don't remember just when that was introduced.

[3] That's the term Mark used.

[4] That is, the analyst isn't allowed to publish anything about it before a certain date and time.


Random information that I hope will be interesting to Oracle's technical community. The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

