Saturday Mar 29, 2014

15 years

Fifteen years ago today, March 29, 1999, I showed up at Sun’s MPK29 building for my first day of work as a student intern. I was off-cycle from the normal summer interns, since I wasn’t enrolled Spring semester but was going to finish my last two credits (English & Math) during the Summer Session at Berkeley. A friend from school convinced his manager at Sun to give me a chance, and after interviewing me, they offered to bring me on board for 3 months full-time, then dropping down to half-time when classes started.

I joined the Release Engineering team as a tools developer and backup RE in what was then the Power Client Desktop Software organization (to differentiate it from the Thin Client group) in the Workstation Products Group of Sun’s hardware arm. The organization delivered the X Window System, OpenWindows, and CDE into Solaris, and was in the midst of two big projects for the Solaris 7 8/99 & 11/99 releases: a project called “SolarEZ” to make the CDE desktop more usable and to add support for syncing CDE calendar & mail apps with Palm Pilot PDAs; and the project to merge the features from X11R6.1 through X11R6.4 (including Xinerama, Display Power Management, Xprint, and LBX) into Sun’s proprietary X11 fork, which was still based on the X11R6.0 release.

I started out with some simple bug fixes to learn the various build systems, before starting to try to write the scripts they wanted to simplify the builds. My first bug was to find out why every time they did a full build of Sun’s X gate they printed a copy of the specs. (To save trees, they’d implemented a temporary workaround of setting the default printer queue on the build machines to one not connected to a printer, but then they had to go delete all the queued jobs every few days to free disk space.) After a couple days of digging, and learning far too much about Imake for my own good, I found the cause was a merge error in the Imake configuration specs. The TroffCmd setting for how to convert troff documents to PostScript had been resynced to the upstream setting of psroff, but the flags were still the ones for Sun’s troff. These flags to Sun’s troff generated a PostScript file on disk, while they made psroff send the PostScript to the printer - switching TroffCmd back to Solaris troff solved the issue, so I filed Sun bug 4227466 and made my first commit to the Solaris X11 code base.

Six months later, at the end of my internship, diploma in hand, Sun offered to convert my position to a regular full-time employee, and I signed on. (This is why I always get my anniversary recognition in September, since they don’t count the internship.) Six months after that, the X11R6.4 project was finishing up and several of the X11 engineers decided to move on, making an opening for me to switch to the X11 engineering team as a developer. Like many junior developers joining new teams, I started out doing a lot of bug fixes, and some RFE’s, such as integrating Xvfb & Xnest into Solaris 9.

A couple years in, Sun’s attempts to reach agreement amongst all of the CDE co-owners to open source CDE had failed, resulting only in the not-really-open-source release of OpenMotif, so Sun chose to move on once again, to the GNOME Desktop. This required us to provide a lot of infrastructure in the X server and libraries that GNOME clients needed, and I got to take on some more challenging projects such as trying to port the Render extension from XFree86 to Xsun, assisting on the STSF font rendering project, adding wheel mouse support to Xsun, and working to ensure Xsun had the underlying support needed by GNOME’s accessibility projects. Unfortunately, the results of some of those projects were not that good, but we learned a lot about how hard it was to retrofit Xsun with the newly reinvigorated X11 development work and how maintaining our own fork of all the shared code was simply slowing us down.

Then one fateful day in 2004, our manager was on vacation, so I got a call from the director of the Solaris x86 platform team, who had been meeting with video card vendors to decide what graphics to include in Sun’s upcoming AMD64 workstations. The one consistent answer they’d gotten was that the time and cost to produce Solaris drivers went up and the feature set went down if they had to port to Xsun as well, instead of being able to reuse the XFree86 driver code they were already shipping for Linux. He asked what it would take to ship XFree86 on Solaris instead, and we discussed the options, then I talked with the other X engineers, and soon I was the lead of a project to replace the Solaris X server.

This was right at the time XFree86 was starting to come apart while the old X.Org industry consortium was trying to move X11 to an open development model, resulting in the formation of the X.Org Foundation. We chose to just go straight to Xorg, not XFree86, and not to fork as we’d done with Xsun, but instead to just apply any necessary changes as patches, and try to limit those to not changing core areas, so it was easier to keep up with new releases.

Thus, right at the end of the Solaris 10 development cycle we integrated the Xorg 6.8 X server for x86 platforms, and even made it the default for that release. We had no SPARC drivers ported yet, just all the upstream x86 drivers, so only provided Xsun in the SPARC release. The SPARC port, along with a 64-bit build for x64 systems, came in later Solaris 10 update releases. This worked out much better than our attempts to retrofit Xsun ever did.

Along with the Solaris 10 release, Sun announced that Solaris was going to be open sourced, and eventually as much of the OS source as we could release would be. But for the Solaris 10 release we only had time to add the Xorg server and the few libraries we needed for the new extensions. Most of the libraries and clients were still using the code from Sun’s old X11R6 fork, and we needed to figure out which changes we could publish and/or contribute back upstream. We weren’t ready to do this yet on the opening day of OpenSolaris.org, but I put together a plan for us to do so going forward, combing the tasks of migrating from our fork to a patched upstream code base, and the work of excising proprietary bits so it could be released as fully open source.

Due to my work with the X.Org open source community, my participation in the existing Solaris communities on Usenet and Yahoo Groups, and interest in X from the initial OpenSolaris participants, I was pulled into the larger OpenSolaris effort, culminating in serving 2 years on the OpenSolaris Governing Board. When the OpenSolaris distro effort (“Project Indiana”) kicked off, I got involved in it to figure out how to integrate the X11 packages to it. Between driving the source tree changes and the distro integration work for X, I became the Tech Lead for the Solaris X Window System for the Solaris 11 release.

Of course, the best laid plans are always beset by changes in the surrounding world, and these were no different. While we finished the migration to a pure open-source X code base in November 2009, it was shortly followed by the closing of Oracle’s acquisition of Sun, which eventually led to the shutdown of the OpenSolaris project. Since the X11 code is still open source and mostly external sources, it is still published on solaris-x11.java.net, alongside other open source code included in Solaris. So when Solaris 11 was released in November 2011, while the core OS code was closed again, it included the open source X11 code with the culmination of our efforts.

Solaris 11 also included a few bug fixes I’d found via experimenting with the Parfait static analysis tool created by Sun Labs. Since shortly after starting in the X11 group, I’d been handling the security bugs in X. Having tools to help find those seemed like a good idea, and that was reinforced when we used Coverity to find a privilege escalation bug in Xorg. When the Sun Labs team came to demo parfait to us, I decided to try it out, and found and fixed a number of bugs in X with it (though not in security sensitive areas, just places that could cause memory leaks or crashes in code not running with raised privileges).

After the Oracle acquisition, part of adopting Oracle’s security assurance policies was using static analysis on our code base. With my experience in using static analyzers on X, I helped create the overall Solaris plan for using static analysis tools on our entire code base. This plan was accepted and I was asked to take on the role of leading our security assurance efforts across Solaris.

Another part of integrating Sun into Oracle was migrating all the bugs from Sun’s bug database into Oracle’s, so we could have a unified system, allowing our Engineered Systems teams to work together more productively, tracking bugs from the hardware through the OS up to the database & application level in a single system. Valerie Fenwick & Scott Rotondo had managed the Solaris bug tracking for years, and I’d worked with them, representing both the X11 and larger Desktop software teams. When Scott decided to move to the Oracle Database Security team, Valerie asked me to replace him, just as the effort to migrate the bugs was beginning. That turned into 2 years of planning, mapping, updating tools and processes, coordinating changes across the organization, traveling to multiple sites to train our engineers on the Oracle tools, and lots of communication to prepare our teams for the changes.

As we looked to the finish line of that work, planning was beginning for the Solaris 12 release. All of the above experiences led to me being named as the Technical Lead for the Solaris 12 release as a whole, covering the entire OS, coordinating and working with the tech leads for the individual consolidations. We’re two years into that now, and while I can’t really say much more yet than the official roadmap, I think we’ve got some solid plans in place for the OS.

So here it is now 15 years after starting at Sun, 14 after joining the X11 team. I still do a little work on X in my spare time (especially around the area of X security bugs), and officially am still in the X11 group, but most of my day is spent on the rest of Solaris now, between planning Solaris 12, managing our bug tracking, and ensuring our software is secure.

In those 15 years, I’ve learned a lot, accomplished a bit, and made a lot of friends. I’ve also:

  • filed 2572 bugs & enhancement requests in the Sun/Oracle bug database, and fixed or closed 2155, assuming I got the bug queries correct.
  • worked under 3 CEO’s (Scott, Jonathan, Larry), and 4 managers, but more VP’s and Directors than I can count thanks to Sun’s love of regular reorgs.
  • been part of the Workstation Products Group, the Webtop Applications group, the Software Globalization group, the User Experience Engineering group, the x64 Platform group, the Solaris Open Source group, and a few I’ve forgotten, again thanks to Sun’s love of regular reorgs.
  • been seen by my co-workers in a suit exactly 4 times (my initial job interview, and funerals for 3 colleagues we’ve lost over the years).

Where will I be in 15 more years? Hard to guess, but I bet it involves making sure everything is ready for Y2038, so I can retire in peace when that arrives.

For now, I offer my heartfelt thanks to everyone whose help made the last 15 years possible - none of it was done alone, and I couldn't have got here without a lot of you.

Sunday Feb 23, 2014

The goto fail heard ‘round the world

There has been much discussion online this weekend of what happens when you introduce an extra “goto fail” line into your code, such as making a mistake using a merge tool to combine your changes into those made in parallel to a code repository under active development by other engineers. (That is the most common cause I’ve seen of such duplication in source code, and thus the one I see most likely — others guess similarly and discuss that side more in depth — but that’s not what this post is about.)

Adam Langley posted a good explanation of this bug, including a brief snippet of code showing the general class of problem:

 1     extern int f();
 2
 3     int g() {
 4             int ret = 1;
 5
 6             goto out;
 7             ret = f();
 8
 9     out:
10             return ret;
11     }
which he pointed out “If I compile with -Wall (enable all warnings), neither GCC 4.8.2 or Clang 3.3 from Xcode make a peep about the dead code.” Others pointed out that the -Wunreachable-code option is one of many that neither compiler includes in -Wall by default, since both compilers treat -Wall not as “all possible warnings” but rather more of “the subset of warnings that the compiler authors think are suitable for use on most software”, leaving out those that are experimental, unstable, controversial, or otherwise not quite ready for prime time.

Since this has a simple test case, and I’ve got a few compilers handy on my Solaris x86 machine for doing various builds with, I did some testing myself:


gcc 3.4.3

% /usr/gcc/3.4/bin/gcc -Wall -Wextra -c fail.c
[no warnings or other output]

% /usr/gcc/3.4/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
fail.c: In function `g':
fail.c:7: warning: will never be executed

As expected, gcc warned when the specific flag was given.

gcc 4.5.2 & 4.7.3

% /usr/gcc/4.5/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
[no warnings or other output]

% /usr/gcc/4.7/bin/gcc -Wall -Wextra -Wunreachable-code -c fail.c
[no warnings or other output]

It turns out this is due to the gcc maintainers removing the support for -Wunreachable-code in later versions.

clang 3.1

% clang -Wall -Wextra -c fail.c
[no warnings or other output]

% clang -Wall -Wunreachable-code -c fail.c
fail.c:7:8: warning: will never be executed [-Wunreachable-code]
        ret = f();
              ^
1 warning generated.

% clang -Wall -Weverything -c fail.c
fail.c:3:5: warning: no previous prototype for function 'g'
      [-Wmissing-prototypes]
int g() {
    ^
fail.c:7:8: warning: will never be executed [-Wunreachable-code]
        ret = f();
              ^
2 warnings generated.

So clang also finds it with -Wunreachable-code, which is also enabled by -Weverything, but not -Wall or -Wextra. (I didn’t see a similar -Weverything flag for gcc.)

Solaris Studio 12.3

% cc -c fail.c
"fail.c", line 7: warning: statement not reached

% lint -c fail.c

statement not reached
    (7)

Both Studio’s default compiler flags and old-school lint static analyzer caught this by default. Though, as I noted last year, warnings about some unreachable code chunks will be seen in some releases but not others, as the information available to the compiler about the control flow evolves.


As anyone who has compiled much software with more than one of the above (including just different versions of the same compiler) can attest, the warnings generated on real code bases by various compilers will almost always be different. There’s many cases where gcc & clang warn about things Studio doesn’t, and vice versa, as every compiler & analyzer has been fed different test cases over the years of bad code that developers wanted them to find, or specific problems they needed to solve. This is why when compiling the Solaris kernel, we run two compilation passes on all the sources - once with Studio cc to both get its code analysis and to make the binaries we ship, and once with gcc just to get its code analysis, plus running it through Oracle’s internal parfait static analyser as well.

Having a growing body of warnings doesn’t help either, if you never do anything about them. This is why Solaris kernel and core utility code have policies on not introducing new warnings and we’ve been making efforts in X.Org to clean up warnings, so that you can see new issues when you introduce them, not lose them in a sea of thousands of existing bits of noise.

Of course, it’s too late to stop the bug that caused this discussion from slipping into the code base, and easy to second guess after the fact how it may have been prevented. We have lots of tools at our disposal and they continue to grow and improve, including human code review (though human review is much less reliable and repeatable than most tools, humans are much more flexible at finding things the automated tools weren't prepared to catch), and we need to use multiple ones (more on the most sensitive code, such as security protocols) to improve our chances of finding and fixing the bugs before we integrate them. Hopefully we can learn from the ones that slip through how to improve our checks for the future, such as adding -Wunreachable-code compiler flags where appropriate.

Thursday Jan 02, 2014

Book Review: “Oracle Solaris 11: First Look”

I saw a tweet this weekend from Packt Publishing (@packtpub) offering their entire catalog of ebooks for $5 apiece until Friday, January 3, 2014, so I decided to check out their selection.

One of the books I found there was Oracle Solaris 11: First Look by long-time Solaris admin Phil Brown, whose website bolthole.com was a staple of the Solaris x86 mailing list a decade ago, and whose advocacy of Solaris x86 helped save it from permanent cancellation. I first met Phil when we were undergrads at U. C. Berkeley, where we served as sysadmin staff on the student-run cluster of Unix systems together, and we’ve kept in touch via the Solaris & OpenSolaris communities we both found ourselves in later.

Since it was only $5, I picked it up to see if it was worth recommending to others. When not on sale, the ebook is only $15, since it’s not a large book - the PDF is 168 pages, the epub on my iPad was 199 pages - and in both forms that includes an index of about 25 pages. That also made it a quick enough read that I could get through in an afternoon, skimming over the examples and reference materials.

This book is intended to get existing Solaris admins up to speed quickly on Solaris 11 - it’s not going to introduce the basics of system administration, but will tell you what commands to run now. If you don’t know what routers, subnets, or tunnels are, you probably want a more introductory book - if you know what they are, and need to know what to run instead of ifconfig or editing /etc/hostname.e1000g0 to configure them on Solaris 11, this book will help.

Phil’s biases as a long time server admin are obvious in some sections, such as the introduction to NWAM, the Network AutoMagic feature, which he suggests “from a server sysadmin perspective, it might perhaps be better named "Never Wake A Monster"” though he admits on a laptop it can be useful to adjust to networks in different locations. There’s also a few areas where you can tell the book was written before Solaris 11.1 was out and didn’t get updated for the latest changes.

As the author of the classic pkg-get tool for installing Solaris SVR4 packages from network repositories, he has plenty to say about the new IPS packaging system in Solaris 11 as well, providing some useful tips on finding packages and setting up local repositories, though he does discourage use of many of the more advanced pkg subcommands that can help admins take more control over exactly what gets installed and updated on their systems.

Overall it’s a decent aide, and something I may refer to in the future, as I don’t do system administration that often these days, and often need a refresher, especially when old habits no longer work. It’s more detailed in many sections that the official Transitioning From Oracle Solaris 10 to Oracle Solaris 11.1 manual, which mainly points off to the other Solaris 11 manuals for details; but concentrated only on the areas a typical system administrator will be configuring, not developer or end-user visible changes. I’d recommend it to experienced admins looking for a hands on guide for dealing with Solaris 11 systems, but it’s not the right level for those trying to plan a migration or learn Solaris administration from scratch.

As always, the above is solely my personal opinion, not an official Oracle corporate position or endorsement.

Friday Nov 01, 2013

Finding nuggets in ARC discussions

A bit over twenty years ago, Sun formed an Architecture Review Committee (ARC) that evaluates proposals to change interfaces between components in Sun software products. During the OpenSolaris days, we opened many of these discussions to the community. While they’re back behind closed doors, and at a different company now, we still continue to hold these reviews for the software from what’s now the Sun Systems Group division of Oracle.

Recently one of these reviews was held (via e-mail discussion) to review a proposal to update our GNU findutils package to the latest upstream release. One of the upstream changes discussed was the addition of an “oldfind” program. In findutils 4.3, find was modified to use the fts() function to walk the directory tree, and oldfind was created to provide the old mechanism in case there were bugs in the new implementation that users needed to workaround.

In Solaris 11 though, we still ship the find descended from SVR4 as /usr/bin/find and the GNU find is available as either /usr/bin/gfind or /usr/gnu/bin/find. This raised the discussion of if we should add oldfind, and if so what should we call it. Normally our policy is to only add the g* names for GNU commands that conflict with an existing Solaris command – for instance, we ship /usr/bin/emacs, not /usr/bin/gemacs. In this case however, that seemed like it would be more confusing to have /usr/bin/oldfind be the older version of /usr/bin/gfind not of /usr/bin/find. Thus if we shipped it, it would make more sense to call it /usr/bin/goldfind, which several ARC members noted read more naturally as “gold find” than as “g old find”.

One of the concerns we often discuss in ARC is if a change is likely to be understood by users or if it will result in more calls to support. As we hit this part of the discussion on a Friday at the end of a long week, I couldn’t resist putting forth a hypothetical support call for this command:

“Hello, Oracle Solaris Support, how may I help you?”

“My admin is out sick, but he sent an email that he put the findutils package on our server, and I can run goldfind now. I tried it, but goldfind didn’t find gold.”

“Did he get the binutils package too?”

“No he just said findutils, do we need binutils?”

“Well, gold comes in the binutils package, so goldfind would be able to find gold if you got that package.”

“How much does Oracle charge for that package?”

“It’s free for Solaris users.”

“You mean Oracle ships packages of gold to customers for free?”

“Yes, if you get the binutils package, it includes GNU gold.”

“New gold? Is that some sort of alchemy, turning stuff into gold?”

“Not new gold, gold from the GNU project.”

“Oracle’s taking gold from the GNU project and shipping it to me?”

“Yes, if you get binutils, that package includes gold along with the other tools from the GNU project.”

“And GNU doesn’t mind Oracle taking their gold and giving it to customers?”

“No, GNU is a non-profit whose goal is to share their software.”

“Sharing software sure, but gold? Where does a non-profit like GNU get gold anyway?”

“Oh, Google donated it to them.”

“Ah! So Oracle will give me the gold that GNU got from Google!”

“Yes, if you get the package from us.”

“How do I get the package with the gold?”

“Just run pkg install binutils and it will put it on your disk.”

“We’ve got multiple disks here - which one will it put it on?”

“The one with the system image - do you know which one that is?

“Well the note from the admin says the system is on the first disk and the users are on the second disk.”

“Okay, so it should go on the first disk then.”

“And where will I find the gold?”

“It will be in the /usr/bin directory.”

“In the user’s bin? So thats on the second disk?”

“No, it would be on the system disk, with the other development tools, like make, as, and what.”

“So what’s on the first disk?”

“Well if the system image is there the commands should all be there.”

“All the commands? Not just what?”

“Right, all the commands that come with the OS, like the shell, ps, and who.”

“So who’s on the first disk too?”

Picture of Abbott and Costello

“Yes. Did your admin say when he’d be back?”

“No, just that he had a massive headache and was going home after I tried to get him to explain this stuff to me.”

“I can’t imagine why.”

“Oh, is why a command too?”

“No, _why was a Ruby programmer.

“Ruby? Do you give those away with the gold too?”

“Yes, but it comes in the ruby package, not binutils.”

“Oh, I’ll have to have my admin get that package too! Thanks!”

Needless to say, we decided this might not be the best idea. Since the GNU package hasn’t had to release a serious bug fix in the new find in the past few years, the new GNU find seems pretty stable, and we always have the SVR4 find to use as a fallback in Solaris, so it didn’t seem that adding oldfind was really necessary, so we passed on including it when we update to the new findutils release.

[Apologies to Abbott, Costello, their fans, and everyone who read this far. The Gold (linker) page on Wikipedia may explain some of the above, but can’t explain why goldfind is the old GNU find, but gold is the new GNU ld.]

Sunday Aug 04, 2013

Some times it’s a book sprint, other times it’s a marathon

One of the areas the X.Org Foundation has been working on in recent years is trying to bring new developers on board. Programs like Google Summer of Code and The X.Org Endless Vacation of Code (EVoC) help with this somewhat, but we’ve also been trying to find ways to lower the barrier to entry, both so the students in those programs can learn more and so that other developers have an easier time joining us.

Some of our efforts here have been technological - one of the driving efforts of the conversions from Imake to automake and from CVS to git was to make use of tools developers would already be familiar and productive with from other projects. The Modularization project, which broke up X.Org from one giant tree into over 200 small ones, had the goal of making it possible to fix a bug in a single library or driver without having to download and build many megabytes of software & fonts that were not being changed.

Another area of effort has been around the X.Org documentation, as I’ve written about before. Several years ago, Bart Massey suggested we try organizing a Book Sprint to write a guide for new X.Org developers, similar to the sprint he’d participated in to write a Google Summer of Code Mentoring Guide and a matching GSoC Student Guide. The Board agreed to let him try organizing one, but finding a time to bring everyone together was challenging.

We finally ended up holding a first session as a virtual gathering over the internet one weekend in March 2012. Bart, Matt Dew, Keith Packard, Stéphane Marchesin, and I used Gobby and Mumble to agree on an outline and write several chapters, along with creating some accompanying diagrams using graphviz dot. Unfortunately, the combination of the work from home setting, in which people kept dropping in and out as their home life intervened, the small number of people, and lack of an experienced organizer made this not as productive as other book sprints have been for other projects.

After the first weekend we had drafts of about 7 chapters and a preface, and outlines of several more. We also had a large chunk of prewritten material on graphics drivers that Stéphane had been working on already to link with. Over the next few months, Bart edited and formatted the chapters we had, while we got a little more written, including recruiting additional authors such as Peter Hutterer.

We then gathered again, this time in person, hosted by Matthias Hopf at the Georg-Simon-Ohm-University in Nürnberg, Germany, over the two days prior to the 2012 X.Org Developer’s Conference. We had some additional people join us this time, including Martin Peres, Lucas Stach, Ben Skeggs, and probably more I’ve forgotten. At this session, Bart, Matt & I worked on polsihing the rough edges on the guide we’d been creating, while the others worked more on the driver guide that Stéphane had started. Bart figured out the tools needed to generate ebook formats of the guide, such as epub, and made a cover and pulled together an overall structure for the guide, and by the end of that sprint, we had something we could read on the Android ebook reader he'd brought with him.

And then, we waited. Bart tried to figure out how to make the setup maintainable and reproducible, and how to make it available to our users, but his day job as a University professor kept taking time away from that. Bart also gave a lightning talk on our book sprint experience at the Write the Docs conference in Portland earlier this year covering what worked and what we learned didn’t work in our virtual sprint, and asking for ideas or help on how to go forward.

Meanwhile, the burden of fighting the spammers on the X.Org & FreeDesktop.Org wikis had gotten overwhelming on the current wiki software, so the freedesktop sitewranglers evaluated different solutions, and after looking at how well ikiwiki had been working for the past few years on the XCB wiki decided to move the rest of the FreeDesktop hosted wikis to ikiwiki as well. One major advantage is that it let us use the existing freedesktop authentication infrastructure for wiki accounts instead of having to find different ways to let our users in while keeping the spammers out. Another benefit is that it uses Markdown and git to update the wiki, so we can easily push files in Markdown format to the wiki to publish them now.

In a shocking coincidence, the geeks who wrote the X.Org Developer's Guide also used Markdown & git to author it, so with just a very little conversion to make links work between sections, it was possible to publish the guide to the wiki, where you can find it now:

It’s not perfect, it’s not even fully finished, but it’s better than nothing, and since it’s now on a wiki, it’s even easier for others to fill in the missing pieces or improve the existing bits.

Stephane is still working on the companion guide to graphics drivers, covering the stack from DRM and KMS in the kernel up to Xorg & Mesa. He posted the PDF of the draft as it was at the end of the March book sprint, but not yet a version with the contributions added from the September followup.

When working on applications higher up the stack, not hacking on the graphics stack itself, we refer developers to the documentation for their toolkits, or to general guides to OpenGL programming, as those are beyond what we can document ourselves in X.Org.

And for other open source projects, if you’d like a chance to have a professionally organized, in person doc sprint for your project, applications are being taken until August 7 for 2-4 projects to hold sprints as part of the Google Summer of Code Doc Camp in October, including travel expenses for up to 5 participants from the sponsors.

Sunday Nov 11, 2012

Solaris 11.1 changes building of code past the point of __NORETURN

While Solaris 11.1 was under development, we started seeing some errors in the builds of the upstream X.Org git master sources, such as:

"Display.c", line 65: Function has no return statement : x_io_error_handler
"hostx.c", line 341: Function has no return statement : x_io_error_handler
from functions that were defined to match a specific callback definition that declared them as returning an int if they did return, but these were calling exit() instead of returning so hadn't listed a return value.

These had been generating warnings for years which we'd been ignoring, but X.Org has made enough progress in cleaning up code for compiler warnings and static analysis issues lately, that the community turned up the default error levels, including the gcc flag -Werror=return-type and the equivalent Solaris Studio cc flags -v -errwarn=E_FUNC_HAS_NO_RETURN_STMT, so now these became errors that stopped the build. Yet on Solaris, gcc built this code fine, while Studio errored out. Investigation showed this was due to the Solaris headers, which during Solaris 10 development added a number of annotations to the headers when gcc was being used for the amd64 kernel bringup before the Studio amd64 port was ready. Since Studio did not support the inline form of these annotations at the time, but instead used #pragma for them, the definitions were only present for gcc.

To resolve this, I fixed both sides of the problem, so that it would work for building new X.Org sources on older Solaris releases or with older Studio compilers, as well as fixing the general problem before it broke more software building on Solaris.

To the X.Org sources, I added the traditional Studio #pragma does_not_return to recognize that functions like exit() don't ever return, in patches such as this Xserver patch. Adding a dummy return statement was ruled out as that introduced unreachable code errors from compilers and analyzers that correctly realized you couldn't reach that code after a return statement.

And on the Solaris 11.1 side, I updated the annotation definitions in <sys/ccompile.h> to enable for Studio 12.0 and later compilers the annotations already existing in a number of system headers for functions like exit() and abort(). If you look in that file you'll see the annotations we currently use, though the forms there haven't gone through review to become a Committed interface, so may change in the future.

Actually getting this integrated into Solaris though took a bit more work than just editing one header file. Our ELF binary build comparison tool, wsdiff, actually showed a large number of differences in the resulting binaries due to the compiler using this information for branch prediction, code path analysis, and other possible optimizations, so after comparing enough of the disassembly output to be comfortable with the changes, we also made sure to get this in early enough in the release cycle so that it would get plenty of test exposure before the release.

It also required updating quite a bit of code to avoid introducing new lint or compiler warnings or errors, and people building applications on top of Solaris 11.1 and later may need to make similar changes if they want to keep their build logs similarly clean.

Previously, if you had a function that was declared with a non-void return type, lint and cc would warn if you didn't return a value, even if you called a function like exit() or panic() that ended execution. For instance:

#include <stdlib.h>

int
callback(int status)
{
    if (status == 0)
        return status;
    exit(status);
}
would previously require a never executed return 0; after the exit() to avoid lint warning "function falls off bottom without returning value".

Now the compiler & lint will both issue "statement not reached" warnings for a return 0; after the final exit(), allowing (or in some cases, requiring) it to be removed. However, if there is no return statement anywhere in the function, lint will warn that you've declared a function returning a value that never does so, suggesting you can declare it as void. Unfortunately, if your function signature is required to match a certain form, such as in a callback, you not be able to do so, and will need to add a /* LINTED */ to the end of the function.

If you need your code to build on both a newer and an older release, then you will either need to #ifdef these unreachable statements, or, to keep your sources common across releases, add to your sources the corresponding #pragma recognized by both current and older compiler versions, such as:

#pragma does_not_return(exit)
#pragma does_not_return(panic) 
Hopefully this little extra work is paid for by the compilers & code analyzers being able to better understand your code paths, giving you better optimizations and more accurate errors & warning messages.

Sunday Oct 28, 2012

Solaris 11.1: Changes to included FOSS packages

Besides the documentation changes I mentioned last time, another place you can see Solaris 11.1 changes before upgrading is in the online package repository, now that the 11.1 packages have been published to http://pkg.oracle.com/solaris/release/, as the “0.175.1.0.0.24.2” branch. (Oracle Solaris Package Versioning explains what each field in that version string means.)

When you’re ready to upgrade to the packages from either this repo, or the support repository, you’ll want to first read How to Update to Oracle Solaris 11.1 Using the Image Packaging System by Pete Dennis, as there are a couple issues you will need to be aware of to do that upgrade, several of which are due to changes in the Free and Open Source Software (FOSS) packages included with Solaris, as I’ll explain in a bit.

Solaris 11 can update more readily than Solaris 10

In the Solaris 10 and older update models, the way the updates were built constrained what changes we could make in those releases. To change an existing SVR4 package in those releases, we created a Solaris Patch, which applied to a given version of the SVR4 package and replaced, added or deleted files in it. These patches were released via the support websites (originally SunSolve, now My Oracle Support) for applying to existing Solaris 10 installations, and were also merged into the install images for the next Solaris 10 update release. (This Solaris Patches blog post from Gerry Haskins dives deeper into that subject.)

Some of the restrictions of this model were that package refactoring, changes to package dependencies, and even just changing the package version number, were difficult to do in this hybrid patch/OS update model. For instance, when Solaris 10 first shipped, it had the Xorg server from X11R6.8. Over the first couple years of update releases we were able to keep it up to date by replacing, adding, & removing files as necessary, taking it all the way up to Xorg server release 1.3 (new version numbering begun after the X11R7 split of the X11 tree into separate modules gave each module its own version). But if you run pkginfo on the SUNWxorg-server package, you’ll see it still displayed a version number of 6.8, confusing users as to which version was actually included.

We stopped upgrading the Xorg server releases in Solaris 10 after 1.3, as later versions added new dependencies, such as HAL, D-Bus, and libpciaccess, which were very difficult to manage in this patching model. (We later got libpciaccess to work, but HAL & D-Bus would have been much harder due to the greater dependency tree underneath those.) Similarly, every time the GNOME team looked into upgrading Solaris 10 past GNOME 2.6, they found these constraints made it so difficult it wasn’t worthwhile, and eventually GNOME’s dependencies had changed enough it was completely infeasible. Fortunately, this worked out for both the X11 & GNOME teams, with our management making the business decision to concentrate on the “Nevada” branch for desktop users - first as Solaris Express Desktop Edition, and later as OpenSolaris, so we didn’t have to fight to try to make the package updates fit into these tight constraints.

Meanwhile, the team designing the new packaging system for Solaris 11 was seeing us struggle with these problems, and making this much easier to manage for both the development teams and our users was one of their big goals for the IPS design they were working on. Now that we’ve reached the first update release to Solaris 11, we can start to see the fruits of their labors, with more FOSS updates in 11.1 than we had in many Solaris 10 update releases, keeping software more up to date with the upstream communities.

Of course, just because we can more easily update now, doesn’t always mean we should or will do so, it just removes the package system limitations from forcing the decision for us. So while we’ve upgraded the X Window System in the 11.1 release from X11R7.6 to 7.7, the Solaris GNOME team decided it was not the right time to try to make the jump from GNOME 2 to GNOME 3, though they did update some individual components of the desktop, especially those with security fixes like Firefox. In other parts of the system, decisions as to what to update were prioritized based on how they affected other projects, or what customer requests we’d gotten for them.

So with all that background in place, what packages did we actually update or add between Solaris 11.0 and 11.1?

Core OS Functionality

One of the FOSS changes with the biggest impact in this release is the upgrade from Grub Legacy (0.97) to Grub 2 (1.99) for the x64 platform boot loader. This is the cause of one of the upgrade quirks, since to go from Solaris 11.0 to 11.1 on x64 systems, you first need to update the Boot Environment tools (such as beadm) to a new version that can handle boot environments that use the Grub2 boot loader. System administrators can find the details they need to know about the new Grub in the Administering the GRand Unified Bootloader chapter of the Booting and Shutting Down Oracle Solaris 11.1 Systems guide. This change was necessary to be able to support new hardware coming into the x64 marketplace, including systems using UEFI firmware or booting off disk drives larger than 2 terabytes.

For both platforms, Solaris 11.1 adds rsyslog as an optional alternative to the traditional syslogd, and OpenSCAP for checking security configuration settings are compliant with site policies.

Note that the support repo actually has newer versions of BIND & fetchmail than the 11.1 release, as some late breaking critical fixes came through from the community upstream releases after the Solaris 11.1 release was frozen, and made their way to the support repository. These are responsible for the other big upgrade quirk in this release, in which to upgrade a system which already installed those versions from the support repo, you need to either wait for those packages to make their way to the 11.1 branch of the support repo, or follow the steps in the aforementioned upgrade walkthrough to let the package system know it's okay to temporarily downgrade those.

Developer Stack

While Solaris 11.0 included Python 2.7, many of the bundled python modules weren’t packaged for it yet, limiting its usability. For 11.1, many more of the python modules include 2.7 versions (enough that I filtered them out of the below table, but you can always search on the package repository server for them.

For other language runtimes and development tools, 11.1 expands the use of IPS mediated links to choose which version of a package is the default when the packages are designed to allow multiple versions to install side by side.

For instance, in Solaris 11.0, GNU automake 1.9 and 1.10 were provided, and developers had to run them as either automake-1.9 or automake-1.10. In Solaris 11.1, when automake 1.11 was added, also added was a /usr/bin/automake mediated link, which points to the automake-1.11 program by default, but can be changed to another version by running the pkg set-mediator command.

Mediated links were also used for the Java runtime & development kits in 11.1, changing the default versions to the Java 7 releases (the 1.7.0.x package versions), while allowing admins to switch links such as /usr/bin/javac back to Java 6 if they need to for their site, to deal with Java 7 compatibility or other issues, without having to update each usage to use the full versioned /usr/jdk/jdk1.6.0_35/bin/javac paths for every invocation.

Desktop Stack

As I mentioned before, we upgraded from X11R7.6 to X11R7.7, since a pleasant coincidence made the X.Org release dates line up nicely with our feature & code freeze dates for this release. (Or perhaps it wasn’t so coincidental, after all, one of the benefits of being the person making the release is being able to decide what schedule is most convenient for you, and this one worked well for me.) For the table below, I’ve skipped listing the packages in which we use the X11 “katamari” version for the Solaris package version (mainly packages combining elements of multiple upstream modules with independent version numbers), since they just all changed from 7.6 to 7.7.

In the graphics drivers, we worked with Intel to update the Intel Integrated Graphics Processor support to support 3D graphics and kernel mode setting on the Ivy Bridge chipsets, and updated Nvidia’s non-FOSS graphics driver from 280.13 to 295.20.

Higher up in the desktop stack, PulseAudio was added for audio support, and liblouis for Braille support, and the GNOME applications were built to use them.

The Mozilla applications, Firefox & Thunderbird moved to the current Extended Support Release (ESR) versions, 10.x for each, to bring up-to-date security fixes without having to be on Mozilla’s agressive 6 week feature cycle release train.

Detailed list of changes

This table shows most of the changes to the FOSS packages between Solaris 11.0 and 11.1. As noted above, some were excluded for clarity, or to reduce noise and duplication. All the FOSS packages which didn't change the version number in their packaging info are not included, even if they had updates to fix bugs, security holes, or add support for new hardware or new features of Solaris.

Package11.011.1
archiver/unrar 3.8.5 4.1.4
audio/sox 14.3.0 14.3.2
backup/rdiff-backup 1.2.1 1.3.3
communication/im/pidgin 2.10.0 2.10.5
compress/gzip 1.3.5 1.4
compress/xz not included 5.0.1
database/sqlite-3 3.7.6.3 3.7.11
desktop/remote-desktop/tigervnc 1.0.90 1.1.0
desktop/window-manager/xcompmgr 1.1.5 1.1.6
desktop/xscreensaver 5.12 5.15
developer/build/autoconf 2.63 2.68
developer/build/autoconf/xorg-macros 1.15.0 1.17
developer/build/automake-111 not included 1.11.2
developer/build/cmake 2.6.2 2.8.6
developer/build/gnu-make 3.81 3.82
developer/build/imake 1.0.4 1.0.5
developer/build/libtool 1.5.22 2.4.2
developer/build/makedepend 1.0.3 1.0.4
developer/documentation-tool/doxygen 1.5.7.1 1.7.6.1
developer/gnu-binutils 2.19 2.21.1
developer/java/jdepend not included 2.9
developer/java/jdk-6 1.6.0.26 1.6.0.35
developer/java/jdk-7 1.7.0.0 1.7.0.7
developer/java/jpackage-utils not included 1.7.5
developer/java/junit 4.5 4.10
developer/lexer/jflex not included 1.4.1
developer/parser/byaccj not included 1.14
developer/parser/java_cup not included 0.10
developer/quilt 0.47 0.60
developer/versioning/git 1.7.3.2 1.7.9.2
developer/versioning/mercurial 1.8.4 2.2.1
developer/versioning/subversion 1.6.16 1.7.5
diagnostic/constype 1.0.3 1.0.4
diagnostic/nmap 5.21 5.51
diagnostic/scanpci 0.12.1 0.13.1
diagnostic/wireshark 1.4.8 1.8.2
diagnostic/xload 1.1.0 1.1.1
editor/gnu-emacs 23.1 23.4
editor/vim 7.3.254 7.3.600
file/lndir 1.0.2 1.0.3
image/editor/bitmap 1.0.5 1.0.6
image/gnuplot 4.4.0 4.6.0
image/library/libexif 0.6.19 0.6.21
image/library/libpng 1.4.8 1.4.11
image/library/librsvg 2.26.3 2.34.1
image/xcursorgen 1.0.4 1.0.5
library/audio/pulseaudio not included 1.1
library/cacao 2.3.0.0 2.3.1.0
library/expat 2.0.1 2.1.0
library/gc 7.1 7.2
library/graphics/pixman 0.22.0 0.24.4
library/guile 1.8.4 1.8.6
library/java/javadb 10.5.3.0 10.6.2.1
library/java/subversion 1.6.16 1.7.5
library/json-c not included 0.9
library/libedit not included 3.0
library/libee not included 0.3.2
library/libestr not included 0.1.2
library/libevent 1.3.5 1.4.14.2
library/liblouis not included 2.1.1
library/liblouisxml not included 2.1.0
library/libtecla 1.6.0 1.6.1
library/libtool/libltdl 1.5.22 2.4.2
library/nspr 4.8.8 4.8.9
library/openldap 2.4.25 2.4.30
library/pcre 7.8 8.21
library/perl-5/subversion 1.6.16 1.7.5
library/python-2/jsonrpclib not included 0.1.3
library/python-2/lxml 2.1.2 2.3.3
library/python-2/nose not included 1.1.2
library/python-2/pyopenssl not included 0.11
library/python-2/subversion 1.6.16 1.7.5
library/python-2/tkinter-26 2.6.4 2.6.8
library/python-2/tkinter-27 2.7.1 2.7.3
library/security/nss 4.12.10 4.13.1
library/security/openssl 1.0.0.5 (1.0.0e) 1.0.0.10 (1.0.0j)
mail/thunderbird 6.0 10.0.6
network/dns/bind 9.6.3.4.3 9.6.3.7.2
package/pkgbuild not included 1.3.104
print/filter/enscript not included 1.6.4
print/filter/gutenprint 5.2.4 5.2.7
print/lp/filter/foomatic-rip 3.0.2 4.0.15
runtime/java/jre-6 1.6.0.26 1.6.0.35
runtime/java/jre-7 1.7.0.0 1.7.0.7
runtime/perl-512 5.12.3 5.12.4
runtime/python-26 2.6.4 2.6.8
runtime/python-27 2.7.1 2.7.3
runtime/ruby-18 1.8.7.334 1.8.7.357
runtime/tcl-8/tcl-sqlite-3 3.7.6.3 3.7.11
security/compliance/openscap not included 0.8.1
security/nss-utilities 4.12.10 4.13.1
security/sudo 1.8.1.2 1.8.4.5
service/network/dhcp/isc-dhcp 4.1 4.1.0.6
service/network/dns/bind 9.6.3.4.3 9.6.3.7.2
service/network/ftp (ProFTPD) 1.3.3.0.5 1.3.3.0.7
service/network/samba 3.5.10 3.6.6
shell/conflict 0.2004.9.1 0.2010.6.27
shell/pipe-viewer 1.1.4 1.2.0
shell/zsh 4.3.12 4.3.17
system/boot/grub 0.97 1.99
system/font/truetype/liberation 1.4 1.7.2
system/library/freetype-2 2.4.6 2.4.9
system/library/libnet 1.1.2.1 1.1.5
system/management/cim/pegasus 2.9.1 2.11.0
system/management/ipmitool 1.8.10 1.8.11
system/management/wbem/wbemcli 1.3.7 1.3.9.1
system/network/routing/quagga 0.99.8 0.99.19
system/rsyslog not included 6.2.0
terminal/luit 1.1.0 1.1.1
text/convmv 1.14 1.15
text/gawk 3.1.5 3.1.8
text/gnu-grep 2.5.4 2.10
web/browser/firefox 6.0.2 10.0.6
web/browser/links 1.0 1.0.3
web/java-servlet/tomcat 6.0.33 6.0.35
web/php-53 not included 5.3.14
web/php-53/extension/php-apc not included 3.1.9
web/php-53/extension/php-idn not included 0.2.0
web/php-53/extension/php-memcache not included 3.0.6
web/php-53/extension/php-mysql not included 5.3.14
web/php-53/extension/php-pear not included 5.3.14
web/php-53/extension/php-suhosin not included 0.9.33
web/php-53/extension/php-tcpwrap not included 1.1.3
web/php-53/extension/php-xdebug not included 2.2.0
web/php-common not included 11.1
web/proxy/squid 3.1.8 3.1.18
web/server/apache-22 2.2.20 2.2.22
web/server/apache-22/module/apache-sed 2.2.20 2.2.22
web/server/apache-22/module/apache-wsgi not included 3.3
x11/diagnostic/xev 1.1.0 1.2.0
x11/diagnostic/xscope 1.3 1.3.1
x11/documentation/xorg-docs 1.6 1.7
x11/keyboard/xkbcomp 1.2.3 1.2.4
x11/library/libdmx 1.1.1 1.1.2
x11/library/libdrm 2.4.25 2.4.32
x11/library/libfontenc 1.1.0 1.1.1
x11/library/libfs 1.0.3 1.0.4
x11/library/libice 1.0.7 1.0.8
x11/library/libsm 1.2.0 1.2.1
x11/library/libx11 1.4.4 1.5.0
x11/library/libxau 1.0.6 1.0.7
x11/library/libxcb 1.7 1.8.1
x11/library/libxcursor 1.1.12 1.1.13
x11/library/libxdmcp 1.1.0 1.1.1
x11/library/libxext 1.3.0 1.3.1
x11/library/libxfixes 4.0.5 5.0
x11/library/libxfont 1.4.4 1.4.5
x11/library/libxft 2.2.0 2.3.1
x11/library/libxi 1.4.3 1.6.1
x11/library/libxinerama 1.1.1 1.1.2
x11/library/libxkbfile 1.0.7 1.0.8
x11/library/libxmu 1.1.0 1.1.1
x11/library/libxmuu 1.1.0 1.1.1
x11/library/libxpm 3.5.9 3.5.10
x11/library/libxrender 0.9.6 0.9.7
x11/library/libxres 1.0.5 1.0.6
x11/library/libxscrnsaver 1.2.1 1.2.2
x11/library/libxtst 1.2.0 1.2.1
x11/library/libxv 1.0.6 1.0.7
x11/library/libxvmc 1.0.6 1.0.7
x11/library/libxxf86vm 1.1.1 1.1.2
x11/library/mesa 7.10.2 7.11.2
x11/library/toolkit/libxaw7 1.0.9 1.0.11
x11/library/toolkit/libxt 1.0.9 1.1.3
x11/library/xtrans 1.2.6 1.2.7
x11/oclock 1.0.2 1.0.3
x11/server/xdmx 1.10.3 1.12.2
x11/server/xephyr 1.10.3 1.12.2
x11/server/xorg 1.10.3 1.12.2
x11/server/xorg/driver/xorg-input-keyboard 1.6.0 1.6.1
x11/server/xorg/driver/xorg-input-mouse 1.7.1 1.7.2
x11/server/xorg/driver/xorg-input-synaptics 1.4.1 1.6.2
x11/server/xorg/driver/xorg-input-vmmouse 12.7.0 12.8.0
x11/server/xorg/driver/xorg-video-ast 0.91.10 0.93.10
x11/server/xorg/driver/xorg-video-ati 6.14.1 6.14.4
x11/server/xorg/driver/xorg-video-cirrus 1.3.2 1.4.0
x11/server/xorg/driver/xorg-video-dummy 0.3.4 0.3.5
x11/server/xorg/driver/xorg-video-intel 2.10.0 2.18.0
x11/server/xorg/driver/xorg-video-mach64 6.9.0 6.9.1
x11/server/xorg/driver/xorg-video-mga 1.4.13 1.5.0
x11/server/xorg/driver/xorg-video-openchrome 0.2.904 0.2.905
x11/server/xorg/driver/xorg-video-r128 6.8.1 6.8.2
x11/server/xorg/driver/xorg-video-trident 1.3.4 1.3.5
x11/server/xorg/driver/xorg-video-vesa 2.3.0 2.3.1
x11/server/xorg/driver/xorg-video-vmware 11.0.3 12.0.2
x11/server/xserver-common 1.10.3 1.12.2
x11/server/xvfb 1.10.3 1.12.2
x11/server/xvnc 1.0.90 1.1.0
x11/session/sessreg 1.0.6 1.0.7
x11/session/xauth 1.0.6 1.0.7
x11/session/xinit 1.3.1 1.3.2
x11/transset 0.9.1 1.0.0
x11/trusted/trusted-xorg 1.10.3 1.12.2
x11/x11-window-dump 1.0.4 1.0.5
x11/xclipboard 1.1.1 1.1.2
x11/xclock 1.0.5 1.0.6
x11/xfd 1.1.0 1.1.1
x11/xfontsel 1.0.3 1.0.4
x11/xfs 1.1.1 1.1.2

P.S. To get the version numbers for this table, I ran a quick perl script over the output from:

% pkg contents -H -r -t depend -a type=incorporate -o fmri \
  `pkg contents -H -r -t depend -a type=incorporate -o fmri entire@0.5.11,5.11-0.175.1.0.0.24` \
  | sort >> /tmp/11.1
% pkg contents -H -r -t depend -a type=incorporate -o fmri \
  `pkg contents -H -r -t depend -a type=incorporate -o fmri entire@0.5.11,5.11-0.175.0.0.0.2` \
  | sort >> /tmp/11.0

Thursday Oct 25, 2012

Documentation Changes in Solaris 11.1

One of the first places you can see Solaris 11.1 changes are in the docs, which have now been posted in the Solaris 11.1 Library on docs.oracle.com. I spent a good deal of time reviewing documentation for this release, and thought some would be interesting to blog about, but didn't review all the changes (not by a long shot), and am not going to cover all the changes here, so there's plenty left for you to discover on your own.

Just comparing the Solaris 11.1 Library list of docs against the Solaris 11 list will show a lot of reorganization and refactoring of the doc set, especially in the system administration guides. Hopefully the new break down will make it easier to get straight to the sections you need when a task is at hand.

Packaging System

Unfortunately, the excellent in-depth guide for how to build packages for the new Image Packaging System (IPS) in Solaris 11 wasn't done in time to make the initial Solaris 11 doc set. An interim version was published shortly after release, in PDF form on the OTN IPS page. For Solaris 11.1 it was included in the doc set, as Packaging and Delivering Software With the Image Packaging System in Oracle Solaris 11.1, so should be easier to find, and easier to share links to specific pages the HTML version.

Beyond just how to build a package, it includes details on how Solaris is packaged, and how package updates work, which may be useful to all system administrators who deal with Solaris 11 upgrades & installations. The Adding and Updating Oracle Solaris 11.1 Software Packages was also extended, including new sections on Relaxing Version Constraints Specified by Incorporations and Locking Packages to a Specified Version that may be of interest to those who want to keep the Solaris 11 versions of certain packages when they upgrade, such as the couple of packages that had functionality removed by an (unusual for an update release) End of Feature process in the 11.1 release.

Also added in this release is a document containing the lists of all the packages in each of the major package groups in Solaris 11.1 (solaris-desktop, solaris-large-server, and solaris-small-server). While you can simply get the contents of those groups from the package repository, either via the web interface or the pkg command line, the documentation puts them in handy tables for easier side-by-side comparison, or viewing the lists before you've installed the system to pick which one you want to initially install.

X Window System

We've not had good X11 coverage in the online Solaris docs in a while, mostly relying on the man pages, and upstream X.Org docs. In this release, we've integrated some X coverage into the Solaris 11.1 Desktop Adminstrator's Guide, including sections on installing fonts for fontconfig or legacy X11 clients, X server configuration, and setting up remote access via X11 or VNC. Of course we continue to work on improving the docs, including a lot of contributions to the upstream docs all OS'es share (more about that another time).

Security

One of the things Oracle likes to do for its products is to publish security guides for administrators & developers to know how to build systems that meet their security needs. For Solaris, we started this with Solaris 11, providing a guide for sysadmins to find where the security relevant configuration options were documented. The Solaris 11.1 Security Guidelines extend this to cover new security features, such as Address Space Layout Randomization (ASLR) and Read-Only Zones, as well as adding additional guidelines for existing features, such as how to limit the size of tmpfs filesystems, to avoid users driving the system into swap thrashing situations.

For developers, the corresponding document is the Developer's Guide to Oracle Solaris 11 Security, which has been the source for years for documentation of security-relevant Solaris API's such as PAM, GSS-API, and the Solaris Cryptographic Framework. For Solaris 11.1, a new appendix was added to start providing Secure Coding Guidelines for Developers, leveraging the CERT Secure Coding Standards and OWASP guidelines to provide the base recommendations for common programming languages and their standard API's. Solaris specific secure programming guidance was added via links to other documentation in the product doc set.

In parallel, we updated the Solaris C Libary Functions security considerations list with details of Solaris 11 enhancements such as FD_CLOEXEC flags, additional *at() functions, and new stdio functions such as asprintf() and getline(). A number of code examples throughout the Solaris 11.1 doc set were updated to follow these recommendations, changing unbounded strcpy() calls to strlcpy(), sprintf() to snprintf(), etc. so that developers following our examples start out with safer code. The Writing Device Drivers guide even had the appendix updated to list which of these utility functions, like snprintf() and strlcpy(), are now available via the Kernel DDI.

Little Things

Of course all the big new features got documented, and some major efforts were put into refactoring and renovation, but there were also a lot of smaller things that got fixed as well in the nearly a year between the Solaris 11 and 11.1 doc releases - again too many to list here, but a random sampling of the ones I know about & found interesting or useful:

Saturday Mar 31, 2012

Solaris: What comes next?

As you probably know by now, a few months ago, we released Solaris 11 after years of development. That of course means we now need to figure out what comes next - if Solaris 11 is “The First Cloud OS”, then what do we need to make future releases of Solaris be, to be modern and competitive when they're released? So we've been having planning and brainstorming meetings, and I've captured some notes here from just one of those we held a couple weeks ago with a number of the Silicon Valley based engineers.

Now before someone sees an idea here and calls their product rep wanting to know what's up, please be warned what follows are rough ideas, and as I'll discuss later, none of them have any committment, schedule, working code, or even plan for integration in any possible future product at this time. (Please don't make me force you to read the full Oracle future product disclaimer here, you should know it by heart already from the front of every Oracle product slide deck.)

To start with, we did some background research, looking at ideas from other Oracle groups, and competitive OS'es. We examined what was hot in the technology arena and where the interesting startups were heading. We then looked at Solaris to see where we could apply those ideas.


Making Network Admins into Socially Networking Admins

Wall art at the new Facebook HQ in the former Sun MPK Campus

We all know an admin who has grumbled about being the only one stuck late at work to fix a problem on the server, or having to work the weekend alone to do scheduled maintenance. But admins are humans (at least most are), and crave companionship and community with their fellow humans. And even when they're alone in the server room, they're never far from a network connection, allowing access to the wide world of wonders on the Internet.

Our solution here is not building a new social network - there's enough of those already, and Oracle even has its own Oracle Mix social network already. What we proposed is integrating Solaris features to help engage our system admins with these social networks, building community and bringing them recognition in the workplace, using achievement recognition systems as found in many popular gaming platforms.

For instance, if you had a Facebook account, and a group of admin friends there, you could register it with our Social Network Utility For Facebook, and then your friends might see:

Alan earned the achievement Critically Patched (April 2012) for patching all his servers.
Matt is only at 50% - encourage him to complete this achievement today!
To avoid any undue risk of advertising who has unpatched servers that are easier targets for hackers to break into, this information would be tightly protected via Facebook's world-renowned privacy settings to avoid it falling into the wrong hands.

A related form of gamification we considered was replacing simple certfications with role-playing-game-style Experience Levels. Instead of just knowing an admin passed a test establishing a given level of competency, these would provide recruiters with a more detailed level of how much real-world experience an admin has. Achievements such as the one above would feed into it, but larger numbers of experience points would be gained by tougher or more critical tasks - such as recovering a down system, or migrating a service to a new platform. (As long as it was an Oracle platform of course - migrating to an HP or IBM platform would cause the admin to lose points with us.)

Unfortunately, we couldn't figure out a good way to prevent (if you will) “gaming” the system. For instance, a disgruntled admin might decide to start ignoring warnings from FMA that a part is beginning to fail or skip preventative maintenance, in the hopes that they'd cause a catastrophic failure to earn more points for bolstering their resume as they look for a job elsewhere, and not worrying about the effect on your business of a mission critical server going down.

More Z's for ZFS

Our suggested new feature for ZFS was inspired by the worlds most successful Z-startup of all time: Zynga.

Using the Social Network Utility For Facebook described above, we'd tie it in with ZFS monitoring to help you out when you find yourself in a jam needing more disk space than you have, and can't wait a month to get a purchase order through channels to buy more. Instead with the click of a button you could post to your group:

Alan can't find any space in his server farm! Can you help?
Friends could loan you some space on their connected servers for a few weeks, knowing that you'd return the favor when needed. ZFS would create a new filesystem for your use on their system, and securely share it with your system using Kerberized NFS.

If none of your friends have space, then you could buy temporary use space in small increments at affordable rates right there in Facebook, using your Facebook credits, and then file an expense report later, after the urgent need has passed.

Universal Single Sign On

One thing all the engineers agreed on was that we still had far too many "Single" sign ons to deal with in our daily work. On the web, every web site used to have its own password database, forcing us to hope we could remember what login name was still available on each site when we signed up, and which unique password we came up with to avoid having to disclose our other passwords to a new site.

In recent years, the web services world has finally been reducing the number of logins we have to manage, with many services allowing you to login using your identity from Google, Twitter or Facebook. So we proposed following their lead, introducing PAM modules for web services - no more would you have to type in whatever login name IT assigned and try to remember the password you chose the last time password aging forced you to change it - you'd simply choose which web service you wanted to authenticate against, and would login to your Solaris account upon reciept of a cookie from their identity service.

Pinning notes to the cloud

We also all noted that we all have our own pile of notes we keep in our daily work - in text files in our home directory, in notebooks we carry around, on white boards in offices and common areas, on sticky notes on our monitors, or on scraps of paper pinned to our bulletin boards. The contents of the notes vary, some are things just for us, some are useful for our groups, some we would share with the world.

For instance, when our group moved to a new building a couple years ago, we had a white board in the hallway listing all the NIS & DNS servers, subnets, and other network configuration information we needed to set up our Solaris machines after the move. Similarly, as Solaris 11 was finishing and we were all learning the new network configuration commands, we shared notes in wikis and e-mails with our fellow engineers.

Users may also remember one of the popular features of Sun's old BigAdmin site was a section for sharing scripts and tips such as these. Meanwhile, the online "pin board" at Pinterest is taking the web by storm. So we thought, why not mash those up to solve this problem?

We proposed a new BigAddPin site where users could “pin” notes, command snippets, configuration information, and so on. For instance, once they had worked out the ideal Automated Installation manifest for their app server, they could pin it up to share with the rest of their group, or choose to make it public as an example for the world. Localized data, such as our group's notes on the servers for our subnet, could be shared only to users connecting from that subnet. And notes that they didn't want others to see at all could be marked private, such as the list of phone numbers to call for late night pizza delivery to the machine room, the birthdays and anniversaries they can never remember but would be sleeping on the couch if they forgot, or the list of automatically generated completely random, impossible to remember root passwords to all their servers.

For greater integration with Solaris, we'd put support right into the command shells — redirect output to a pinned note, set your path to include pinned notes as scripts you can run, or bring up your recent shell history and pin a set of commands to save for the next time you need to remember how to do that operation.

Location service for Solaris servers

A longer term plan would involve convincing the hardware design groups to put GPS locators with wireless transmitters in future server designs. This would help both admins and service personnel trying to find servers in todays massive data centers, and could feed into location presence apps to help show potential customers that while they may not see many Solaris machines on the desktop any more, they are all around. For instance, while walking down Wall Street it might show “There are over 2000 Solaris computers in this block.”

[Note: this proposal was made before the recent media coverage of a location service aggregrator app with less noble intentions, and in hindsight, we failed to consider what happens when such data similarly falls into the wrong hands. We certainly wouldn't want our app to be misinterpreted as “There are over $20 million dollars of SPARC servers in this building, waiting for you to steal them.” so it's probably best it was rejected.]

Harnessing the power of the GPU for Security

Most modern OS'es make use of the widespread availability of high powered GPU hardware in today's computers, with desktop environments requiring 3-D graphics acceleration, whether in Ubuntu Unity, GNOME Shell on Fedora, or Aero Glass on Windows, but we haven't yet made Solaris fully take advantage of this, beyond our basic offering of Compiz on the desktop.

Meanwhile, more businesses are interested in increasing security by using biometric authentication, but must also comply with laws in many countries preventing discrimination against employees with physical limations such as missing eyes or fingers, not to mention the lost productivity when employees can't login due to tinted contacts throwing off a retina scan or a paper cut changing their fingerprint appearance until it heals.

Fortunately, the two groups considering these problems put their heads together and found a common solution, using 3D technology to enable authentication using the one body part all users are guaranteed to have - pam_phrenology.so, a new PAM module that uses an array USB attached web cams (or just one if the user is willing to spin their chair during login) to take pictures of the users head from all angles, create a 3D model and compare it to the one in the authentication database. While Mythbusters has shown how easy it can be to fool common fingerprint scanners, we have not yet seen any evidence that people can impersonate the shape of another user's cranium, no matter how long they spend beating their head against the wall to reshape it.

This could possibly be extended to group users, using modern versions of some of the older phrenological studies, such as giving all users with long grey beards access to the System Architect role, or automatically placing users with pointy spikes in their hair into an easy use mode.

Unfortunately, there are still some unsolved technical challenges we haven't figured out how to overcome. Currently, a visit to the hair salon causes your existing authentication to expire, and some users have found that shaving their heads is the only way to avoid bad hair days becoming bad login days.


Reaction to these ideas

After gathering all our notes on these ideas from the engineering brainstorming meeting, we took them in to present to our management. Unfortunately, most of their reaction cannot be printed here, and they chose not to accept any of these ideas as they were, but they did have some feedback for us to consider as they sent us back to the drawing board.

They strongly suggested our ideas would be better presented if we weren't trying to decipher ink blotches that had been smeared by the condensation when we put our pint glasses on the napkins we were taking notes on, and to that end let us know they would not be approving any more engineering offsites in Irish themed pubs on the Friday of a Saint Patrick's Day weekend. (Hopefully they mean that situation specifically and aren't going to deny the funding for travel to this year's X.Org Developer's Conference just because it happens to be in Bavaria and ending on the Friday of the weekend Oktoberfest starts.)

They recommended our research techniques could be improved over just sitting around reading blogs and checking our Facebook, Twitter, and Pinterest accounts, such as considering input from alternate viewpoints on topics such as gamification.

They also mentioned that Oracle hadn't fully adopted some of Sun's common practices and we might have to try harder to get those to be accepted now that we are one unified company.

So as I said at the beginning, don't pester your sales rep just yet for any of these, since they didn't get approved, but if you have better ideas, pass them on and maybe they'll get into our next batch of planning.

Monday Dec 19, 2011

Flame on: Graphing hot paths through X server code

Last week, Brendan Gregg, who wrote the book on DTrace, published a new tool for visualizing hot sections of code by sampling the process stack every few milliseconds and then graphing which stacks were seen the most during the run: Flame Graphs. This looked neat, but my experience has been that I get the most understanding out of things like this by trying them out myself, so I did.

Fortunately, it's very easy to setup, as Brendan provided the tools as two standalone perl scripts you can download from the Flame Graph github repo. Then the next step is deciding what you want to run it on and capturing the data from a run of that.

It just so turns out that a few days before the Flame Graph release, another mail had crossed my inbox that gave me something to look at. Several years ago, I added some Userland Statically Defined Tracing probes to the Xserver code base, creating an Xserver provider for DTrace. These got integrated first to the Solaris Xservers (both Xsun & Xorg) and then merged upstream to the X.Org community release for Xserver 1.4, where they're available for all platforms with DTrace support.

Earlier this year, Adam Jackson enabled the dtrace hooks in the Fedora Xorg server package, using the SystemTap static probe support for DTrace probe compatibility. He then noticed a performance drop, which isn't supposed to happen, as DTrace probes are simply noops when not being actively traced, and submitted a fix for it upstream.

This was due to two of the probes I'd defined - when the Xserver processes a request from a client, there's a request-start probe just before the request is processed, and a request-done probe right after the request is processed. If you just want to see what requests a client is making you can trace either one, but if you want to measure the time taken to run a request, or determine if something is happening while a request is being processed, you need both of the probes. When they first got integrated, the code was simple:

#ifdef XSERVER_DTRACE
                XSERVER_REQUEST_START(GetRequestName(MAJOROP), MAJOROP,
                              ((xReq *)client->requestBuffer)->length,
                              client->index, client->requestBuffer);
#endif
                // skipping over input checking and auditing, to the main event,
                // the call to the request handling proc for the specified opcode:
                    result = (* client->requestVector[MAJOROP])(client);
#ifdef XSERVER_DTRACE
                XSERVER_REQUEST_DONE(GetRequestName(MAJOROP), MAJOROP,
                              client->sequence, client->index, result);
#endif

The compiler sees XSERVER_REQUEST_START and XSERVER_REQUEST_DONE as simple function calls, so it does whatever work is necessary to set up their arguments and then calls them. Later, during the linking process, the actual call instructions are replaced with noops and the addresses recorded so that when a dtrace user enables the probe the call can be activated at that time. In these cases, that's not so bad, just a bunch of register access and memory loads of things that are going to be needed nearby. The one outlier is GetRequestName(MAJOROP) which looks like a function call, but was really just a macro that used the opcode as the index an array of strings and returned the string name for the opcode so that DTrace probes could see the request names, especially for extensions which don't have static opcode mappings. For that the compiler would just load a register with the address of the base of the array and then add the offset of the entry specified by MAJOROP in that array.

All was well and good for a bit, until a later project came along during the Xorg 1.5 development cycle to unify all the different lists of protocol object names in the Xserver, as there were different ones in use by the DTrace probes, the security extensions, and the resource management system. That replaced the simple array lookup macro with a function call. While the function doesn't do a lot more work, it does enough to be noticed, and thus the performance hit was taken in the hot path of request dispatching. Adam's patch to fix this simply uses is-enabled probes to only make those function calls when the probes are actually enabled. x11perf testing showed the win on a Athlon 64 3200+ test system running Solaris 11:

Before: 250000000 trep @   0.0001 msec ( 8500000.0/sec): X protocol NoOperation
After:  300000000 trep @   0.0001 msec (10400000.0/sec): X protocol NoOperation 

But numbers are boring, and this gave an excuse to try out Flame Graphs. To capture the data, I took advantage of synchronization features built into xinit and dtrace -c:

# xinit /usr/sbin/dtrace -x ustackframes=100 \
  -n 'profile-997 /execname == "Xorg"/ { @[ustack()] = count(); }' \
  -o out.user_stacks.noop -c "x11perf -noop" \
  -- -config xorg.conf.dummy
To explain this command, I'll start at the end. For xinit, everything after the double dash is a set of arguments to the Xserver it starts, in this case, Xorg is told to look in the normal config paths for xorg.conf.dummy, which it would find is this simple config file in /etc/X11/xorg.conf.dummy setting the driver to the Xvfb-like “dummy” driver, which just uses RAM as a frame buffer to take graphics driver considerations out of this test:
Section "Device"
	Identifier  "Card0"
	Driver      "dummy"
EndSection
Since I'm using a modern Xorg version, that's all the configuration needed, all the unspecified sections are autoconfigured. xinit starts the Xserver, waits for the Xserver to signal that it's finished its start up, and then runs the first half of the command line as a client with the DISPLAY set to the new Xserver. In this case it runs the dtrace command, which sets up the probes based on the examples in the Flame Graphs README, and then runs the command specified as the -c argument, the x11perf benchmark tool. When x11perf exits, dtrace stops the probes, generates its report, and then exits itself, which in turn causes xinit to shut down the X server and exit.

The resulting Flame Graphs are, in their native SVG interactive form:

Before

After


If your browser can't handle those as inline SVG's, or they're scrunched too small to read, try Before (SVG) or Before (PNG), and After (SVG) or After (PNG).

You can see in the first one a bar showing a little over 10% of the time was in stacks involving LookupMajorName, which is completely gone in the second patch. Those who saw Adam's patch series come across the xorg-devel list last week may also notice the presence of XaceHook calls, which Adam optimized in another patch. Unfortunately, while I did build with that patch as well, we don't get the benefits of it since the XC-Security extension is on by default, and those fill in the hooks, so it can't just bypass them as it does when the hooks are empty.

I also took measurements of what Xorg did as gdm started up and a test user logged in, which produced the much larger flame graph you can see in SVG or PNG. As you can see the recursive calls in the font catalogue scanning functions make for some really tall flame graphs. You can also see that, to no one's surprise, xf86SlowBCopy is slow, and a large portion of the time is spent “blitting” bits from one place to another. Some potential areas for improvement stand out - like the 5.7% of time spent rescanning the font path because the Solaris gnome session startup scripts make xset fp calls to add the fonts for the current locale to the legacy font path for old clients that still use it, and another nearly 5% handling the ListFonts and ListFontsWithInfo calls, which dtracing with the request-start probes turned out to be the Java GUI for the Solaris Visual Panels gnome-panel applet.

Now because of the way the data for these is gathered, from looking at them alone you can't tell if a wide bar is one really long call to a function (as it is for the main() function bar in all these) or millions of individual calls (as it was for the ProcNoOperation calls in the x11perf -noop trace), but it does give a quick and easy way to pick out which functions the program is spending most of its time in, as a first pass for figuring out where to dig deeper for potential performance improvements.

Brendan has made these scripts easy to use to generate these graphs, so I encourage you to try them out as well on some sample runs to get familiar with them, so that when you really need them, you know what cases they're good for and how to capture the data and generate the graphs for yourself. Trying really is the best method of learning.

Wednesday Nov 09, 2011

S11 X11: ye olde window system in today's new operating system

Today's the big release for Oracle Solaris 11, after 7 years of development. For me, the Solaris 11 release comes a little more than 11 years after I joined the X11 engineering team at what was then Sun, and finishes off some projects that were started all the way back then.

For instance, when I joined the X team, Sun was finishing off the removal of the old OpenWindows desktop, and we kept getting questions asking about the rest of the stuff being shipped in /usr/openwin, the directory that held both the OpenLook applications and the X Window System software. I wrote up an ARC case at the time to move the X software to /usr/X11, but there were various issues and higher priority work, so we didn't end up starting that move until near the end of the Solaris 10 development cycle several years later. Solaris 10 thus had a mix of the recently added Xorg server and related code delivered in /usr/X11, while most of the existing bits from Sun's proprietary fork of X11R6 were still in /usr/openwin.

During Solaris 11 development, we finished that move, and then jumped again, moving the programs directly into /usr/bin, following the general Solaris 11 strategy of using /usr/bin for most of the programs shipped with the OS, and using other directories, such as /usr/gnu/bin, /usr/xpg4/bin, /usr/sunos/bin, and /usr/ucb for conflicting alternate implementations of the programs shipped in /usr/bin, no longer as a way to segregate out various subsystems to allow the OS to better fit onto the 105Mb hard disks that shipped with Sun workstations back when /usr/openwin was created. However, if for some reason you wanted to build your own set of X binaries, you could put them in /usr/X11R7 (as I do for testing builds of the upstream git master repos), and then put that first in your $PATH, so nothing is really lost here.

The other major project that was started during Solaris 10 development and finished for Solaris 11 was replacing that old proprietary fork of X11R6, including the Xsun server, with the modernized, modularized, open source X11R7.* code base from the new X.Org, including the Xorg server. The final result, included in this Solaris 11 release, is based mostly on the X11R7.6 release, including recent additions such as the XCB API I blogged about last year, though we did include newer versions of modules that had upstream releases since the X11R7.6 katamari, such as Xorg server version 1.10.3.

That said, we do still apply some local patches, configuration options, and other changes, for things from just fitting into the Solaris man page style or adding support for Trusted Extensions labeled desktops. You can see all of those changes in our source repository, which is searchable and browsable via OpenGrok on src.opensolaris.org (or via hgweb on community mirrors such as openindiana.org) and available for anonymous hg cloning as well. That xnv-clone tree is now frozen, a permanent snapshot of the Solaris 11 sources, while we've created a new x-s11-update-clone tree for the Solaris 11 update releases now being developed to follow on from here.

Naturally, when your OS has 7 years between major release cycles, the hardware environment you run on greatly changes in the meantime as well, and as the layer that handles the graphics hardware, there have been changes due to that. Most of the SPARC graphics devices that were supported in Solaris 10 aren't any more, because the platforms they ran in are no longer supported - we still ship a couple SPARC drivers that are supported, the efb driver for the Sun XVR-50, XVR-100, and XVR-300 cards based on the ATI Radeon chipsets, and the astfb driver for the AST2100 remote Keyboard/Video/Mouse/Storage (rKVMS) chipset in the server ILOM devices. On the x86 side, the EOL of 32-bit platforms let us clear out a lot of the older x86 video device drivers for chipsets and cards you wouldn't find in x64 systems - of course, there's still many supported there, due to the wider variety of graphics hardware found in the x64 world, and even some recent updates, such as the addition of Kernel Mode Setting (KMS) support for Intel graphics up through the Sandy Bridge generation.

For those who followed the development as it happened, either via watching our open source code releases or using one of the many development builds and interim releases such as the various Solaris Express trains, much of this is old news to you. For those who didn't, or who want a refresher on the details, you can see last year's summary in my X11 changes in the 2010.11 release blog post. Once again, the detailed change logs for the X11 packages are available, though unfortunately, all the links in them to the bug reports are now broken, so browsing the hg history log is probably more informative.

Since that update, which covered up to the build 151 released as 2010.11, we've continued development and polishing to get this Solaris 11 release finished up. We added a couple more components, including the previously mentioned xcb libraries, the FreeGLUT library, and the Xdmx Distributed Multihead X server. We cleaned up documentation, including the addition of some docs for the Xserver DTrace provider in /usr/share/doc/Xserver/. The packaging was improved, clearing up errors and optimizing the builds to reduce unnecessary updates. A few old and rarely used components were dropped, including the rstart program for starting up X clients remotely (ssh X forwarding replaces this in a more secure fashion) and the xrx plugin for embedding X applications in a web browser page (which hasn't been kept up to date with the rapidly evolving browser environment). Because Solaris 11 only supports 64-bit systems, and most of the upstream X code was already 64-bit clean, the X servers and most of the X applications are now shipped as 64-bit builds, though the libraries of course are delivered in both 32-bit and 64-bit versions for binary compatibility with applications of each flavor. The Solaris auditing system can now record each attempt by a client to connect to the Xorg server and whether or not it succeeded, for sites which need that level of detail.

In total, we recorded 1512 change request id's during Solaris 11 development, from the time we forked the “Nevada” gate from the Solaris 10 release until the final code freeze for todays release - some were one line bug fixes, some were man page updates, some were minor RFE's and some were major projects, but in the end, the result is both very different (and hopefully much better) than what we started with, and yet, still contains the core X11 code base with 24 years of backwards compatibility in the core protocols and APIs.

Saturday May 07, 2011

New blog, same as the old blog

After 7 years at http://blogs.sun.com/alanc/, this blog has a new URL: http://blogs.oracle.com/alanc/. I'm sure you can figure out the change to the RSS & Atom feed URL's as well - and apologies if the URL changes in those feeds caused your feed readers to determine the posts were all new, though fortunately, I get around to posting here so rarely these days, there shouldn't have been too many.

I didn't realize it until I went back to look for writing this post, but my blog was live for exactly 7 years on the old site - my first post was April 29, 2004 - the site went live a few days earlier, but I was traveling for that year's X.Org conference, so didn't get my account setup and post to it until I got there. They froze blogs.sun.com on April 29 this year to start migrating to the new blogs.oracle.com, integrating the Sun & Oracle bloggers, using the same Apache Roller software Sun used.

Three years later, I joined the people looking back for the third anniversary of blogs.sun.com and wrote “So much change in 3 short years - who knows where the next years will lead us?” — certainly I never expected then that three years after that Oracle would have bought Sun. I can only wonder what will be written in April 2014, on what would have been the tenth birthday of blogs.sun.com.

Friday Mar 25, 2011

R_AMD64_PC32 error? There, I Fixed It!

I try to fairly regularly build recent git checkouts of all the upstream modules from X.Org (at least all those listed in the current build.sh) on Solaris. Normally I do this in 32-bit mode on x86 machines using the Sun compilers on the latest Solaris 11 internal development build, but I also occasionally do it in 64-bit mode, or with gcc compilers, or on a SPARC machine. This helps me catch issues that would break our builds when we integrate the new releases before those releases happen. (Ideally I'd set up a Solaris client of the X.Org tinderbox, but I've not gotten around to that.)

Anyways, recently I finally decided to track down an error that only shows up in the 64-bit builds of the xscope protocol monitor/decoder for X11 on Solaris. The builds run fine up until the final link stage, which fails with:

ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol littleEndian: value 0x8086c355 does not fit
ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol ServerHostName: value 0x8086b4fe does not fit
ld: fatal: relocation error: R_AMD64_PC32: file decode11.o: symbol LBXEvent: value 0x808664c3 does not fit
(and over 150 more symbols that didn't fit)

A google search turned up some forum posts, a blog post, and an article on the AMD64 ABI support in the Sun Studio compilers. And indeed, the solutions they offered did work - building with -Kpic did allow the program to link.

But is that really the best answer? xscope is a simple program, and shouldn't be overflowing the normal memory model. Once it linked, looking at the resulting binary was a bit shocking:

% /usr/gnu/bin/size  xscope
   text	   data	    bss	    dec	    hex	filename
 416753	   5256	2155921980	2156343989	808732b5	xscope

% /usr/bin/size -f xscope

23(.interp) + 32(.SUNW_cap) + 5860(.eh_frame_hdr) + 27200(.eh_frame)
 + 2964(.SUNW_syminfo) + 5944(.hash) + 4224(.SUNW_ldynsym)
 + 17784(.dynsym) + 14703(.dynstr) + 192(.SUNW_version)
 + 1482(.SUNW_versym) + 3168(.SUNW_dynsymsort) + 96(.SUNW_reloc)
 + 1944(.rela.plt) + 1312(.plt) + 291018(.text) + 33(.init) + 33(.fini)
 + 280(.rodata) + 38461(.rodata1) + 1376(.got) + 784(.dynamic)
 + 1952(.data) + 0(.bssf) + 1144(.picdata) + 0(.tdata) + 0(.tbss)
 + 2155921980(.bss) = 2156343989

% pmap -x `pgrep xscope`
26151:	./xscope
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000000400000        408        408          -          - r-x--  xscope
0000000000476000          8          8          8          - rw---  xscope
0000000000478000    2105388       1064       1064          - rw---  xscope
0000000080C83000         52         52         52          - rw---    [ heap ]
[....]
FFFFFD7FFFDF8000         32         32         32          - rw---    [ stack ]
---------------- ---------- ---------- ---------- ----------
        total Kb    2108668       3204       1300          -

Two gigabytes of .bss space allocated!?!?! That can't be right. Looking through the output of the elfdump and nm programs a single symbol stood out:

Symbol Table Section:  .SUNW_ldynsym
     index    value              size              type bind oth ver shndx          name
[...]
      [89]  0x00000000009ff280 0x0000000080280000  OBJT GLOB  D    1 .bss           FDinfo

[Index]   Value                Size                Type  Bind  Other Shndx   Name
[...]
[528]   |            10482304|          2150105088|OBJT |GLOB |0    |28     |FDinfo

Unfortunately, that wasn't one of the ones listed in the linker errors, since it's starting address fit inside the normal memory model, but everything that came after it was out of range.

So what is this giant static allocation for? It's defined in scope.h:

#define BUFFER_SIZE (1024 \* 32)

struct fdinfo
{
  Boolean Server;
  long    ClientNumber;
  FD      pair;
  unsigned char   buffer[BUFFER_SIZE];
  int     bufcount;
  int     bufstart;
  int     buflimit;     /\* limited writes \*/
  int     bufdelivered; /\* total bytes delivered \*/
  Boolean writeblocked;
};

extern struct fdinfo   FDinfo[StaticMaxFD];

So it allocates a 32k buffer for up to StaticMaxFD file descriptors. How many is that? For that we need to look in xscope's fd.h:

/\* need to change the MaxFD to allow larger number of fd's \*/
#define StaticMaxFD FD_SETSIZE

and from there to the Solaris system headers, which define FD_SETSIZE in <sys/select.h>:

/\*
 \* Select uses bit masks of file descriptors in longs.
 \* These macros manipulate such bit fields.
 \* FD_SETSIZE may be defined by the user, but the default here
 \* should be >= NOFILE (param.h).
 \*/
#ifndef FD_SETSIZE
#ifdef _LP64
#define FD_SETSIZE      65536
#else
#define FD_SETSIZE      1024
#endif  /\* _LP64 \*/

So this makes the buffer fields alone in FDinfo become 65536 \* 32 \* 1024 bytes, aka 2 gigabytes.

Thus in this case, while compiler flags like -Kpic allow the code to link, using -DFD_SETSIZE=256 instead, builds code that's a little bit saner, fits in the normal memory model, and is less likely to fail with out of memory errors when you need it most:

% /usr/gnu/bin/size -f xscope
   text	   data	    bss	    dec	    hex	filename
 409388	   3352	8449804	8862544	 873b50	xscope

% pmap -x `pgrep xscope`
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000000400000        404        404          -          - r-x--  xscope
0000000000475000          4          4          4          - rw---  xscope
0000000000476000       8248         20         20          - rw---  xscope
0000000000C84000         52         52         52          - rw---    [ heap ]
[...]
FFFFFD7FFFDFD000         12         12         12          - rw---    [ stack ]
---------------- ---------- ---------- ---------- ----------
        total Kb      11500       2136        232          -

Of course that assumes that xscope is not going to be monitoring more than about 120 clients at a time (since it opens two file descriptors for each client, one connected to the client and one to the real X server), and still wastes many page mappings if you're only monitoring one client. The real fix being worked on for the next upstream release is to make the buffer allocation be dynamic, and allocate just enough for the number of clients we actually are monitoring.

The moral of this story? Just because you can make it build doesn't mean you've fixed it well, and sometimes it's useful to understand why the linker is giving you a hard time.

Sunday Jan 23, 2011

X11R7.6 Documentation Improvements

Late last month (about three months late in fact, sorry about that), X.Org announced the release of X11R7.6. While the announcement, release notes, and changelog give details on all the changes that went into this release since the X11R7.5 release a year earlier, the one I worked on the most (aside from the work of producing and posting the modules to be released) was the modernization of the X documentation.

When the X.Org Foundation inherited stewardship of the X Window System, the documentation was in a wide variety of formats. The programs and most libraries included man pages in the traditional nroff format. The source tree also contained a large number of specifications of the protocols and libraries, and these were in a variety of formats, including troff, TeX, FrameMaker, and LinuxDoc. These were mostly optimized for print output and generated Postscript documents which were included in the releases since few people had all the tools necessary to process all of those, especially the commercial FrameMaker software. This also was a problem for developers who needed to update the docs, and either didn't know all the different formats or who didn't have tools like FrameMaker.

One of the early decisions of X.Org was that we wanted all our docs to be in open formats with open source toolchains. The format we decided to standardize on for the long form documents such as the specifications was DocBook, a open format that was already adopted by projects such as GNOME. Later, during the split of the monolithic X Window System, we decided that once the documents were converted, they would be moved to the modules that they documented, so that the documents would be updated and released in sync with the code. Over the following years, we made a little progress, converting a few documents here and there, but not making a serious dent.

In 2010 though, two volunteers really turned this around - Matt Dew converted the bulk of our specifications to DocBook/XML, and Gaetan Nadon integrated those into the modules and set up the autoconf macros and Makefile build rules to convert them to the various output formats in a standardized fashion. The xmlto tool is used as a front-end processor to drive backend tools including xsltproc and Apache FOP to generate output documents in text, html, PostScript, and/or PDF formats.

You can see the effects of this if you compare documents from the X11R7.5 release with some from the X11R7.6 release. For instance, in the html versions of the Xlib spec, not only has the formatting greatly improved from the 7.5 version to the 7.6 version, the new one also now has a hyperlinked table of contents and index, and many cross-reference links between individual sections. In the pdf version, the 7.5 version is a simple encapsulation of the postscript output that is at least nicely formatted since the original troff was designed for print output. In the 7.6 version, fop generated the output directly for PDF, and was able to produce output that took advantage of the additional features of PDF, such as including the same extended set of hyperlinks as the HTML version, and a set of "bookmarks" for quick navigation to sections listed in the table of contents.

If you're building the sources, there's a standardized set of configure flags to control the documentation tools used during the build, such as --with-fop to enable the use of fop to generate Postscript and PDF output or --without-fop to disable it. If you're installing prebuilt packages, you may find the documents in various formats provided in the module specific subdirectory of the documentation directory for your system, such as /usr/share/doc/libX11 for the documents provided with libX11. Even if the packager didn't preformat the html, text, or pdf for you, the default Makefiles install the xml versions there so you can format them when needed, or read online with a Docbook-viewing tool such as GNOME's Yelp.

While this is a huge step forward, we're not done yet. Many of the API docs still need to be converted from old-style K&R C prototypes to the ANSI C89 style used in the code base, and other formatting cleanups are needed. Matt is working on ways to not just have hyperlinked cross-references inside a single document, but across the documentation set. There's plenty of work to go around for additional people to help, such as improving the style-sheets, proofreading the documents to find more areas to fix, adding cross-references where they could be useful, so more help is always welcome. And of course, the huge pile of docs we have are almost exclusively developer focused - end user documentation is mainly limited to man pages, which aren't always as helpful as they could be, but we need user feedback to let us know the areas that need more help, since the developers don't have to rely on the man pages to figure out how to use the software, so don't notice the gaps.

But now at least we've got good momentum going, and I'm hopeful each future release will continue to show improvement.

Monday Nov 15, 2010

X11 changes in the 2010.11 release

Another OS release came out today, 2010.11, and as usual, it has a number of X11 changes. The biggest change in X is probably... Hmm, I can see by the look on your face, you're not buying the casual use of “as usual” there. Okay, you caught me, this OS release isn't quite following our previous pattern, so I guess we better get that out of the way first. Please remember I am not an Oracle spokesman, and can't speak on behalf of Oracle, so don't even think of quoting this as “Oracle says...”

In many ways, this release is simply the continuation of the OpenSolaris distro releases of the last few years. It's built the same way, using the IPS packaging system and repositories, and Caiman installers, as the OpenSolaris 2009.06 and prior releases were. Where OpenSolaris 2009.06 (the last full release) was the biweeekly build numbered 111b, and the release we'd planned to put out as OpenSolaris 2010.03 earlier this year (and which made it to the package repository, but was not put up as downloadable ISO's) would have been biweekly build 134b, this release is 151a. You should be able to upgrade to it from OpenSolaris 2009.06 or OpenSolaris /dev builds via the package repository following the instructions in the 2010.11 release notes.

So what's different about this OS release? Well, it's not named OpenSolaris anymore for starters - it's Oracle Solaris 11 Express. We'd always said that OpenSolaris releases were leading up to Solaris 11 eventually, and this name emphasizes we're getting closer to that (though still not there yet). It also recognizes that this release is built by Oracle, not Sun nor the OpenSolaris community. While it's built on the work done by the OpenSolaris community, and many portions of it are still developed as open projects on opensolaris.org, the kernel and core utilities are once again being developed behind closed doors, and the final assembly and testing are similarly done in house. The license terms for the free downloads have changed as well (though it's still offered under support contract for commercial production use as well), and the OS images include some of the encumbered packages we'd had to keep out of OpenSolaris in order to allow OpenSolaris to be freely redistributable. (Not all of them, since some were simply EOL'ed as they were for hardware well past the end of its supported lifetime, like many of the old SPARC frame buffers.)

So with that out of the way, back to the topic at hand - what's new in the X Window System in this release? Well that depends on how far back you're coming from. You can browse the complete changelogs for X going back to the point we branched the Nevada branches from the Solaris 10 release, so I'll try to stick to the highlights.

Changes since the last OpenSolaris X11 source release

None, since the X sources on opensolaris.org are still updated automatically from our internal master gate on each commit. (In fact, since the source gates currently reflect a point between biweekly builds 153 & 154, they have changes newer than this release, such as the integrations of libxcb and FreeGLUT.)

Changes since the last OpenSolaris developer build release (b134)

There were 17 biweekly builds between the last one published to pkg.opensolaris.org/dev in March and this release. The biggest change in the X packages in this period was their packaging. Previously we built our packages using the old SVR4 package format that was used since Solaris 2.0, and in many cases following the breakdown used in the old Solaris 2 releases (SUNWxwinc for most headers, SUNWxwplt for most libraries, SUNWxwman for most man pages), and then the release team converted those to the IPS format used in the OpenSolaris releases. Like several of the other consolidations, X has now converted to building IPS packages directly, and in the process refactored the X packages to better follow the way the upstream X.Org sources were split into modules at X11R7, which also happens to be more similar to the way most Linux distros break them up. This should allow easier creation of minimized environments with the subset of X packages you need.

As for headers and man pages, they are now included in the packages they are used with - for instance all the libX11 headers and API man pages are directly in the x11/library/libx11 package. System admins can still decide to include or exclude them in their installs though, since they are tagged with the devel and docfacets”, which are the IPS mechanism for controlling optional package components. To read more about how to use these with X or the other changes in the refactoring, see the heads up messages I posted when this work integrated.

Of course, there were also the usual updates to new upstream releases - Xorg 1.7.7, freetype 2.4.2, fontconfig 2.8.0, among many others. The X server packages now also include the mdb modules and scripts for getting client and grab information from the server that I blogged about back in April.

Changes since the last OpenSolaris stable release (2009.06 / b111b)

This period saw the completion of our multiyear project to completely replace the old Solaris X code base with the X11R7 open source code base from X.Org. Solaris 10 and earlier shipped with Sun's proprietary fork of X11R6, with bits of X11R5, X11R6.4, X11R6.6, & X11R6.8 mixed in. We're now set up to much more easily track upstream and are deviated from upstream in much fewer places than before (partially due to pushing a number of our previous fixes back upstream, in other cases, we determined the upstream code was better and went with it).

We also had a very large user-visible change in build 130: all the files moved from /usr/X11 directly into /usr/bin & /usr/lib, following the work done in other parts of Solaris to move files from locations like /usr/ccs/bin and /usr/sfw to the common /usr directories. We still have symlinks in /usr/openwin and /usr/X11 for backwards compatibility, so we shouldn't break your .xinitrc calls to /usr/openwin/bin/xrdb or /usr/X11/bin/xmodmap.

Since 2009.06, we moved from Xorg 1.5 to 1.7.4. Of course, with this upgrade, we got the HAL support for input device configuration working just as X.Org started moving off HAL upstream, something we still need to deal with for Solaris - for this release, input devices are still configured in HAL .fdi files. The xorgcfg and xorgconfig programs did go away as part of this move though - fortunately more and more systems are working without any xorg.conf at all, and when one is needed, only the sections being changed have to be included, lessening the utility of programs to generate full configuration files. The new Xorg also includes support for virtual consoles on systems with the necessary kernel driver support (all x86 systems and SPARCs with frame buffers supporting “coherent console”).

We also added the synaptics touchpad driver, synergy software for sharing input devices with multiple systems, the simple xcompmgr composite manager, the xinput client for input device configuration, and finally provided IPS packaged versions of the classic xdm display manager and xfs legacy font server. The Xprint server and several related commands did go away, but the libXp library was kept for binary compatibility.

Our VNC implementation was converted from RealVNC 4.1.3 to TigerVNC 1.0.1, which is being kept up-to-date with new Xorg releases, unlike RealVNC, which hasn't really been updating it's open source release in the last few years. xscreensaver was finally updated from 5.01 to 5.11, and was actually moved out of the X gate in OpenSolaris to building as a RPM-style pkgbuild spec file with the other higher-level desktop software - hopefully in the process we fixed some long-standing bugs in our forked code.

Graphics updates included Nvidia's driver support for various new devices and OpenGL 4.0, and Intel's DRI updates, including GEM support in their DRM module. Mesa was added on SPARC to provide a matching OpenGL implementation, but with only the software renderer, no hardware acceleration.

What else has changed?

Besides the official Solaris 11 Express release information, you can find more details on changes in this release on a bunch of other blogs, such as:

But here's some changes in other parts of the OS you may not see listed on those:

Of course, that's just a small sample, the full changelogs are a few thousand items long (and unfortunately, some of the consolidations haven't published theirs outside the firewall).

About

Engineer working on Oracle Solaris and with the X.Org open source community.

Disclaimer

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle, the X.Org Foundation, or anyone else.

See Also
Follow me on twitter

Search

Categories
Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today