Monday Dec 19, 2011

Flame on: Graphing hot paths through X server code

Last week, Brendan Gregg, who wrote the book on DTrace, published a new tool for visualizing hot sections of code by sampling the process stack every few milliseconds and then graphing which stacks were seen the most during the run: Flame Graphs. This looked neat, but my experience has been that I get the most understanding out of things like this by trying them out myself, so I did.

Fortunately, it's very easy to set up, as Brendan provided the tools as two standalone Perl scripts you can download from the Flame Graph GitHub repo. The next step is deciding what you want to run it on and capturing the data from a run of that.

It just so turns out that a few days before the Flame Graph release, another mail had crossed my inbox that gave me something to look at. Several years ago, I added some Userland Statically Defined Tracing probes to the Xserver code base, creating an Xserver provider for DTrace. These got integrated first to the Solaris Xservers (both Xsun & Xorg) and then merged upstream to the X.Org community release for Xserver 1.4, where they're available for all platforms with DTrace support.

Earlier this year, Adam Jackson enabled the dtrace hooks in the Fedora Xorg server package, using the SystemTap static probe support for DTrace probe compatibility. He then noticed a performance drop, which isn't supposed to happen, as DTrace probes are simply noops when not being actively traced, and submitted a fix for it upstream.

This was due to two of the probes I'd defined - when the Xserver processes a request from a client, there's a request-start probe just before the request is processed, and a request-done probe right after the request is processed. If you just want to see what requests a client is making you can trace either one, but if you want to measure the time taken to run a request, or determine if something is happening while a request is being processed, you need both of the probes. When they first got integrated, the code was simple:

                XSERVER_REQUEST_START(GetRequestName(MAJOROP), MAJOROP,
                              ((xReq *)client->requestBuffer)->length,
                              client->index, client->requestBuffer);
                // skipping over input checking and auditing, to the main event,
                // the call to the request handling proc for the specified opcode:
                    result = (* client->requestVector[MAJOROP])(client);
                XSERVER_REQUEST_DONE(GetRequestName(MAJOROP), MAJOROP,
                              client->sequence, client->index, result);

The compiler sees XSERVER_REQUEST_START and XSERVER_REQUEST_DONE as simple function calls, so it does whatever work is necessary to set up their arguments and then calls them. Later, during the linking process, the actual call instructions are replaced with noops and the addresses recorded so that when a dtrace user enables the probe the call can be activated at that time. In these cases, that's not so bad, just a bunch of register access and memory loads of things that are going to be needed nearby. The one outlier is GetRequestName(MAJOROP), which looks like a function call but was really just a macro that used the opcode as the index into an array of strings and returned the string name for the opcode so that DTrace probes could see the request names, especially for extensions which don't have static opcode mappings. For that the compiler would just load a register with the address of the base of the array and then add the offset of the entry specified by MAJOROP in that array.

All was well and good for a bit, until a later project came along during the Xorg 1.5 development cycle to unify all the different lists of protocol object names in the Xserver, as there were different ones in use by the DTrace probes, the security extensions, and the resource management system. That replaced the simple array lookup macro with a function call. While the function doesn't do a lot more work, it does enough to be noticed, and thus the performance hit was taken in the hot path of request dispatching. Adam's patch to fix this simply uses is-enabled probes to only make those function calls when the probes are actually enabled. x11perf testing showed the win on an Athlon 64 3200+ test system running Solaris 11:

Before: 250000000 trep @   0.0001 msec ( 8500000.0/sec): X protocol NoOperation
After:  300000000 trep @   0.0001 msec (10400000.0/sec): X protocol NoOperation 
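
As a rough illustration of the is-enabled pattern (this is not the actual Xserver code; the flag, the counter, and every function name below are made-up stand-ins for what a DTrace-generated header and the real lookup function would provide), the idea is to skip the expensive argument setup entirely unless someone is tracing:

```c
#include <stdio.h>

/* Hypothetical stand-ins for what a USDT-generated header would provide. */
static int probe_enabled = 0;   /* in real life, flipped by the tracer */
static int lookup_calls = 0;    /* counts how often the costly lookup ran */

/* Stand-in for the name lookup that became a real function call. */
const char *lookup_major_name(int opcode)
{
    lookup_calls++;
    return (opcode == 127) ? "NoOperation" : "Unknown";
}

/* Stand-in for firing the probe; here it just prints its arguments. */
void fire_request_start(const char *name, int opcode)
{
    printf("request-start: %s (%d)\n", name, opcode);
}

/* Dispatch one request, paying for the lookup only when traced. */
int dispatch_request(int opcode)
{
    if (probe_enabled) {
        fire_request_start(lookup_major_name(opcode), opcode);
    }
    /* ... the real request handler would run here ... */
    return 0;
}
```

With probe_enabled left at 0, calling dispatch_request() never touches lookup_major_name(), which is the whole point: the untraced hot path stays as cheap as it was with the old macro.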

But numbers are boring, and this gave an excuse to try out Flame Graphs. To capture the data, I took advantage of synchronization features built into xinit and dtrace -c:

# xinit /usr/sbin/dtrace -x ustackframes=100 \
  -n 'profile-997 /execname == "Xorg"/ { @[ustack()] = count(); }' \
  -o out.user_stacks.noop -c "x11perf -noop" \
  -- -config xorg.conf.dummy
To explain this command, I'll start at the end. For xinit, everything after the double dash is a set of arguments to the Xserver it starts. In this case, Xorg is told to look in the normal config paths for xorg.conf.dummy, which it finds as this simple config file in /etc/X11/xorg.conf.dummy. It sets the driver to the Xvfb-like “dummy” driver, which just uses RAM as a frame buffer, to take graphics driver considerations out of this test:
Section "Device"
	Identifier  "Card0"
	Driver      "dummy"
EndSection
Since I'm using a modern Xorg version, that's all the configuration needed, all the unspecified sections are autoconfigured. xinit starts the Xserver, waits for the Xserver to signal that it's finished its start up, and then runs the first half of the command line as a client with the DISPLAY set to the new Xserver. In this case it runs the dtrace command, which sets up the probes based on the examples in the Flame Graphs README, and then runs the command specified as the -c argument, the x11perf benchmark tool. When x11perf exits, dtrace stops the probes, generates its report, and then exits itself, which in turn causes xinit to shut down the X server and exit.

The resulting Flame Graphs are, in their native SVG interactive form:



If your browser can't handle those as inline SVG's, or they're scrunched too small to read, try Before (SVG) or Before (PNG), and After (SVG) or After (PNG).

You can see in the first one a bar showing a little over 10% of the time was in stacks involving LookupMajorName, which is completely gone in the second graph. Those who saw Adam's patch series come across the xorg-devel list last week may also notice the presence of XaceHook calls, which Adam optimized in another patch. Unfortunately, while I did build with that patch as well, we don't get the benefits of it since the XC-Security extension is on by default, and those fill in the hooks, so it can't just bypass them as it does when the hooks are empty.

I also took measurements of what Xorg did as gdm started up and a test user logged in, which produced the much larger flame graph you can see in SVG or PNG. As you can see the recursive calls in the font catalogue scanning functions make for some really tall flame graphs. You can also see that, to no one's surprise, xf86SlowBCopy is slow, and a large portion of the time is spent “blitting” bits from one place to another. Some potential areas for improvement stand out - like the 5.7% of time spent rescanning the font path because the Solaris gnome session startup scripts make xset fp calls to add the fonts for the current locale to the legacy font path for old clients that still use it, and another nearly 5% handling the ListFonts and ListFontsWithInfo calls, which dtracing with the request-start probes turned out to be the Java GUI for the Solaris Visual Panels gnome-panel applet.

Now because of the way the data for these is gathered, from looking at them alone you can't tell if a wide bar is one really long call to a function (as it is for the main() function bar in all these) or millions of individual calls (as it was for the ProcNoOperation calls in the x11perf -noop trace), but it does give a quick and easy way to pick out which functions the program is spending most of its time in, as a first pass for figuring out where to dig deeper for potential performance improvements.
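
To make the "share of samples, not number of calls" point concrete, here's a minimal sketch in C. It assumes each captured stack has already been folded into one string with frames separated by ';' (the function name and the sample strings are mine, for illustration only, and not part of Brendan's scripts):

```c
#include <stddef.h>
#include <string.h>

/*
 * Each sample is one captured stack folded into a single string,
 * frames separated by ';' with the outermost caller first.
 * Returns the fraction (0.0 to 1.0) of samples whose stack contains
 * the given frame name anywhere in it.
 */
double fraction_with_frame(const char *const samples[], size_t count,
                           const char *frame)
{
    size_t hits = 0;
    size_t i;

    if (count == 0)
        return 0.0;
    for (i = 0; i < count; i++) {
        /* Substring match keeps the sketch short; a real tool would
           split on ';' and compare whole frame names. */
        if (strstr(samples[i], frame) != NULL)
            hits++;
    }
    return (double)hits / (double)count;
}
```

The result is the same whether those samples landed in one long-running call or in millions of short ones, which is exactly why the width of a bar alone can't tell the two apart.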

Brendan has made these scripts easy to use to generate these graphs, so I encourage you to try them out as well on some sample runs to get familiar with them, so that when you really need them, you know what cases they're good for and how to capture the data and generate the graphs for yourself. Trying really is the best method of learning.

Wednesday Nov 09, 2011

S11 X11: ye olde window system in today's new operating system

Today's the big release for Oracle Solaris 11, after 7 years of development. For me, the Solaris 11 release comes a little more than 11 years after I joined the X11 engineering team at what was then Sun, and finishes off some projects that were started all the way back then.

For instance, when I joined the X team, Sun was finishing off the removal of the old OpenWindows desktop, and we kept getting questions asking about the rest of the stuff being shipped in /usr/openwin, the directory that held both the OpenLook applications and the X Window System software. I wrote up an ARC case at the time to move the X software to /usr/X11, but there were various issues and higher priority work, so we didn't end up starting that move until near the end of the Solaris 10 development cycle several years later. Solaris 10 thus had a mix of the recently added Xorg server and related code delivered in /usr/X11, while most of the existing bits from Sun's proprietary fork of X11R6 were still in /usr/openwin.

During Solaris 11 development, we finished that move, and then jumped again, moving the programs directly into /usr/bin, following the general Solaris 11 strategy of using /usr/bin for most of the programs shipped with the OS, and using other directories, such as /usr/gnu/bin, /usr/xpg4/bin, /usr/sunos/bin, and /usr/ucb for conflicting alternate implementations of the programs shipped in /usr/bin, rather than as a way to segregate out various subsystems to allow the OS to better fit onto the 105MB hard disks that shipped with Sun workstations back when /usr/openwin was created. However, if for some reason you wanted to build your own set of X binaries, you could put them in /usr/X11R7 (as I do for testing builds of the upstream git master repos), and then put that first in your $PATH, so nothing is really lost here.

The other major project that was started during Solaris 10 development and finished for Solaris 11 was replacing that old proprietary fork of X11R6, including the Xsun server, with the modernized, modularized, open source X11R7.* code base from the new X.Org, including the Xorg server. The final result, included in this Solaris 11 release, is based mostly on the X11R7.6 release, including recent additions such as the XCB API I blogged about last year, though we did include newer versions of modules that had upstream releases since the X11R7.6 katamari, such as Xorg server version 1.10.3.

That said, we do still apply some local patches, configuration options, and other changes, for things ranging from just fitting into the Solaris man page style to adding support for Trusted Extensions labeled desktops. You can see all of those changes in our source repository, which is searchable and browsable via OpenGrok (or via hgweb on community mirrors) and available for anonymous hg cloning as well. That xnv-clone tree is now frozen, a permanent snapshot of the Solaris 11 sources, while we've created a new x-s11-update-clone tree for the Solaris 11 update releases now being developed to follow on from here.

Naturally, when your OS has 7 years between major release cycles, the hardware environment you run on greatly changes in the meantime as well, and as the layer that handles the graphics hardware, there have been changes due to that. Most of the SPARC graphics devices that were supported in Solaris 10 aren't any more, because the platforms they ran in are no longer supported - we still ship a couple SPARC drivers that are supported, the efb driver for the Sun XVR-50, XVR-100, and XVR-300 cards based on the ATI Radeon chipsets, and the astfb driver for the AST2100 remote Keyboard/Video/Mouse/Storage (rKVMS) chipset in the server ILOM devices. On the x86 side, the EOL of 32-bit platforms let us clear out a lot of the older x86 video device drivers for chipsets and cards you wouldn't find in x64 systems - of course, there's still many supported there, due to the wider variety of graphics hardware found in the x64 world, and even some recent updates, such as the addition of Kernel Mode Setting (KMS) support for Intel graphics up through the Sandy Bridge generation.

For those who followed the development as it happened, either via watching our open source code releases or using one of the many development builds and interim releases such as the various Solaris Express trains, much of this is old news to you. For those who didn't, or who want a refresher on the details, you can see last year's summary in my X11 changes in the 2010.11 release blog post. Once again, the detailed change logs for the X11 packages are available, though unfortunately, all the links in them to the bug reports are now broken, so browsing the hg history log is probably more informative.

Since that update, which covered up to the build 151 released as 2010.11, we've continued development and polishing to get this Solaris 11 release finished up. We added a couple more components, including the previously mentioned xcb libraries, the FreeGLUT library, and the Xdmx Distributed Multihead X server. We cleaned up documentation, including the addition of some docs for the Xserver DTrace provider in /usr/share/doc/Xserver/. The packaging was improved, clearing up errors and optimizing the builds to reduce unnecessary updates. A few old and rarely used components were dropped, including the rstart program for starting up X clients remotely (ssh X forwarding replaces this in a more secure fashion) and the xrx plugin for embedding X applications in a web browser page (which hasn't been kept up to date with the rapidly evolving browser environment). Because Solaris 11 only supports 64-bit systems, and most of the upstream X code was already 64-bit clean, the X servers and most of the X applications are now shipped as 64-bit builds, though the libraries of course are delivered in both 32-bit and 64-bit versions for binary compatibility with applications of each flavor. The Solaris auditing system can now record each attempt by a client to connect to the Xorg server and whether or not it succeeded, for sites which need that level of detail.

In total, we recorded 1512 change request id's during Solaris 11 development, from the time we forked the “Nevada” gate from the Solaris 10 release until the final code freeze for today's release - some were one line bug fixes, some were man page updates, some were minor RFE's and some were major projects, but in the end, the result is both very different (and hopefully much better) than what we started with, and yet, still contains the core X11 code base with 24 years of backwards compatibility in the core protocols and APIs.

Saturday May 07, 2011

New blog, same as the old blog

After 7 years at, this blog has a new URL: I'm sure you can figure out the change to the RSS & Atom feed URL's as well - and apologies if the URL changes in those feeds caused your feed readers to determine the posts were all new, though fortunately, I get around to posting here so rarely these days, there shouldn't have been too many.

I didn't realize it until I went back to look for writing this post, but my blog was live for exactly 7 years on the old site - my first post was April 29, 2004 - the site went live a few days earlier, but I was traveling for that year's X.Org conference, so didn't get my account set up and post to it until I got there. They froze on April 29 this year to start migrating to the new, integrating the Sun & Oracle bloggers, using the same Apache Roller software Sun used.

Three years later, I joined the people looking back for the third anniversary of and wrote “So much change in 3 short years - who knows where the next years will lead us?” — certainly I never expected then that three years after that Oracle would have bought Sun. I can only wonder what will be written in April 2014, on what would have been the tenth birthday of

Friday Mar 25, 2011

R_AMD64_PC32 error? There, I Fixed It!

I try to fairly regularly build recent git checkouts of all the upstream modules from X.Org (at least all those listed in the current on Solaris. Normally I do this in 32-bit mode on x86 machines using the Sun compilers on the latest Solaris 11 internal development build, but I also occasionally do it in 64-bit mode, or with gcc compilers, or on a SPARC machine. This helps me catch issues that would break our builds when we integrate the new releases before those releases happen. (Ideally I'd set up a Solaris client of the X.Org tinderbox, but I've not gotten around to that.)

Anyways, recently I finally decided to track down an error that only shows up in the 64-bit builds of the xscope protocol monitor/decoder for X11 on Solaris. The builds run fine up until the final link stage, which fails with:

ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol littleEndian: value 0x8086c355 does not fit
ld: fatal: relocation error: R_AMD64_PC32: file audio.o: symbol ServerHostName: value 0x8086b4fe does not fit
ld: fatal: relocation error: R_AMD64_PC32: file decode11.o: symbol LBXEvent: value 0x808664c3 does not fit
(and over 150 more symbols that didn't fit)

A Google search turned up some forum posts, a blog post, and an article on the AMD64 ABI support in the Sun Studio compilers. And indeed, the solutions they offered did work - building with -Kpic did allow the program to link.

But is that really the best answer? xscope is a simple program, and shouldn't be overflowing the normal memory model. Once it linked, looking at the resulting binary was a bit shocking:

% /usr/gnu/bin/size  xscope
   text	   data	    bss	    dec	    hex	filename
 416753	   5256	2155921980	2156343989	808732b5	xscope

% /usr/bin/size -f xscope

23(.interp) + 32(.SUNW_cap) + 5860(.eh_frame_hdr) + 27200(.eh_frame)
 + 2964(.SUNW_syminfo) + 5944(.hash) + 4224(.SUNW_ldynsym)
 + 17784(.dynsym) + 14703(.dynstr) + 192(.SUNW_version)
 + 1482(.SUNW_versym) + 3168(.SUNW_dynsymsort) + 96(.SUNW_reloc)
 + 1944(.rela.plt) + 1312(.plt) + 291018(.text) + 33(.init) + 33(.fini)
 + 280(.rodata) + 38461(.rodata1) + 1376(.got) + 784(.dynamic)
 + 1952(.data) + 0(.bssf) + 1144(.picdata) + 0(.tdata) + 0(.tbss)
 + 2155921980(.bss) = 2156343989

% pmap -x `pgrep xscope`
26151:	./xscope
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000000400000        408        408          -          - r-x--  xscope
0000000000476000          8          8          8          - rw---  xscope
0000000000478000    2105388       1064       1064          - rw---  xscope
0000000080C83000         52         52         52          - rw---    [ heap ]
FFFFFD7FFFDF8000         32         32         32          - rw---    [ stack ]
---------------- ---------- ---------- ---------- ----------
        total Kb    2108668       3204       1300          -

Two gigabytes of .bss space allocated!?!?! That can't be right. Looking through the output of the elfdump and nm programs, a single symbol stood out:

Symbol Table Section:  .SUNW_ldynsym
     index    value              size              type bind oth ver shndx          name
      [89]  0x00000000009ff280 0x0000000080280000  OBJT GLOB  D    1 .bss           FDinfo

[Index]   Value                Size                Type  Bind  Other Shndx   Name
[528]   |            10482304|          2150105088|OBJT |GLOB |0    |28     |FDinfo

Unfortunately, that wasn't one of the ones listed in the linker errors, since its starting address fit inside the normal memory model, but everything that came after it was out of range.
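
For anyone wondering what "does not fit" means here: an R_AMD64_PC32 relocation has only a signed 32-bit field to hold its result. A minimal sketch of that range check in C, assuming the value printed in the error message is the number the linker was trying to store (the function name is mine, not anything from ld):

```c
#include <stdint.h>

/*
 * An R_AMD64_PC32 relocation stores a signed 32-bit displacement,
 * so the computed value must lie in [INT32_MIN, INT32_MAX].
 * Returns 1 if it fits, 0 if the linker would have to reject it.
 */
int fits_pc32(int64_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

Under that assumption, a value such as 0x8086c355 from the errors above is just past the 0x7fffffff limit, so the check returns 0, while anything placed before the giant symbol stays comfortably in range.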

So what is this giant static allocation for? It's defined in scope.h:

#define BUFFER_SIZE (1024 * 32)

struct fdinfo
{
  Boolean Server;
  long    ClientNumber;
  FD      pair;
  unsigned char   buffer[BUFFER_SIZE];
  int     bufcount;
  int     bufstart;
  int     buflimit;     /* limited writes */
  int     bufdelivered; /* total bytes delivered */
  Boolean writeblocked;
};

extern struct fdinfo   FDinfo[StaticMaxFD];

So it allocates a 32k buffer for up to StaticMaxFD file descriptors. How many is that? For that we need to look in xscope's fd.h:

/* need to change the MaxFD to allow larger number of fd's */
#define StaticMaxFD FD_SETSIZE

and from there to the Solaris system headers, which define FD_SETSIZE in <sys/select.h>:

/*
 * Select uses bit masks of file descriptors in longs.
 * These macros manipulate such bit fields.
 * FD_SETSIZE may be defined by the user, but the default here
 * should be >= NOFILE (param.h).
 */
#ifndef FD_SETSIZE
#ifdef _LP64
#define FD_SETSIZE      65536
#else
#define FD_SETSIZE      1024
#endif  /* _LP64 */

So this makes the buffer fields alone in FDinfo become 65536 * 32 * 1024 bytes, aka 2 gigabytes.
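
The arithmetic is easy to check for yourself. This is just a worked sketch in C (the helper name is made up); note that it uses 64-bit integers on purpose, since 65536 * 32768 is exactly 2^31 and would overflow a plain 32-bit int:

```c
#include <stdint.h>

/*
 * Bytes used by the buffer fields alone: one buffer of buffer_size
 * bytes for each of the max_fd slots in the static array.
 * 64-bit math avoids overflowing a 32-bit int at 2^31.
 */
uint64_t fdinfo_buffer_bytes(uint64_t max_fd, uint64_t buffer_size)
{
    return max_fd * buffer_size;
}
```

Plugging in the 64-bit default of 65536 descriptors and the 32k buffer gives 2147483648 bytes, while the 32-bit default of 1024 gives 33554432 bytes, a much more reasonable 32 megabytes.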

Thus in this case, while compiler flags like -Kpic allow the code to link, building with -DFD_SETSIZE=256 instead produces code that's a little bit saner, fits in the normal memory model, and is less likely to fail with out of memory errors when you need it most:

% /usr/gnu/bin/size -f xscope
   text	   data	    bss	    dec	    hex	filename
 409388	   3352	8449804	8862544	 873b50	xscope

% pmap -x `pgrep xscope`
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
0000000000400000        404        404          -          - r-x--  xscope
0000000000475000          4          4          4          - rw---  xscope
0000000000476000       8248         20         20          - rw---  xscope
0000000000C84000         52         52         52          - rw---    [ heap ]
FFFFFD7FFFDFD000         12         12         12          - rw---    [ stack ]
---------------- ---------- ---------- ---------- ----------
        total Kb      11500       2136        232          -

Of course that assumes that xscope is not going to be monitoring more than about 120 clients at a time (since it opens two file descriptors for each client, one connected to the client and one to the real X server), and still wastes many page mappings if you're only monitoring one client. The real fix being worked on for the next upstream release is to make the buffer allocation be dynamic, and allocate just enough for the number of clients we actually are monitoring.
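
I don't want to suggest this is what the upstream fix will look like, but as a minimal sketch of the general idea in C, under my own assumptions (a trimmed-down record type and hypothetical names throughout), a table of pointers can grow on demand and allocate a 32k buffer only for descriptors that are actually used:

```c
#include <stdlib.h>

#define BUFFER_SIZE (1024 * 32)

/* Hypothetical, trimmed-down version of the per-descriptor record. */
struct fdinfo_sketch {
    int           bufcount;
    unsigned char buffer[BUFFER_SIZE];
};

static struct fdinfo_sketch **fd_table = NULL; /* one pointer per fd */
static size_t fd_table_len = 0;                /* pointer slots allocated */

/*
 * Return the record for descriptor fd, creating it on first use.
 * Returns NULL for a negative fd or if memory runs out.
 */
struct fdinfo_sketch *fdinfo_for(int fd)
{
    size_t idx, i;

    if (fd < 0)
        return NULL;
    idx = (size_t)fd;
    if (idx >= fd_table_len) {
        size_t new_len = idx + 1;
        struct fdinfo_sketch **grown =
            realloc(fd_table, new_len * sizeof(fd_table[0]));
        if (grown == NULL)
            return NULL;            /* the old table is still valid */
        /* Newly added pointer slots start out empty. */
        for (i = fd_table_len; i < new_len; i++)
            grown[i] = NULL;
        fd_table = grown;
        fd_table_len = new_len;
    }
    if (fd_table[idx] == NULL)
        fd_table[idx] = calloc(1, sizeof(struct fdinfo_sketch));
    return fd_table[idx];           /* NULL here means calloc failed */
}
```

Keeping pointers in the table rather than the records themselves means growing the table never moves a record that some other part of the program is already holding on to, and unused descriptor numbers cost one pointer each instead of 32k.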

The moral of this story? Just because you can make it build doesn't mean you've fixed it well, and sometimes it's useful to understand why the linker is giving you a hard time.

Sunday Jan 23, 2011

X11R7.6 Documentation Improvements

Late last month (about three months late in fact, sorry about that), X.Org announced the release of X11R7.6. While the announcement, release notes, and changelog give details on all the changes that went into this release since the X11R7.5 release a year earlier, the one I worked on the most (aside from the work of producing and posting the modules to be released) was the modernization of the X documentation.

When the X.Org Foundation inherited stewardship of the X Window System, the documentation was in a wide variety of formats. The programs and most libraries included man pages in the traditional nroff format. The source tree also contained a large number of specifications of the protocols and libraries, and these were in a variety of formats, including troff, TeX, FrameMaker, and LinuxDoc. These were mostly optimized for print output and generated PostScript documents which were included in the releases since few people had all the tools necessary to process all of those, especially the commercial FrameMaker software. This also was a problem for developers who needed to update the docs, and either didn't know all the different formats or didn't have tools like FrameMaker.

One of the early decisions of X.Org was that we wanted all our docs to be in open formats with open source toolchains. The format we decided to standardize on for the long form documents such as the specifications was DocBook, an open format that was already adopted by projects such as GNOME. Later, during the split of the monolithic X Window System, we decided that once the documents were converted, they would be moved to the modules that they documented, so that the documents would be updated and released in sync with the code. Over the following years, we made a little progress, converting a few documents here and there, but not making a serious dent.

In 2010 though, two volunteers really turned this around - Matt Dew converted the bulk of our specifications to DocBook/XML, and Gaetan Nadon integrated those into the modules and set up the autoconf macros and Makefile build rules to convert them to the various output formats in a standardized fashion. The xmlto tool is used as a front-end processor to drive backend tools including xsltproc and Apache FOP to generate output documents in text, html, PostScript, and/or PDF formats.

You can see the effects of this if you compare documents from the X11R7.5 release with some from the X11R7.6 release. For instance, in the html versions of the Xlib spec, not only has the formatting greatly improved from the 7.5 version to the 7.6 version, the new one also now has a hyperlinked table of contents and index, and many cross-reference links between individual sections. In the pdf version, the 7.5 version is a simple encapsulation of the PostScript output that is at least nicely formatted since the original troff was designed for print output. In the 7.6 version, fop generated the output directly for PDF, and was able to produce output that took advantage of the additional features of PDF, such as including the same extended set of hyperlinks as the HTML version, and a set of "bookmarks" for quick navigation to sections listed in the table of contents.

If you're building the sources, there's a standardized set of configure flags to control the documentation tools used during the build, such as --with-fop to enable the use of fop to generate PostScript and PDF output or --without-fop to disable it. If you're installing prebuilt packages, you may find the documents in various formats provided in the module specific subdirectory of the documentation directory for your system, such as /usr/share/doc/libX11 for the documents provided with libX11. Even if the packager didn't preformat the html, text, or pdf for you, the default Makefiles install the xml versions there so you can format them when needed, or read online with a DocBook-viewing tool such as GNOME's Yelp.

While this is a huge step forward, we're not done yet. Many of the API docs still need to be converted from old-style K&R C prototypes to the ANSI C89 style used in the code base, and other formatting cleanups are needed. Matt is working on ways to not just have hyperlinked cross-references inside a single document, but across the documentation set. There's plenty of work to go around for additional people to help, such as improving the style-sheets, proofreading the documents to find more areas to fix, adding cross-references where they could be useful, so more help is always welcome. And of course, the huge pile of docs we have are almost exclusively developer focused - end user documentation is mainly limited to man pages, which aren't always as helpful as they could be, but we need user feedback to let us know the areas that need more help, since the developers don't have to rely on the man pages to figure out how to use the software, so don't notice the gaps.

But now at least we've got good momentum going, and I'm hopeful each future release will continue to show improvement.

Monday Nov 15, 2010

X11 changes in the 2010.11 release

Another OS release came out today, 2010.11, and as usual, it has a number of X11 changes. The biggest change in X is probably... Hmm, I can see by the look on your face, you're not buying the casual use of “as usual” there. Okay, you caught me, this OS release isn't quite following our previous pattern, so I guess we better get that out of the way first. Please remember I am not an Oracle spokesman, and can't speak on behalf of Oracle, so don't even think of quoting this as “Oracle says...”

In many ways, this release is simply the continuation of the OpenSolaris distro releases of the last few years. It's built the same way, using the IPS packaging system and repositories, and Caiman installers, as the OpenSolaris 2009.06 and prior releases were. Where OpenSolaris 2009.06 (the last full release) was the biweekly build numbered 111b, and the release we'd planned to put out as OpenSolaris 2010.03 earlier this year (and which made it to the package repository, but was not put up as downloadable ISO's) would have been biweekly build 134b, this release is 151a. You should be able to upgrade to it from OpenSolaris 2009.06 or OpenSolaris /dev builds via the package repository following the instructions in the 2010.11 release notes.

So what's different about this OS release? Well, it's not named OpenSolaris anymore for starters - it's Oracle Solaris 11 Express. We'd always said that OpenSolaris releases were leading up to Solaris 11 eventually, and this name emphasizes we're getting closer to that (though still not there yet). It also recognizes that this release is built by Oracle, not Sun nor the OpenSolaris community. While it's built on the work done by the OpenSolaris community, and many portions of it are still developed as open projects on, the kernel and core utilities are once again being developed behind closed doors, and the final assembly and testing are similarly done in house. The license terms for the free downloads have changed as well (though it's still offered under support contract for commercial production use as well), and the OS images include some of the encumbered packages we'd had to keep out of OpenSolaris in order to allow OpenSolaris to be freely redistributable. (Not all of them, since some were simply EOL'ed as they were for hardware well past the end of its supported lifetime, like many of the old SPARC frame buffers.)

So with that out of the way, back to the topic at hand - what's new in the X Window System in this release? Well that depends on how far back you're coming from. You can browse the complete changelogs for X going back to the point we branched the Nevada branches from the Solaris 10 release, so I'll try to stick to the highlights.

Changes since the last OpenSolaris X11 source release

None, since the X sources on are still updated automatically from our internal master gate on each commit. (In fact, since the source gates currently reflect a point between biweekly builds 153 & 154, they have changes newer than this release, such as the integrations of libxcb and FreeGLUT.)

Changes since the last OpenSolaris developer build release (b134)

There were 17 biweekly builds between the last one published to in March and this release. The biggest change in the X packages in this period was their packaging. Previously we built our packages using the old SVR4 package format that was used since Solaris 2.0, and in many cases following the breakdown used in the old Solaris 2 releases (SUNWxwinc for most headers, SUNWxwplt for most libraries, SUNWxwman for most man pages), and then the release team converted those to the IPS format used in the OpenSolaris releases. Like several of the other consolidations, X has now converted to building IPS packages directly, and in the process refactored the X packages to better follow the way the upstream X.Org sources were split into modules at X11R7, which also happens to be more similar to the way most Linux distros break them up. This should allow easier creation of minimized environments with the subset of X packages you need.

As for headers and man pages, they are now included in the packages they are used with - for instance all the libX11 headers and API man pages are directly in the x11/library/libx11 package. System admins can still decide to include or exclude them in their installs though, since they are tagged with the devel and doc “facets”, which are the IPS mechanism for controlling optional package components. To read more about how to use these with X or the other changes in the refactoring, see the heads up messages I posted when this work integrated.

Of course, there were also the usual updates to new upstream releases - Xorg 1.7.7, freetype 2.4.2, fontconfig 2.8.0, among many others. The X server packages now also include the mdb modules and scripts for getting client and grab information from the server that I blogged about back in April.

Changes since the last OpenSolaris stable release (2009.06 / b111b)

This period saw the completion of our multiyear project to completely replace the old Solaris X code base with the X11R7 open source code base from X.Org. Solaris 10 and earlier shipped with Sun's proprietary fork of X11R6, with bits of X11R5, X11R6.4, X11R6.6, & X11R6.8 mixed in. We're now set up to track upstream much more easily and deviate from upstream in far fewer places than before (partially due to pushing a number of our previous fixes back upstream; in other cases, we determined the upstream code was better and went with it).

We also had a very large user-visible change in build 130: all the files moved from /usr/X11 directly into /usr/bin & /usr/lib, following the work done in other parts of Solaris to move files from locations like /usr/ccs/bin and /usr/sfw to the common /usr directories. We still have symlinks in /usr/openwin and /usr/X11 for backwards compatibility, so we shouldn't break your .xinitrc calls to /usr/openwin/bin/xrdb or /usr/X11/bin/xmodmap.

Since 2009.06, we moved from Xorg 1.5 to 1.7.4. Of course, with this upgrade, we got the HAL support for input device configuration working just as X.Org started moving off HAL upstream, something we still need to deal with for Solaris - for this release, input devices are still configured in HAL .fdi files. The xorgcfg and xorgconfig programs did go away as part of this move though - fortunately more and more systems are working without any xorg.conf at all, and when one is needed, only the sections being changed have to be included, lessening the utility of programs to generate full configuration files. The new Xorg also includes support for virtual consoles on systems with the necessary kernel driver support (all x86 systems and SPARCs with frame buffers supporting “coherent console”).

We also added the synaptics touchpad driver, synergy software for sharing input devices with multiple systems, the simple xcompmgr composite manager, the xinput client for input device configuration, and finally provided IPS packaged versions of the classic xdm display manager and xfs legacy font server. The Xprint server and several related commands did go away, but the libXp library was kept for binary compatibility.

Our VNC implementation was converted from RealVNC 4.1.3 to TigerVNC 1.0.1, which is being kept up-to-date with new Xorg releases, unlike RealVNC, which hasn't really been updating its open source release in the last few years. xscreensaver was finally updated from 5.01 to 5.11, and was actually moved out of the X gate in OpenSolaris to building as an RPM-style pkgbuild spec file with the other higher-level desktop software - hopefully in the process we fixed some long-standing bugs in our forked code.

Graphics updates included Nvidia's driver support for various new devices and OpenGL 4.0, and Intel's DRI updates, including GEM support in their DRM module. Mesa was added on SPARC to provide a matching OpenGL implementation, but with only the software renderer, no hardware acceleration.

What else has changed?

Besides the official Solaris 11 Express release information, you can find more details on changes in this release on a bunch of other blogs, such as:

But here's some changes in other parts of the OS you may not see listed on those:

Of course, that's just a small sample, the full changelogs are a few thousand items long (and unfortunately, some of the consolidations haven't published theirs outside the firewall).

Friday Nov 05, 2010

Is Wayland going to replace X?

The interwebs are full this morning of reports on Mark Shuttleworth’s announcement that someday he’d like to use Wayland instead of Xorg as Ubuntu’s primary display driver.

My favorite so far is Ryan Paul’s Ars Technica article for this dead-on estimate of the amount of effort it would take to completely replace X11:

Although the Linux ecosystem would benefit greatly from a lighter and more easily extensible alternative, a concerted effort to displace X11 on the Linux desktop hasn’t really emerged yet because the task of bringing drivers, third-party software, and all of the other layers of the stack into alignment with such a move would be prohibitively cumbersome. Like, in the sense that using only your toes to build a full-scale replica of the Statue of Liberty out of toothpicks is prohibitively cumbersome.

And that’s the important point - despite what some of the sensationalist headlines are saying, Ubuntu is not dropping X. They’re not even sure when it would be possible to start shipping Wayland - the announcement from Mark Shuttleworth was that this is the direction they want to go in hopes that a show of support will help get more people working on Wayland (which has mostly been a single person effort) in order to make it a usable solution in a year or more.

Even once they start shipping Wayland, since most applications will still be written for X, they’ll have to include an X server which now has to share the graphics devices with Wayland. (The Wayland site has architecture diagrams showing how this works.) Sure, their Unity desktop (the Ubuntu replacement for GNOME 3.0’s gnome-shell) will be written to Wayland, but more widespread adoption depends on how many other platforms also adopt Wayland, since then the creators of other applications will have to decide if it’s worth the effort of porting just for those distros when they can do nothing and continue to work on all X11 platforms, including those running Wayland.

One of the challenges in widespread Wayland adoption, especially on non-Linux kernels, is that Wayland requires a kernel Direct Rendering Manager (DRM) module with Kernel Mode Setting (KMS) support for every graphics device.

That’s especially challenging in Solaris, where even in the latest OpenSolaris and Solaris Express releases, we only have DRM drivers for Intel and ATI Radeon graphics (and the Radeon one currently doesn’t work, but that’s fixable).

That leaves these non-EOL graphics devices with no DRM/KMS support in Solaris:

  • AST (service processor/remote KVM in Sun servers)
  • ATI Rage and Mach64
  • Cirrus
  • Matrox
  • Nvidia
  • VIA Unichrome
  • Trident
  • VESA (fallback for hardware without specific driver support)
  • VirtualBox
  • VMWare
  • Sun XVR-50/100/300 (OEM ATI Radeon series, but with separate driver)
  • Sun XVR-2500
  • Sun Ray

Of course, that’s the list of devices Oracle is supporting in Solaris - other distros/forks from the OpenSolaris code base, such as OpenIndiana or Belenix, may choose to support more or less, but I’ve not yet seen any sign of them porting DRM drivers on their own.

For some of those, support could be provided by porting the existing open source dual-BSD/GPL-licensed DRM code - others require creating drivers where there are none.

Nvidia appears on this list since we ship their driver, which has its own kernel/driver architecture instead of DRI. Getting DRI/KMS support for Nvidia requires either that they rearchitect their driver or that we move from their driver to the open source / reverse engineered Nouveau project.

From Owain Ainsworth’s talk at this year’s X Developer Summit, we know that work is in progress to port KMS support to the BSD DRI, but again, that’s not yet done, though the BSDs do have a lot more of the device-specific DRM modules already ported than Solaris did.

So Xorg is in little danger of disappearing overnight - our desktop architecture continues to evolve, as it has over many years, and Wayland may play an increasingly larger role in it, or may end up an interesting side note, like Xgl before it, but either way it will be an evolutionary process, not a big bang flag day sudden change.

Friday Oct 08, 2010

Porting X apps to XCB

I pushed the release of xwininfo 1.1 to the X.Org release archives last week. xwininfo is a command-line utility to print information about windows on an X server. The major new feature of this release is the rewrite to use libxcb instead of libX11 for the connection to the X server.

For those who haven't heard, XCB is a new library (well, new compared to Xlib) to communicate with the X server over the X11 protocol. While Xlib is designed to look like a traditional library API, hiding the fact that calls result in protocol requests to a server, XCB makes the client-server nature of the protocol explicit in its design. For instance, to look up a window property, the Xlib code is a single function call:

XGetWindowProperty(dpy, win, atom, 0, 0, False, AnyPropertyType,
                   &type_ret, &format_ret, &num_ret,
                   &bytes_after, &prop_ret);

Xlib generates the request to the X server to retrieve the property and appends it to its buffer of requests. Since this is a request that requires a response (many requests, such as those to draw something, do not generate responses from the server unless an error occurs), Xlib then flushes the buffer, sending the contents to the X server, and waits until the server processes all the requests in turn and then sends the response to the client with the property requested. Xlib also provides utility functions to wrap that to retrieve specific properties and decode them, knowing the details of each property and how to request and decode them, such as XGetWMName, XGetWMHints, and so on.

XCB, on the other hand, provides functions generated directly from the protocol descriptions, so that they map directly onto the protocol, with separate functions to put requests into the outgoing buffer, and to read results back from them asynchronously later. The xcb version of the above code is:

prop_cookie = xcb_get_property (dpy, False, win, atom,
                                XCB_GET_PROPERTY_TYPE_ANY, 0, 0);
prop_reply = xcb_get_property_reply (dpy, prop_cookie, NULL);

The power of xcb is in allowing those two steps to have as much code as you want between them, letting the programmer decide when to wait for data instead of forcing you to wait every time you make a request that returns data. For instance, xwininfo knows in advance from the command line options most of the data it needs to request from the server for each window, so it can request it all at once, and then wait for the results to start coming in. When using the -tree option to walk the window tree, it can request the data for all the children of the current window at once, batching even further. On a local connection on a single CPU server, this means fewer context switches between X client and server. On a multi-core/CPU server, it can allow the X server to be processing requests on one core while the client is handling the responses as they become available, better utilizing the multiprocessing nature of the system. And on remote connections, the requests can be grouped into packets closer to the MTU size of the connection, instead of whatever requests are in the buffer when one is made that needs a response.
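The effect of separating "send" from "wait" can be sketched with a toy model. This is not libxcb: `toy_conn`, `toy_request`, `toy_wait_reply` and the two counting functions are hypothetical names invented for the illustration, and the only thing modeled is how often the output buffer has to be flushed to the server before a reply can arrive.

```c
/* Toy model of a client-side request buffer (NOT the real libxcb API). */
struct toy_conn {
    unsigned queued;   /* requests placed in the output buffer so far */
    unsigned sent;     /* requests already written to the server */
    unsigned flushes;  /* writes to the server, one per round trip started */
};

/* Queue a request without sending it; the sequence number is the cookie. */
static unsigned toy_request(struct toy_conn *c)
{
    return c->queued++;
}

/* Wait for the reply to one request. If that request is still sitting in
 * the buffer, everything queued so far has to be flushed first. */
static void toy_wait_reply(struct toy_conn *c, unsigned cookie)
{
    if (cookie >= c->sent) {
        c->sent = c->queued;
        c->flushes++;
    }
}

/* Xlib style: every request is followed immediately by a wait. */
unsigned flushes_wait_each(unsigned n)
{
    struct toy_conn c = {0, 0, 0};
    for (unsigned i = 0; i < n; i++)
        toy_wait_reply(&c, toy_request(&c));
    return c.flushes;
}

/* xcb style: queue every request first, then collect the replies.
 * Cookies are consecutive in this model, so only the first is kept. */
unsigned flushes_batched(unsigned n)
{
    struct toy_conn c = {0, 0, 0};
    unsigned first = c.queued;
    for (unsigned i = 0; i < n; i++)
        toy_request(&c);
    for (unsigned i = 0; i < n; i++)
        toy_wait_reply(&c, first + i);
    return c.flushes;
}
```

In this model `flushes_wait_each(30)` needs 30 flushes while `flushes_batched(30)` needs one. A real connection also flushes when its buffer fills up, so the model only shows the direction of the difference, not exact counts.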

For xwininfo, when I ported to xcb I tested with a GNOME desktop session on OpenSolaris with a few clients open and ran it as xwininfo -root -all starting at the root of the window hierarchy and climbing down the tree, requesting all the information available for each window along the way. In my sample session it found 114 windows (in X, a window is simply a container for drawing output and receiving events, and often windows in terms of the protocol are subsets of, or borders around, the objects users think of as windows). When running locally on my Ultra 27 with its 4-core Nehalem CPU, both versions ran so fast (0.05 seconds or less) that the difference in time was too small to really accurately measure. So I used ssh -X to tunnel an X11 connection from my office in California to a server in the Sun office in Beijing, China, and from there back to my desktop, introducing a huge amount of latency. With this, the difference was dramatic between the two:

Xlib: 0.03u 0.02s 8:19.12 0.0% 
 xcb: 0.00u 0.00s 0:45.26 0.0% 
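To put a number on that difference, the elapsed-time field of each line (the third one, in minutes:seconds form) can be converted to seconds and divided. A small sketch; `elapsed_seconds` and `speedup` are hypothetical helper names, and the parser assumes only the `M:SS.ss` form shown above.

```c
#include <stdio.h>

/* Convert an elapsed time in "M:SS.ss" form to seconds.
 * Returns a negative value if the string does not parse. */
double elapsed_seconds(const char *s)
{
    int minutes;
    double seconds;

    if (sscanf(s, "%d:%lf", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Ratio of the Xlib elapsed time to the xcb elapsed time. */
double speedup(const char *xlib_elapsed, const char *xcb_elapsed)
{
    return elapsed_seconds(xlib_elapsed) / elapsed_seconds(xcb_elapsed);
}
```

With the values above, `elapsed_seconds("8:19.12")` is 499.12 seconds and `speedup("8:19.12", "0:45.26")` comes out at roughly 11, while the user and system CPU times stay negligible in both runs: the time saved is time spent waiting on the network.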

Of course, xwininfo is an unusual X application in a few ways:

  • It runs through the requests as fast as it can, then exits, not waiting for user input (unless you use the mode where you click on a window to choose it, after which it runs through as normal). Most X applications spend most of their time waiting for user input, so the overall runtime won't decrease as much by reducing the time spent communicating with the X server.
  • It uses only the core protocol and shape extension, none of the extensions like Render or Xinput that more complex, modern applications use. Xinput, XKB, and GLX are especially problematic, as those have not yet been fully supported in an XCB release, though support has been worked on through some Google Summer of Code projects.
  • It's small enough that a complete reworking to use xcb in one shot was feasible. I couldn't imagine trying that for something as complex as Firefox, Gimp, or Nautilus.
  • It used only raw Xlib - no toolkits. Any application programmer who wants to stay sane uses toolkits like Gtk or Qt and helper libraries such as Pango and Cairo to deal with all the things that all applications need to do, and shouldn't have to reinvent - widgets, input methods, accessibility support, interacting with window managers, complex text layout for non-Latin character sets, and so on.
  • It doesn't use many of the Xlib helper functions - not a whole lot more than the protocol calls, so is more directly mappable to xcb. Applications that rely on Xlib's input method framework, Compose key handling, character set conversion, or other functions, would be harder to port without duplicating all that work (though the modern toolkits handle most of that in the toolkit layer now anyway). It did rely on the helper functions for converting the window name property from other character sets, and the xcb version right now has a regression in that it only works for UTF-8 and Latin-1 window names, but since most modern toolkits use UTF-8, you may not notice unless you run older applications with localized window names.

Fortunately, XCB also provides a method for incremental conversion from Xlib to XCB, where you can use libX11 to open the display, and pass the Display pointer it returns to all your existing code, toolkits, and libraries, and then when you want to call an XCB function, convert it to an xcb_connection_t pointer for the same connection, allowing you to mix calls to the Xlib & XCB APIs.

This is done by building libX11 as a layer on top of libxcb, so they share the same socket to the X server and pass control of it back and forth. That option was introduced in libX11 1.2, and is now always present (no longer optional) in the libX11 1.4 release coming out this fall (Release Candidate 2 is out now).

So as another example, xdpyinfo also prints a lot of information about the X server, but it calls many extensions, and a lot of its calls aren't blocking on a response from the server. If you add the -queryExt option though, then for every extension it calls XQueryExtension to print which request, event, and error ids are assigned to that extension in the currently running server. Since those are dynamically assigned, and vary depending on the set of extensions enabled in a given server build/configuration, this is critical information to have when debugging X error reports that reference these ids. Using xdpyinfo -queryExt is thus especially needed when reporting X error messages that come from custom error handlers like the one in the gtk toolkit that omit the extension information found in the default Xlib error handler, or the person reading the bug will have no idea which extension you hit the error in.

XQueryExtension takes one extension name at a time, sends a request to the X server for the id codes for that extension, and waits for a response so it can return those ids to the caller. On the Xorg 1.7 server on my Solaris 11 test system, there are currently 30 active X extensions, so that's 30 tiny packets sent to the X server, 30 times the xdpyinfo client blocks in poll() waiting for a response, and 30 times the X server goes through the client handling & request scheduling code before going back to block again on its own select() loop.

A simple patch to xdpyinfo replaced just that loop of calls to XQueryExtension with two loops - the first calling xcb_query_extension for each extension, and then when the entire batch was ready to go to the server, a second loop called xcb_query_extension_reply to start collecting the batched replies. Gathering system call counts with truss -c showed the expected reduction in the number of system calls made by the xdpyinfo client itself:

System call    Xlib    xcb

* total includes all system calls, including many not shown since their count did not change significantly. There was one additional set of open/mmap/close etc. for loading the added libX11-xcb library.

Over a tcp connection, this reduced both the number of packets, and due to tcp packet header overhead, the overall amount of data:

               Xlib     xcb
TCP packets      93      35
TCP bytes     11554    7726
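Reading those figures as 93 packets and 11554 bytes for Xlib against 35 packets and 7726 bytes for xcb, a quick calculation shows how much each avoided packet was costing. This is a sketch only; `bytes_saved_per_packet` is a hypothetical helper name.

```c
/* Bytes saved for each packet saved between two captures.
 * Returns 0 when the packet counts are equal, to avoid dividing by zero. */
long bytes_saved_per_packet(long pkts_before, long bytes_before,
                            long pkts_after, long bytes_after)
{
    long pkts_saved = pkts_before - pkts_after;

    if (pkts_saved == 0)
        return 0;
    return (bytes_before - bytes_after) / pkts_saved;
}
```

`bytes_saved_per_packet(93, 11554, 35, 7726)` gives 3828 / 58 = 66 bytes per packet. Since the requests and replies carry the same payload either way, that is what you would expect if the savings were purely per-packet header overhead; the exact breakdown of those 66 bytes into link, IP, and TCP headers is an assumption, not something these counts show.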

This sort of change is far more feasible for most applications - finding the hotspots where your user is waiting for data from the server, especially at the startup of the application when you're gathering the information about the server and session to initialize your application with, and converting just those to more efficient sets of xcb calls. This is much like earlier work to reduce latency by replacing repeated calls to XInternAtom with a single call to fetch multiple atoms at once via XInternAtoms.

Of course, if you still have to maintain support for older platforms, such as Solaris 10 or RHEL 5, you'll have to keep this code #ifdef'ed, since those platforms won't have xcb support, but availability is growing on newer systems. (We didn't get it into any of the OpenSolaris distro releases before the end unfortunately, nor the first Solaris 11 Express release, but are working on it for a future Solaris release.)

Sunday Jul 25, 2010

Revelations: 145

Phoronix published a sensationalist article last week claiming that my regular e-mail updates of our biweekly builds somehow signified some out of the ordinary newsworthy event, without bothering to do even the most basic of fact checking. While I pointed this out in their forums within hours of publication, I'm still seeing it cited by other web magazines that don't bother to fact check, as well as in various e-mails and blogs, so am publishing a more complete explanation here of why it really is a non-event.

The article claimed:

As the first email of its kind in months, Alan Coopersmith who is a known X.Org contributor and longtime Sun Microsystems employee now working for Oracle, has written a new email entitled "IPS distro-import changes needed for X packages for nv_145." Alan immediately began this public email by saying, "Just when you thought you'd never see another one of these biweekly mails..."

Sadly, all they needed to do to disprove the claim that it was the “first of its kind in months” was simply follow the links from the e-mail archive page they linked to, to see that I had sent a similar message two weeks earlier for the previous biweekly build nv_144. In fact, if they checked the archives for previous months, they would have found that, except for missing build 143 (a mistake on my part), I've sent these approximately every two weeks for every biweekly build for a very long time.

Perhaps I'd confused the article's author with the offhand comment he seems to have misinterpreted, but explaining that requires a bit of background explaining what these e-mails are and why I send them in the first place.

As many OpenSolaris users know, over the course of the last couple of years, we've been transitioning from the old SVR4 package system used in Solaris 2.0 through Solaris 10 to the new Image Packaging System (also sometimes known as “IPS” or “pkg(5)”) being developed by a team of Solaris engineers and community members. Initially, we maintained two parallel distros, Solaris Express: Community Edition (SXCE), which was built using the SVR4 packages and the old Solaris installer that worked with them, and OpenSolaris (originally codenamed “Project Indiana”), which was built using the IPS packages and the new Caiman install software designed to work with them. The teams providing the package contents continued to deliver SVR4 packages, and the OpenSolaris distro team converted those to IPS.

One of the goals of the OpenSolaris distro was to provide a set of ISO install images and package repository that was completely, freely redistributable, so that it could be easily mirrored, copied, and downloaded without having to deal with the various encumbrances required by some of the third-party licenses in the traditional Solaris and SXCE packages. Unfortunately, at that time, we had not yet finished separating the encumbered code from the open source code in our X packages so that they could be included, since when Sun made its proprietary fork of X11R6 in the early 90's, the engineers never figured we'd be open sourcing Solaris a decade later and need to easily separate out the encumbered bits they were merging into the main code base.

The initial Developer Preview releases of the OpenSolaris distro thus included a set of X packages that Moinak Ghosh built from the Fully Open X (FOX) project work we'd done to rebuild our source trees from the ground up using the open source code from the X11R7 modular releases in order to ensure everything was either open source or cleanly separated out. Over the first few months, we migrated from those to the packages our team delivered to the OS as we integrated the FOX project work into our main source tree. Because these changes weren't always obvious to the external observer, I started sending notes for each biweekly build to let the team maintaining the SVR4 to IPS conversion tables know which parts of our packages they could now include, as well as any other changes they needed to know about, such as version number changes. (Our SVR4 X packages all used the same version number, a holdover from the monolithic X source days, but we've migrated to using the upstream version numbers as much as possible in the new X IPS packages.) I did this not only to try to help the distro building team, but also to help myself keep on top of the changes they made in converting our packages, so that we could understand issues users hit and know how to help them, and to learn better what we'd need to do when it came time for us to start building the IPS packages ourselves.

Last year, Sun announced that the time had finally come to start converting the builds to generate IPS packages directly, taking the next step in transitioning off SVR4 and ending the production of the SXCE distro, since the SVR4 packages used to build it would no longer be made. The ON consolidation went first, converting the packages containing the kernel, drivers and core utilities in build snv_136. The Install consolidation went next, in build snv_143, and X followed in build snv_144.

So, when I wrote “Just when you thought you'd never see another one of these biweekly mails...”, I simply meant that after build 144, all the X packages are already delivered in IPS format, so there are no SVR4 to IPS conversion files to update for them any more, so I won't need to send those - except in cases like we had in 145, where I relocated one of the files that was listed as a dependency in one of the other packages still being converted by the distro builders, so they needed to update the dependency statement for it to list the new path to the file.

Despite what Phoronix seems to have assumed, I was not in any way referring to the limbo state the OpenSolaris distro is currently in (and unfortunately, as much as I'd like to explain that, I can't), nor stating anything about build 145 that is fundamentally different than the previous builds. It should come as no surprise to anyone that while build 134 was the last build to be publicly released, we have continued work on the Nevada builds after that - after all as we've said since 2005, Nevada is the code name for the development branch in which we're working towards the next full release of Solaris (i.e. not another Solaris 10 update release, but the one we may someday call Solaris 11) - while that's been released under various forms, as the OpenSolaris open source code, or via the binary releases of the original Solaris Express, Solaris Express: Community Edition, Solaris Express: Developer Edition, or the OpenSolaris 2008.05, 2008.11, and 2009.06 distros, we've always kept driving towards the same goal, with biweekly builds assembled to test the current progress.

My mail is hardly the only externally visible sign of this - you can see changelogs for the major consolidations (ON, SFW, X, and Desktop/JDS) for build 144 (the last fully finished build, as 145 is just finishing pre-integration testing now, with a delivery deadline of Monday for packages to be included, and the distro build process starting after that). Of course, the sources are also available, and there's plenty of activity on the various commit notification and discussion mailing lists showing that we're continuing to work away on Nevada.

So unfortunately, Phoronix succeeded in making a mountain out of a molehill, confusing their readers and fellow webzine authors, but likely meeting their goal of driving more traffic to their site to generate page views for their ad revenue as people passed the link around twitter, IRC & email or linked to it from their blogs & articles. As others have pointed out, checking the facts or contacting the developers to find out the story is less juicy than it seems doesn't play well with that business model (and that's not just true for Phoronix - look at any number of the columnists for other web-based "trade publications" that generate traffic via controversial posts, and the more outraged the community gets over them and angrily passes them around to denounce, the better their numbers are - you can just imagine how many of their articles are designed to bait Groklaw or Slashdot readers).

Monday Apr 26, 2010

Grabbing information from the X server

Jay came into my office last week complaining his mouse was grabbed by a client and wouldn't let him click in any windows, but he couldn't tell which one. He remembered I said I had a script to find this out, but hadn't told him where it was. So I logged into his workstation, su'ed to root and ran a couple commands:

root@overthere:~# ps -ef | grep Xorg
jacotton   881   880   0   Apr 08 vt/2      274:24 /usr/bin/Xorg :0 -nolisten tcp -br -auth /var/run/gdm/auth-for-gdm-O_aWTb/datab
root@overthere:~# /usr/demo/Xserver/mdb/list_Xserver_devicegrab_client -p 881
Device "Virtual core pointer" id 2: 
  -- active grab 3a00000 by client 29

Device "Virtual core keyboard" id 3: 
  -- active grab 3a00000 by client 29

Device "Virtual core XTEST pointer" id 4: 
  -- no active grab on device

Device "Virtual core XTEST keyboard" id 5: 
  -- no active grab on device

Device "mouse" id 6: 
  -- no active grab on device

Device "input" id 7: 
  -- no active grab on device

Device "hotkey" id 8: 
  -- no active grab on device

Device "keyboard" id 9: 
  -- no active grab on device

root@overthere:~# /usr/demo/Xserver/mdb/list_Xserver_clients -p 881
currentMaxClients = 41
    0           0 ??? - NULL ClientPtr->osPrivate
    1          22  33 880 
    2          20  35 912 
    3           9  36 915 
   29     6926083  59 15839 
15822 /bin/bash /usr/bin/firefox
  15835 /bin/bash /usr/lib/firefox/ /usr/lib/firefox/firefox-bin
    15839 /usr/lib/firefox/firefox-bin

So we knew it was firefox holding the grab, and killing that process released it.

The scripts are actually simple wrappers around a loadable module for the Solaris mdb modular debugger. While I prefer source level debuggers like dbx & gdb for interactive use, the mdb module system works well for this style of use. The mdb modules are compiled C code, so can use the headers from the X server to get the type information, instead of requiring debugging information such as stabs or DWARF be built into the X server binary.

These were originally written for Solaris 9, in the dark days before DTrace, but are still useful, because they get information you don't know you need until after it's happened - for instance, with the Xserver DTrace probes you can get the client information from the client-connect and client-auth probes when the client connects, or see which clients send grab requests from the request probes when they make the calls - but when your server is already grabbed it's too late to trace unless you know how to reproduce the issue.

These have been kept in our source tree for a while, but weren't built as part of the regular builds so were often out of date when the server internals changed how things like the devPrivate fields were stored, and thus needed porting to the new X server versions when you needed them. And even when they were up-to-date, you needed to have a full X server source tree to build them, since they used headers not included in the server SDK package.

To solve this, in Nevada build 135 (sorry, that was just after the stable branch for the 2010.03 release forked off, so it won't be in there) I integrated them into the normal build process and delivered them in the X server packages. The mdb module is delivered in /usr/lib/mdb/proc/ with links to it named,, and so that it should be automatically loaded when mdb attaches to a process or core of any of those programs. The scripts were delivered in /usr/share/demo/Xserver/mdb, along with a README file explaining how to use either the direct mdb module or the wrapper scripts.

Porting these to other debuggers' scripting architectures should be possible, though you'd need to deal with either having the X server debugging information available or converting the C language headers & traversal algorithms to Studio dbx's ksh or gdb's Python or whatever scripting solution your debugger uses. The grab and client information should be identical across the X servers from the xorg-server sources on all platforms (though as noted above, varying by xorg-server version as the internals change), except for the client process id information. Most platforms get the client pid at connection time, so it can be logged in the client audit log if you run with -audit, or made available to the client-auth dtrace probe, but then discard that information. On Solaris & OpenSolaris however, we keep it in a client devPrivate for the SolarisIA extension to use to adjust the priority boost given to the process with focus, so the mdb module can look it up there. Of course, there's no guarantee that the client hasn't forked a child or otherwise passed control of its X connection in a way the server wasn't informed about, but it's accurate far more often than not. Since the file descriptor information is available on all platforms, you might be able to determine the client via that somehow, such as through lsof - for instance, I've implemented an enhancement to pfiles on Solaris to show peer processes for local IPC that hopefully I'll find time to get reviewed and integrated soon.

There's of course a lot more information in the X server that might be useful to find - these were each written to solve specific problems on machines we couldn't log in to directly to debug ourselves. What other problems could be solved if we had similar modules to expose other information? We can certainly enhance these as we find more - let us know if you find some.

Wednesday Feb 18, 2009

Xorg 1.5.3 known issues in OpenSolaris 2009.06 & Solaris Express build 107

As Phoronix recently noticed, we upgraded the Xorg server from 1.3 to 1.5.3 in Nevada build 107 (currently available via Solaris Express Community Edition (SXCE) Install ISOs and OpenSolaris IPS package updates in the /dev repo).

When we integrated, I sent out a heads up message with information about the driver compatibility changes and issues we knew about at that time.

So far it’s gone mostly smoothly, thanks to the help of the people who tested the pre-integration builds I posted both internally and on the X community on, but there are a few issues that have hit some people so far, most with workarounds you can use to get past them:

6801598 Xorg 1.5 ignores kernel keyboard layout setting, always uses "us"

Fixed in build 109. Xorg on Solaris has always shipped with a patch to query the kernel for the console keyboard layout (which in turn either gets set automatically by the keyboard if, like Sun keyboards, it provides the layout in the optional USB HID layout identifier, or gets set by querying the user on initial installation), but the code we apply this patch to moved from the Xorg binary to the binary in Xorg 1.5 and our patch didn’t get migrated correctly. We didn’t notice in testing since our testing was done with input devices provided by HAL, which was handling this for us, but when that didn’t integrate as expected, this bug was exposed.

Workaround: Create an /etc/X11/xorg.conf file with a keyboard section specifying your desired layout, such as this one for a British layout:

Section "InputDevice"
	Identifier		"Keyboard0"
	Driver			"kbd"
	Option "XkbRules"	"xorg"
	Option "XkbModel"	"pc105"
	Option "XkbLayout"	"gb"
EndSection

6791361 SUNWxorg-xkb should deliver "base" rules

If you do need to set a localized keyboard layout, you may notice the default XKB rules file upstream changed from "xorg" (which is currently shipped in our packages) to "base" (which isn’t yet, though we’ve asked the localization teams who build our XKB packages to add it).

Workaround: Include an XkbRules option line in your keyboard settings as shown in the example above, or make a symlink from /usr/X11/lib/X11/xkb/rules/base to the xorg file in that directory. The fix for the above bug in build 109 should also set the default used by our Xorg builds back to "xorg" for now.
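The symlink variant of this workaround could be sketched roughly as below. This is only an illustration: link_base_rules is a made-up helper name, and it assumes the rules directory is the one named above unless you pass a different one.

```shell
#!/bin/sh
# Sketch: make the XKB "base" rules name resolve to the existing "xorg" rules.
# link_base_rules is a hypothetical helper name, not part of any Solaris tool.
link_base_rules() {
    # Default to the directory named in the workaround; allow an override.
    rules_dir=${1:-/usr/X11/lib/X11/xkb/rules}

    # Nothing to link to if the xorg rules file is not there.
    [ -f "$rules_dir/xorg" ] || { echo "no xorg rules in $rules_dir" >&2; return 1; }

    # Leave an existing base file or link alone.
    if [ -e "$rules_dir/base" ] || [ -L "$rules_dir/base" ]; then
        return 0
    fi

    # A relative link keeps working if the tree is relocated.
    ln -s xorg "$rules_dir/base"
}
```

Run as root with no argument, it acts on /usr/X11/lib/X11/xkb/rules. It deliberately does nothing if a base entry already exists, so it should be safe to run again once the packages start delivering the real file.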

6801386 Xorg core dumps on startup if hald not running in snv_107

Fixed in build 109. Yet another instance of our old friend 6724478 libc printf should not SEGV when passed NULL for %s format. This was a design decision made in SVR4 and Solaris 2.0 many years ago, to force programmers to catch NULL pointer misuse - but since Linux & the BSDs allow it, today it’s mostly become a source of pain in porting code that, while technically incorrect, works on those platforms, so the decision was made last year to change Solaris libc to match. Unfortunately, that change is still undergoing standards conformance testing, so hasn’t integrated yet, and we still have to check for NULL first.

Workaround: Make sure hald starts before Xorg does. This shouldn’t be a problem for gdm users (default in OpenSolaris), since the gdm SMF service lists the hal service as a dependency that needs to start first, but may cause problems for dtlogin, xinit or startx users.

6806763 Xorg 1.5 doesn’t start if xorg.conf contains RgbPath entry

Fixed in 109, and in upstream head. This entry in the Files section of xorg.conf was obsoleted upstream, but became a parse error instead of just being ignored, which breaks people who upgraded a system with a working xorg.conf that happened to have this line.

Workaround: Remove RgbPath lines from /etc/X11/xorg.conf.
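If you'd rather not edit the file by hand, something along these lines should do it. It is a sketch only: strip_rgbpath is a hypothetical helper name, and it writes the cleaned file to standard output instead of changing anything in place.

```shell
#!/bin/sh
# strip_rgbpath is a hypothetical helper: print an xorg.conf with any
# RgbPath lines removed, leaving everything else untouched.
strip_rgbpath() {
    # Delete lines whose first word is RgbPath. Each letter is matched in
    # either case; lines already commented out with '#' do not match.
    sed '/^[[:space:]]*[Rr][Gg][Bb][Pp][Aa][Tt][Hh][[:space:]]/d' "$1"
}
```

For example, strip_rgbpath /etc/X11/xorg.conf > /tmp/xorg.conf.new, then compare the two files and copy the new one into place once you're happy with it.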

6799573 Metacity not starting on TX Xorg 1.5

Fixed in 108. The XACE framework used by extensions such as Xtsol & Xselinux added new permission types to check in the upstream release, but Xtsol doesn’t handle all of them yet. In this case, Metacity’s constant checking of round-trip time by appending to an existing property failed because Xtsol was only handling the WriteAccess check and not the BlendAccess check.

I don’t know of a workaround for this other than not running the desktop in multi-level mode, so users of the labeled/multi-level security desktop in Trusted Extensions may want to wait for build 108 before upgrading.

6798452 X Server will not start on x86 systems with multiple graphics devices

Not yet fixed, either in our builds or upstream, but has a simple workaround of creating an xorg.conf file listing the devices you want to use. This may affect people with a single card if your motherboard has on-board graphics as well.

6797940 Can’t do gui installs on x86 systems with Nevada 107/Nevada 108

We haven’t yet tracked down this issue, which only affects the SXCE installer on x86/x64 platforms. Workarounds include using the OpenSolaris LiveCD installer or the text only mode of the SXCE installer instead. After installation, the installed Xorg works fine (modulo the above issues).

6686 SUNWefb[w] should be part of slim_install

Fixed in 108. The SPARC drivers added in OpenSolaris build 107 for Xorg on XVR-50, XVR-100, and XVR-300 graphics cards aren’t included in the default install. You can add them after the install by doing pfexec pkg install SUNWefb SUNWefbw and then rebooting. (These drivers are only available in OpenSolaris installs, not SXCE ones, since they conflict with the Xsun drivers for these cards included in SXCE.)

Saturday Jun 14, 2008

June 11, 2008 X Server security advisories

On June 11, iDefense & the X.Org Foundation released security advisories for a set of issues in extension protocol parsing code in the open source X server common code base that iDefense discovered and X.Org fixed.

Their advisories/reports are at:

Sun has released a Security Sun Alert for the X server versions in Solaris 8, 9, 10 and OpenSolaris 2008.05 at:

Preliminary T-patches are available for Solaris 8, 9, and 10 from the locations shown in the Sun Alert - these are not fully tested yet (hence the "T" in T-patch).

The fix for these issues has integrated into the X gate for Nevada in Nevada build 92, so users of SXCE or SXDE will get the fixes by upgrading to SXCE build 92 when it becomes available (probably in 3-4 weeks, though the first week of July is traditionally a holiday week in Sun's US offices, so that may affect availability).

Fixes for OpenSolaris 2008.05 users following the development build trains will be available when the Nevada 92 packages are pushed to the repo (also probably in about 3 weeks from now).

Fixes are planned for OpenSolaris 2008.05 users staying on the stable branch (i.e. nv_86 equivalent), but I do not have information yet on how or when those will be available.

Fixes for users building X from the OpenSolaris sources are currently available in the Mercurial repository of the FOX project in open-src/xserver/xorg/6683567.patch.

For users of all OS versions, the best defense against this class of attacks is to never, ever, ever run “xhost +”, and if possible, to run X with incoming TCP connections disabled, since if the attacker can't connect to your X server in the first place, they can't cause the X server to parse the protocol stream incorrectly. This is not a complete defense, as anyone who can connect to the Xserver can still exploit it, so if you're in a situation where the X users don't have root access it won't protect you from them, but it is a strong first line of defense against attacks from other machines on the network.

Releases based on the Solaris Nevada train (including OpenSolaris & Solaris Express) default to “Secure by Default” mode, which disables incoming TCP connections to the X server. Current Solaris 10 releases offer to set the Secure by Default mode at install time. On both Solaris 10 & Nevada, the netservices command may be used to change the Secure by Default settings for all services, or the svccfg command may be used to disable listening for TCP connections for just X by running:

svccfg -s svc:/application/x11/x11-server setprop options/tcp_listen=false

and then restarting the X server (log out of your desktop session and log back in).

On older releases, the “-nolisten tcp” flag may be appended to the X server command line in /etc/dt/config/Xservers (copied from /usr/dt/config/Xservers if it doesn't exist) or in whatever other method is being used to start the X server.
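As a rough sketch of that edit for the Xservers case: add_nolisten below is a hypothetical helper, and it assumes every non-comment, non-blank line in the file is one complete server entry ending with the X server command line (no backslash continuations). It prints the modified file instead of editing it in place.

```shell
#!/bin/sh
# add_nolisten is a hypothetical helper: print an Xservers file with
# "-nolisten tcp" appended to each server entry that lacks it.
# Assumption: each entry sits on a single line ending with the server command.
add_nolisten() {
    # Skip comments, blank lines, and entries that already have the flag;
    # on the remaining lines, replace trailing whitespace with the flag.
    sed -e '/^[[:space:]]*#/b' \
        -e '/^[[:space:]]*$/b' \
        -e '/-nolisten[[:space:]][[:space:]]*tcp/b' \
        -e 's/[[:space:]]*$/ -nolisten tcp/' "$1"
}
```

Copy /usr/dt/config/Xservers to /etc/dt/config/Xservers first if the latter doesn't exist, run add_nolisten on it, check the output, and then install it over the original. Running it twice is harmless, since entries that already carry the flag are left alone.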

See the Sun Alert for other prevention methods, such as disabling the vulnerable extensions if your applications can run effectively without them.

Wednesday Mar 26, 2008

Too many dry eyes in the house

I've been falling further and further behind on blogging for a while - maybe I'll catch up someday with the cache of posts I have stored in my brain half written (there's a new issue of the X11 DTrace companion, a post on the recent X server security fixes and MIT-SHM regression they introduced, several posts for the OpenSolaris elections which are too late now, and a post or two on X in Indiana bouncing around in there).

This post on the other hand, may actually be early, since Sjögren's Syndrome Awareness Month isn't until April, but I'm posting now since the 2008 Salt Lake City Sjögren's Walkabout is this coming Saturday, March 29, 2008. My mother is one of the organizers, and my sister was going to be one of the walkers until she broke her leg last month. I'm still pledging money to the Sjögren's Syndrome Foundation for both of them though.

For those who haven't heard of Sjögren's — and that included me a few years ago — the article What is Sjögren's Syndrome? and the Wikipedia page on Sjögren's Syndrome can tell you far more than I can. If you're a female who has been noticing problems with dry eyes or mouth, or you know one, you probably want to read more about it, as those can be early symptoms of something far more serious. The foundation estimates 4 million Americans suffer from the disease, and that women are approximately 90% of those.

Even as widespread as that is, it still sounds like one of the random auto-immune disorders they throw out every week on an episode of House and then rule out and you never hear of again. For my wife and me though, it became a permanent part of our lives several years ago when coincidentally both my mother and my wife's mother were diagnosed with it, and they became active in trying to raise awareness of the syndrome in their communities (as you can see in those two articles).

While they've had to adjust to the dryness in many little ways, such as always carrying a water bottle and really appreciating a gift of saliva-inducing Xylitol gum, the constant tiredness is what has really affected them most. They were both busy professionals (a pediatrician and a dental hygienist), active in professional organizations and volunteering in their communities, but have had to cut back as they just didn't have the energy to keep doing it all any more.

Our bodies are mostly water, so you can imagine the problems that happen when various parts start drying up, and that's where the real dangers of the syndrome come in, as the body's immune system starts attacking other systems. While there are treatments available for the effects on some parts of the body, there is no cure, and the Sjögren's Syndrome Foundation is devoted both to research and to spreading the word so that sufferers can be diagnosed earlier and begin treatments sooner.

So what can you do? If you want to join me in donating, you can do so at either my mom's page or directly on the foundation's web page. If not, just remember what you've read here the next time you hear a woman near you complain her eyes or mouth are always so much drier than they used to be and pass it on.

Thursday Oct 11, 2007

Stupid nv_74 cursor tricks

For those who have already grabbed Nevada build 74 from the Solaris Express Community Edition downloads, you can try out the new cursor code available with the integration of libXcursor (which most XFree86 and Xorg platforms have had for a while).

If your graphics card supports the Render extension & 32-bit alpha cursors (i.e. most x86 graphics, but not Sun Rays nor SPARC graphics yet - I've mainly used it on nvidia cards and my laptop with an ATI Radeon chipset), you can see them in action by doing:

% su -
# mkdir -p /etc/dt/config/C/
# cat > /etc/dt/config/C/styleModern
#include "/usr/dt/config/C/styleModern"
Xcursor.theme: whiteglass
^D
# exit

% echo 'Xcursor.theme: whiteglass' >> ~/.Xresources

and then log out & log back in.

Change 'whiteglass' to 'redglass' in the above if you want something that stands out more.

Monday Oct 08, 2007

X Font Server (xfs) Security Hole in Solaris

As noted in the ZDNet posting X Font Server flaw hits Sun Solaris hard, the recently announced X font server vulnerabilities not only affect Solaris, but are exposed to the network by default in some Solaris installs.

What the article fails to mention is that it's only older installs that are vulnerable by default - Solaris versions up through Solaris 10 6/06 run xfs by default from inetd listening to the network. Solaris 10 11/06 and later Solaris 10 releases ask you at install time if you want your network services to default to being open or closed. Solaris Nevada/Express just closes them all by default and requires you to turn back on the ones you want. (These changes came from the Solaris Secure by Default project, which has more information on its project pages.)

Our sustaining teams are producing patches and a Sun Alert covering this issue, but until then, if you don't need the X font server (on Solaris it's really only used for remote desktop sessions from computers without the standard Solaris fonts already installed - unlike some Linux'es, local sessions don't use it), you can easily turn it off in several ways:

  • On all Solaris releases: “/usr/openwin/bin/fsadmin -d”, which will either break the link that inetd uses (Solaris 2.6-Solaris 9) or use inetadm to disable the svc:/application/x11/xfs service (Solaris 10 & later).
  • On Solaris 10 and later, you can do the same thing explicitly with “/usr/sbin/inetadm -d svc:/application/x11/xfs:default”.
  • On Solaris 2.6 through 9, you can do the traditional editing of /etc/inetd.conf to disable it, then “pkill -HUP inetd”.
  • If you'll never need it, and want to be sure it's gone, remove the xfs package with “pkgrm SUNWxwfs”.
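For the inetd.conf route on the older releases, the edit amounts to commenting out the font server line. A minimal sketch, assuming that line uses the service name "fs" in its first column (check your own file before relying on that); disable_xfs_entry is a hypothetical helper name, and it prints the result instead of editing in place:

```shell
#!/bin/sh
# disable_xfs_entry is a hypothetical helper: print an inetd.conf with the
# font server entry commented out.
# Assumption: the entry starts with the service name "fs" in column one.
disable_xfs_entry() {
    # Prefix '#' to any line starting with "fs" followed by whitespace;
    # other services (ftp, finger, ...) do not match and pass through.
    sed 's/^fs[[:space:]]/#&/' "$1"
}
```

Save the output to a temporary file, compare it with /etc/inetd.conf, copy it into place, and then run “pkill -HUP inetd” so inetd rereads its configuration.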

Update: Oops, had a typo in one of the instructions above - should have been “pkill -HUP inetd”, not kill. Also, as Paul noted in the comments the Sun Alert is now published, with interim fixes soon to follow, at


Engineer working on Oracle Solaris and with the X.Org open source community.


The views expressed on this blog are my own and do not necessarily reflect the views of Oracle, the X.Org Foundation, or anyone else.
