Porting X apps to XCB
By Alanc-Oracle on Oct 08, 2010
I pushed the release of xwininfo 1.1 to the X.Org release archives last week. xwininfo is a command-line utility to print information about windows on an X server. The major new feature of this release is the rewrite to use libxcb instead of libX11 for the connection to the X server.
For those who haven't heard, XCB is a new library (well, new compared to Xlib) to communicate to the X server over the X11 protocol. While Xlib is designed to look like a traditional library API, hiding the fact that calls result in protocol requests to a server, XCB makes the client-server nature of the protocol explicit in its design. For instance, to lookup a window property, the Xlib code is a single function call:
XGetWindowProperty(dpy, win, atom, 0, 0, False, AnyPropertyType, &type_ret, &format_ret, &num_ret, &bytes_after, &prop_ret);
Xlib generates the request to the X server to retrieve the property and appends it to its buffer of requests. Since this is a request that requires a response (many requests, such as those to draw something, do not generate responses from the server unless an error occurs), Xlib then flushes the buffer, sending the contents to the X server, and waits until the server processes all the requests in turn and then sends the response to the client with the property requested. Xlib also provides utility functions to wrap that to retrieve specific properties and decode them, knowing the details of each property and how to request and decode them, such as XGetWMName, XGetWMHints, and so on.
XCB on the other hand, provides functions generated directly from the protocol descriptions, so that they map directly onto the protocol, with separate functions to put requests into the outgoing buffer, and to read results back from them asynchronously later. The xcb version of the above code is:
prop_cookie = xcb_get_property (dpy, False, win, atom, XCB_GET_PROPERTY_TYPE_ANY, 0, 0); prop_reply = xcb_get_property_reply (dpy, prop_cookie, NULL);
The power of xcb is in allowing those two steps to have as much code as you want between them, letting the programmer decide when to wait for data instead of forcing you to wait everytime you make a request that returns data. For instance, xwininfo knows in advance from the command line options most of the data it needs to request from the server for each window, so it can request it all at once, and then wait for the results to start coming in. When using the -tree option to walk the window tree, it can request the data for all the children of the current window at once, batching even further. On a local connection on a single CPU server, this means less context switches between X client and server. On a multi-core/CPU server, it can allow the X server to be processing requests on one core while the client is handling the responses as they become available, better utilizing the multiprocessing nature of the system. And on remote connections, the requests can be grouped into packets closer to the MTU size of the connection, instead of whatever requests are in the buffer when one is made that needs a response.
For xwininfo, when I ported to xcb I tested with a GNOME desktop session on OpenSolaris with a few clients open and ran it as xwininfo -root -all starting at the root of the window hierarchy and climbing down the tree, requesting all the information available for each window along the way. In my sample session it found 114 windows (in X, a window is simply a container for drawing output and receiving events, and often windows in terms of the protocol are subsets of, or borders around, the objects users think of as windows). When running locally on my Ultra 27 with it's 4-core Nehalem CPU, both versions ran so fast (0.05 seconds or less) that the difference in time was too small to really accurately measure. So I used ssh -X to tunnel an X11 connection from my office in California to a server in the Sun office in Beijing, China, and from there back to my desktop, introducing a huge amount of latency. With this, the difference was dramatic between the two:
Xlib: 0.03u 0.02s 8:19.12 0.0% xcb: 0.00u 0.00s 0:45.26 0.0%
Of course, xwininfo is an unusual X application in a few ways:
- It runs through the requests as fast as it can, then exits, not waiting for user input (unless you use the mode where you click on a window to choose it, after which it runs through as normal). Most X applications spend most of their time waiting for user input, so the overall runtime won't decrease as much by reducing the time spent communicating with the X server.
- It uses only the core protocol and shape extension, none of the extensions like Render or Xinput that more complex, modern applications use. Xinput, XKB, and GLX are especially problematic, as those have not yet been fully supported in an XCB release, though support has been worked on through some Google Summer of Code projects.
- It's small enough that a complete reworking to use xcb in one shot was feasible. I couldn't imagine trying that for something as complex as Firefox, Gimp, or Nautilus.
- It used only raw Xlib - no toolkits. Any application programmer who wants to stay sane uses toolkits like Gtk or Qt and helper libraries such as Pango and Cairo to deal with all the things that all applications need to do, and shouldn't have to reinvent - widgets, input methods, accessibility support, interacting with window managers, complex text layout for non-Latin character sets, and so on.
- It doesn't use much of the Xlib helper functions - not a whole lot more than the protocol calls, so is more directly mappable to xcb. Applications that rely on Xlib's input method framework, Compose key handling, character set conversion, or other functions, would be harder to port without duplicating all that work (though the modern toolkits handle most of that in the toolkit layer now anyway). It did rely on the helper functions for converting the window name property from other character sets, and the xcb version right now has a regression in that it only works for UTF-8 and Latin-1 window names, but since most modern toolkits use UTF-8, you may not notice unless you run older applications with localized window names.
Fortunately, XCB also provides a method for incremental conversion from Xlib to XCB, where you can use libX11 to open the display, and pass the Display pointer it returns to all your existing code, toolkits, and libraries, and then when you want to call an XCB function, convert it to an xcb_connection_t pointer for the same connection, allowing mixing calls to Xlib & XCB API's.
This is done by building libX11 as a layer on top of libxcb, so they share the same socket to the X server and pass control of it back and forth. That option was introduced in libX11 1.2, and is now always present (no longer optional) in the libX11 1.4 release coming out this fall (Release Candidate 2 is out now).
So as another example, xdpyinfo also prints a lot of information about the X server, but it calls many extensions, and a lot of its calls aren't blocking on a response from the server. If you add the -queryExt option though, then for every extension it calls XQueryExtension to print which request, event, and error ids are assigned to that extension in the currently running server. Since those are dynamically assigned, and vary depending on the set of extensions enabled in a given server build/configuration, this is critical information to have when debugging X error reports that reference these ids. Using xdpyinfo -queryExt is thus especially needed when reporting an X error message that come from custom error handlers like the one in the gtk toolkit that omit the extension information found in the default Xlib error handler, or the person reading the bug will have no idea which extension you hit the error in.
XQueryExtension takes one extension name at a time, sends a request to the X server for the id codes for that extension, and waits for a response so it can return those ids to the caller. On the Xorg 1.7 server on my Solaris 11 test system, there are currently 30 active X extensions, so that's 30 tiny packets sent to the X server, 30 times the xdpyinfo client blocks in poll() waiting for a response, and 30 times the X server goes through the client handling & request scheduling code before going back to block again on its own select() loop.
A simple patch to xdpyinfo replaced just that loop of calls to XQueryExtension with two loops - the first calling xcb_query_extension for each extension, and then when the entire batch was ready to go to the server, a second loop called xcb_query_extension_reply to start collecting the batched replies. Gathering system call counts with truss -c showed the expected reduction in a number of system calls made by the xdpyinfo client itself:
\* total includes all system calls, including many not shown since their count did not change significantly. There was one additional set of open/mmap/close etc. for loading the added libX11-xcb library.
Over a tcp connection, this reduced both the number of packets, and due to tcp packet header overhead, the overall amount of data:
This sort of change is far more feasible for most applications - finding the hotspots where your user is waiting for data from the server, especially at the startup of the application when you're gathering the information about the server and session to initialize your application with, and converting just those to more efficient sets of xcb calls. This is much like earlier work to reduce latency by converting repeated calls to XInternAtom with a single call to fetch multiple atoms at once via XInternAtoms.
Of course, if you still have to maintain support for older platforms, such as Solaris 10 or RHEL 5, you'll have to keep this code #ifdef'ed, since those platforms won't have xcb support, but availablity is growing on newer systems. (We didn't get it into any of the OpenSolaris distro releases before the end unfortunately, nor the first Solaris 11 Express release, but are working on it for a future Solaris release.)