Friday Nov 01, 2013

Finding nuggets in ARC discussions

A bit over twenty years ago, Sun formed an Architecture Review Committee (ARC) that evaluates proposals to change interfaces between components in Sun software products. During the OpenSolaris days, we opened many of these discussions to the community. While they’re back behind closed doors, and at a different company now, we still continue to hold these reviews for the software from what’s now the Sun Systems Group division of Oracle.

Recently one of these reviews was held (via e-mail discussion) to consider a proposal to update our GNU findutils package to the latest upstream release. One of the upstream changes discussed was the addition of an “oldfind” program. In findutils 4.3, find was modified to use the fts() function to walk the directory tree, and oldfind was created to provide the old mechanism in case there were bugs in the new implementation that users needed to work around.

In Solaris 11 though, we still ship the find descended from SVR4 as /usr/bin/find and the GNU find is available as either /usr/bin/gfind or /usr/gnu/bin/find. This raised the question of whether we should add oldfind, and if so, what we should call it. Normally our policy is to only add the g* names for GNU commands that conflict with an existing Solaris command – for instance, we ship /usr/bin/emacs, not /usr/bin/gemacs. In this case however, it seemed like it would be confusing to have /usr/bin/oldfind be the older version of /usr/bin/gfind, not of /usr/bin/find. Thus if we shipped it, it would make more sense to call it /usr/bin/goldfind, which several ARC members noted read more naturally as “gold find” than as “g old find”.
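The usual naming rule is simple enough to write down. Here is a minimal Python sketch of it; the function name is hypothetical and the command set is a made-up stand-in for what Solaris actually ships:

```python
def usr_bin_name(gnu_command: str, solaris_commands: set[str]) -> str:
    """Return the /usr/bin name a GNU command gets under the usual policy.

    Illustrative only: the g prefix is added just when a Solaris command
    with the same name already exists.
    """
    if gnu_command in solaris_commands:
        return "/usr/bin/g" + gnu_command
    return "/usr/bin/" + gnu_command


# Hypothetical subset of native Solaris commands, for illustration only.
existing = {"find", "make", "ps", "who"}

for command in ("find", "emacs", "oldfind"):
    print(command, "->", usr_bin_name(command, existing))
```

Applied mechanically, the rule yields /usr/bin/oldfind, which is exactly the name that looked misleading next to /usr/bin/find.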

One of the concerns we often discuss in ARC is whether a change is likely to be understood by users or whether it will result in more calls to support. As we hit this part of the discussion on a Friday at the end of a long week, I couldn’t resist putting forth a hypothetical support call for this command:

“Hello, Oracle Solaris Support, how may I help you?”

“My admin is out sick, but he sent an email that he put the findutils package on our server, and I can run goldfind now. I tried it, but goldfind didn’t find gold.”

“Did he get the binutils package too?”

“No he just said findutils, do we need binutils?”

“Well, gold comes in the binutils package, so goldfind would be able to find gold if you got that package.”

“How much does Oracle charge for that package?”

“It’s free for Solaris users.”

“You mean Oracle ships packages of gold to customers for free?”

“Yes, if you get the binutils package, it includes GNU gold.”

“New gold? Is that some sort of alchemy, turning stuff into gold?”

“Not new gold, gold from the GNU project.”

“Oracle’s taking gold from the GNU project and shipping it to me?”

“Yes, if you get binutils, that package includes gold along with the other tools from the GNU project.”

“And GNU doesn’t mind Oracle taking their gold and giving it to customers?”

“No, GNU is a non-profit whose goal is to share their software.”

“Sharing software sure, but gold? Where does a non-profit like GNU get gold anyway?”

“Oh, Google donated it to them.”

“Ah! So Oracle will give me the gold that GNU got from Google!”

“Yes, if you get the package from us.”

“How do I get the package with the gold?”

“Just run pkg install binutils and it will put it on your disk.”

“We’ve got multiple disks here - which one will it put it on?”

“The one with the system image - do you know which one that is?”

“Well the note from the admin says the system is on the first disk and the users are on the second disk.”

“Okay, so it should go on the first disk then.”

“And where will I find the gold?”

“It will be in the /usr/bin directory.”

“In the user’s bin? So that’s on the second disk?”

“No, it would be on the system disk, with the other development tools, like make, as, and what.”

“So what’s on the first disk?”

“Well if the system image is there the commands should all be there.”

“All the commands? Not just what?”

“Right, all the commands that come with the OS, like the shell, ps, and who.”

“So who’s on the first disk too?”

[Picture of Abbott and Costello]

“Yes. Did your admin say when he’d be back?”

“No, just that he had a massive headache and was going home after I tried to get him to explain this stuff to me.”

“I can’t imagine why.”

“Oh, is why a command too?”

“No, _why was a Ruby programmer.”

“Ruby? Do you give those away with the gold too?”

“Yes, but it comes in the ruby package, not binutils.”

“Oh, I’ll have to have my admin get that package too! Thanks!”

Needless to say, we decided this might not be the best idea. GNU hasn’t had to release a serious bug fix for the new find in the past few years, so the new GNU find seems pretty stable, and we always have the SVR4 find to use as a fallback in Solaris. Adding oldfind therefore didn’t seem really necessary, and we passed on including it when we update to the new findutils release.

[Apologies to Abbott, Costello, their fans, and everyone who read this far. The Gold (linker) page on Wikipedia may explain some of the above, but can’t explain why goldfind is the old GNU find, but gold is the new GNU ld.]

Tuesday Jun 13, 2006

Changing the default login session in dtlogin

At some point over the past few years, I somehow went from just checking in the new login and splash images for CDE in each Solaris release to becoming the unofficial dtlogin “special ops” person - handling the overhauls of the dtlogin appearance for the Solaris 10 beta, 3/05, 1/06, and now 6/06 releases and a few other side tasks that the main CDE sustaining team didn't have the resources to handle. The latest of these is shipping in Solaris Nevada starting in build 39, and I've just gotten the draft of the release note for the upcoming Solaris Express release including it:

Default Desktop Session in dtlogin

Now, when a user logs into the Solaris Desktop for the first time, Java Desktop System (JDS) is the default desktop environment instead of the Common Desktop Environment (CDE). JDS has also become the default environment for users who chose a desktop environment on an older Solaris release that is no longer present in the Solaris release, such as OpenWindows or GNOME 2.0.

System administrators can modify the dtlogin configuration to override the default choices using the defaultDt and fallbackDt resources.

For more information about defaultDt and fallbackDt resources, see the dtlogin(1M) man page.

So users who already chose CDE or JDS will have those choices respected - this only changes the defaults for new users or those whose current session can't be found.

For comparison, the description I wrote for our Architecture Review Committee (ARC) has a lot more technical detail (some of which is now also captured in the above mentioned dtlogin(1M) man page):

This case introduces two new resources in the dtlogin Xresources file
(/{usr,etc}/dt/config/$LANG/Xresources) and establishes initial default
values for those in the Xresources files shipped by Sun.

It requests a patch release binding.

1) Dtlogin*defaultDt:

   When a new user logs in for the first time, a dialog box is presented
   asking which of the desktop environments installed on the machine they
   wish to use.   The default list currently consists of CDE & JDS, but
   any additional desktops installed using the altDt support in dtlogin
   will appear as well.

   This resource controls which of those desktops is selected by default
   when the dialog appears, so a new user who doesn't know the difference
   and just clicks the OK button will get this desktop, but those who know
   the difference can still choose as appropriate.

   The value of this resource must be the altDtStart value of one of the
   desktop environments installed on the system.   (This was chosen since
   it is the one value required for all desktop environments that must be
   most stable, since it is the one recorded in the lastsession file in
   the user's home directory to store their chosen desktop.   The altDtName
   is localized, so not appropriate for a global default setting.  The
   other values for altDt* settings are all optional and may not be present
   for certain desktops.)

2) Dtlogin*fallbackDt:

   When an existing user logs in, but the lastsession file in their home
   directory refers to a program that is not executable on the current 
   system, dtlogin will execute the program listed here as a fallback session.

   This may be any program, including /usr/dt/bin/sdt_firstlogin to offer
   the users a choice of the sessions available on the system.

The default X resources files shipped in Solaris will contain these additions:

!!  Default desktop choice for new users in initial login dialog
!!  This should be the altDtStart key of a session defined here or in
!!  an Xresources.d file.

Dtlogin*defaultDt:	/usr/dt/config/Xsession.jds

!!  Fallback desktop choice for users whose lastsession file refers to
!!  a non-existent session choice.  Set to /usr/dt/bin/sdt_firstlogin to 
!!  offer the users a choice of the sessions available on the system.

Dtlogin*fallbackDt:	/usr/dt/config/Xsession.jds


Imported interfaces:

dtlogin lastsession mechanism		Project Private	ASARC/1995/390
/usr/dt/bin/sdt_firstlogin		Project Private	ASARC/1995/390

Exported interfaces:

Dtlogin*defaultDt resource		Stable
Dtlogin*fallbackDt resource		Stable

Default settings of new resources	Volatile
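
Read together, the two resources amount to a small decision procedure at login. The Python sketch below is only a paraphrase of the case text above, not dtlogin's actual code. The function names are hypothetical, and two points are assumptions: that a missing lastsession file is what marks a new user, and that the first-login dialog is the sdt_firstlogin program named in the case.

```python
import os
from typing import Callable, Optional, Sequence

FIRST_LOGIN_DIALOG = "/usr/dt/bin/sdt_firstlogin"


def is_executable(path: str) -> bool:
    """True if path names a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def choose_session(last_session: Optional[str],
                   fallback_dt: str,
                   executable: Callable[[str], bool] = is_executable) -> str:
    """Pick the program to start for a user logging in.

    last_session is the value recorded in the user's lastsession file,
    or None when no such file exists (assumed here to mean a new user).
    """
    if last_session is None:
        # New user: show the chooser. defaultDt only controls which
        # entry is preselected in that dialog.
        return FIRST_LOGIN_DIALOG
    if executable(last_session):
        # Existing choice is still valid, so it is respected.
        return last_session
    # Recorded session is no longer runnable on this system.
    return fallback_dt


def preselected_desktop(installed_starts: Sequence[str],
                        default_dt: str) -> Optional[str]:
    """Return the altDtStart value to preselect, or None if defaultDt
    does not match any installed desktop (behaviour in that case is
    not specified in the case text)."""
    return default_dt if default_dt in installed_starts else None
```

Passing the executability check in as a parameter keeps the sketch easy to exercise without touching a real filesystem.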

By declaring these Stable, we're promising that these resources will take values as described and do what we say until further notice. It doesn't prevent us from further changing or refining this in the future (though since we're really trying to replace dtlogin with gdm in the future, we're not likely to do much), it just means that if we do, we'll have to either change these in a compatible way or create new resources. By declaring the defaults as Volatile though, we're warning that we could change them at any time, without warning - for now they're the Xsession file currently used by JDS, but we could change to another mechanism of starting JDS or even (though highly unlikely) to another desktop altogether, without having to worry about having made any compatibility guarantee there.

The note about a “Patch release binding” indicates that we think these changes are compatible enough with current systems to allow shipping this change in a patch or update release (and in fact, this change is being backported to a Solaris 10 patch to be included in the third Solaris 10 update release, tentatively scheduled for release near the end of this year). These changes don't break any existing systems, don't change anything for existing users with a valid desktop choice, still give new users a choice of desktops, and can even be easily overridden by sysadmins who still really prefer their new users be suggested CDE instead of the newer, easier-to-use GNOME-based JDS desktop.

A lot of people think that binary compatibility guarantees that Solaris provides stifle innovation because we can never change things - this is an example of why that's not the case. We don't guarantee that nothing about the system will ever change - that would be foolish - what we do is specify what things you can count on remaining compatible (not unchanged, just compatible), and what you can't - and how long that compatibility is promised for. We've never made any promises in the past about what choice would be shown as the default in the session choice dialog, and the Volatile stability for this reflects that we still aren't - but because that's not something that affects the manageability of Solaris systems or the ability of ISVs to provide applications, it's not something we need to make those sort of promises about. The new resources are declared Stable because we don't want admins to use them and then find when they upgrade to a future release that we've broken their settings or made it so their users can't login until they find and fix the problem.

Right now these promises are mostly expressed via the man pages in Solaris - especially the Attributes section in many man pages. The ARC team is also working to open up the processes through which these are set and the history of many of the past review cases that set important ones, via the OpenSolaris ARC community, so that developers will have greater insight into these compatibility promises and eventually even have a voice in shaping them.


Thursday Jun 02, 2005

Xinerama protocol clashes on Solaris

So now that nVidia has released their Solaris x86 drivers, a problem I've been working on recently will probably become more noticeable. The Xinerama calls used by CDE & JDS on Solaris aren't compatible with the protocol used by the Xorg server. Back in the early days of Xinerama, a number of people noticed that the protocol originally delivered in X11R6.4 didn't meet all the needs of application and window manager developers, so proposals were made to extend it. Unfortunately, Xsun adopted an early draft of the proposed standard extensions, while XFree86 adopted a slightly different protocol extension, and meanwhile the standards committee version never got adopted by anyone. Since X11R6.7 and later pulled in the XFree86 version, it's effectively the de facto standard now, while we're working on fixing the standards committee version to be compatible with that so it can be adopted in a future X.Org release.

Unfortunately, this means CDE & JDS on Solaris will both exit on login to a session running on the Xorg server in Xinerama mode, and that now includes TwinView mode with an nVidia dual-head card. It will also cause Mozilla 1.7 on Solaris 10 to exit with an X error when you try to display to an XFree86 or Xorg machine running Xinerama, or to a MacOS X machine using the X server included there (which seems to always advertise the Xinerama extension, even when only a single screen is active).

For now, there are a couple of workarounds you can use:

  • If using the nVidia drivers in TwinView mode, enable this option in xorg.conf, as noted in the nVidia driver readme:
    Option "NoTwinViewXineramaInfo" "1"
  • Use the interposer library Steven Hahn posted in his blog to replace the calls in libXext that result in X errors

A real fix should appear in build 16 of “Nevada” (the current development branch of Solaris, which is released via the Solaris Express program, and soon via OpenSolaris as well), and then in patches for older releases of Solaris. This fix detects which version of the Xinerama protocol is used by the X server and then sends the correct requests for that protocol version.

Since this changes an interface our software uses to interact with other software (both from other groups in Sun and from groups outside Sun), I had to run the proposed change by our Architecture Review Committee (ARC) before putting it into Solaris. The change was simple and straightforward enough that it was easily approved, but since we're working to open up those review processes as part of the OpenSolaris process, I figured I'd post here what I sent to the committee for review, both to explain the fix better and to give a small bit of the flavor of our reviews to help future OpenSolaris developers start understanding how we work. As you can see, for small, simple changes, it's not really a lot of paperwork - this case was a “fast track” so it was mostly a free-form description of the changes, written in whatever way the developer thinks makes it most clear to the reviewers, followed by a simple table summarizing the interfaces.

(The references you'll see to “PSARC/2000/036” refer to the previous review when Sun added the draft standard changes. At the end, where the old protocol is listed as “Obsolete”, that's probably closer to what most people think of as “deprecated” - still supported for now for backwards compatibility, but it may not be forever and nothing new should depend on it.)

PSARC/2000/036 introduced a Sun extension to the Xinerama protocol and API
that came from X.Org in X11R6.4.   The case made the prediction that
these extensions would be part of the X.Org standard "in 6 to 8 months."
Unfortunately, that was proven to be wildly optimistic as the standard got
stalled (it has not yet been adopted today, 5 years after the original case),
and the XFree86 open source community added their own requests to the protocol
which used the same request numbers as Sun's addition.   This results in
applications receiving X Errors (for which most take the default error handling
path of exiting) when using the functions in Solaris libXext to query an X
server which implements the XFree86 version of the protocol (including Xorg
on Solaris and virtually every X server on Linux, BSD, and MacOS X).   For
instance, dtwm & metacity will both exit on startup, making it impossible to
login to either CDE or JDS on a Solaris 10 x86 system running Xorg in Xinerama
mode.

The two protocol definitions are:

Xsun					XFree86/Xorg
================================	================================
Version: 1.0				Version: 1.1

Requests:				Requests:
---------				---------

From X11R6.4:				From X11R6.4:
0 PanoramiXQueryVersion         	0 PanoramiXQueryVersion
1 PanoramiXGetState             	1 PanoramiXGetState
2 PanoramiXGetScreenCount               2 PanoramiXGetScreenCount
3 PanoramiXGetScreenSize                3 PanoramiXGetScreenSize

Xsun additions:				XFree86 additions:
4 XineramaInfo				4 XineramaIsActive
					5 XineramaQueryScreens

XineramaInfo request:			XineramaIsActive request:
 (size 8 bytes)				 (size 4 bytes)
   CARD8   reqType;			   CARD8   reqType;
   CARD8   xXineramaReqType;		   CARD8   panoramiXReqType;
   CARD16  length B16;			   CARD16  length B16;
   CARD32  visual B32;

A client can therefore determine which protocol is used by the X server
by querying the Xinerama extension version, and using the XFree86/Xorg
protocol if the server reports version 1.1 or higher of the Xinerama

Unfortunately, the server cannot query the client as to which version of
the protocol they support, so in order to determine which protocol to use,
it has to attempt to determine the version based on the requests.  Fortunately,
only a single request is in conflict, and the version of the protocol in use
can be determined by the length field in the request - an 8-byte request is
using the Xsun protocol, while a 4-byte request is using the Xorg protocol.
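
To make the length check concrete, here is a rough Python sketch that packs the two conflicting request layouts from the table above and tells them apart. It is an illustration rather than the Xsun server code. The major opcode value and the function names are hypothetical (X servers assign extension opcodes at runtime), and the sketch relies on the X11 rule that a request's length field counts 4-byte units, so the 8-byte request carries a length of 2 and the 4-byte request a length of 1.

```python
import struct

XINERAMA_MAJOR_OPCODE = 140  # hypothetical; assigned by the server at runtime
CONFLICTING_MINOR = 4        # request number used by both protocols


def xsun_xinerama_info(visual: int) -> bytes:
    """Xsun 1.0 XineramaInfo: reqType, xXineramaReqType, length, visual."""
    return struct.pack("=BBHI", XINERAMA_MAJOR_OPCODE, CONFLICTING_MINOR, 2, visual)


def xfree86_is_active() -> bytes:
    """XFree86 1.1 XineramaIsActive: reqType, panoramiXReqType, length."""
    return struct.pack("=BBH", XINERAMA_MAJOR_OPCODE, CONFLICTING_MINOR, 1)


def classify_request_4(request: bytes) -> str:
    """Decide which protocol a Xinerama request 4 belongs to."""
    _req_type, minor, length_units = struct.unpack_from("=BBH", request)
    if minor != CONFLICTING_MINOR:
        raise ValueError("not Xinerama request 4")
    size_in_bytes = length_units * 4
    if size_in_bytes == 8:
        return "XineramaInfo (Xsun 1.0)"
    if size_in_bytes == 4:
        return "XineramaIsActive (XFree86 1.1)"
    return "BadLength"


print(classify_request_4(xsun_xinerama_info(visual=0x21)))
print(classify_request_4(xfree86_is_active()))
```

Only the shared 4-byte header is unpacked before deciding, since that is all the two layouts have in common.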

This project therefore proposes the following steps to restore compatibility
to the Xinerama implementations on Solaris, both with itself and with the
many other OSes that support the Xinerama 1.1 protocol:

 - The XineramaGetState() API in libXext will be modified to check the 
   Xinerama extension version reported by the X server, and if it is >= 1.1,
   will use the XineramaIsActive request to determine the Xinerama state.
   If it is 1.0, it will continue to use the current method of reporting that
   Xinerama is active whenever the X server reports the extension is present.
 - The XineramaGetInfo() API in libXext will be modified to check the 
   Xinerama extension version reported by the X server, and if it is >= 1.1,
   it will use the XineramaQueryScreens request to get the screen layout.
   If it is 1.0, it will continue to use the current method of using the
   XineramaInfo request to obtain this data.

 - The Xsun server will be modified to report that it supports Xinerama
   protocol 1.1 and handle Xinerama Request 5 requests appropriately.
   It will also be modified to check the length of Xinerama Request 4
   calls, and dispatch appropriately to either the Xsun 1.0 or XFree86 1.1
   protocol handlers.   

 - The Xorg server will continue to support only the 1.1 protocol and
   return a BadLength error when it receives an Xsun XineramaInfo request.
   Once this project is implemented, those requests will only come from
   unpatched Solaris libXext versions, and customers will be instructed
   to install the libXext patch to resolve them.
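
The client-side half of these steps comes down to one version comparison. A minimal Python sketch of that choice follows; the type and function names are hypothetical, and the real change lives in the C code of libXext rather than anything resembling this:

```python
from typing import NamedTuple, Optional, Tuple


class RequestPlan(NamedTuple):
    # Request used to ask whether Xinerama is active, or None when the
    # client simply treats the extension's presence as "active".
    state_request: Optional[str]
    # Request used to fetch the screen layout.
    info_request: str


def plan_requests(server_version: Tuple[int, int]) -> RequestPlan:
    """Choose requests from the (major, minor) version the server reports."""
    if server_version >= (1, 1):
        # XFree86/Xorg protocol.
        return RequestPlan("XineramaIsActive", "XineramaQueryScreens")
    # Xsun 1.0 protocol: keep the existing behaviour.
    return RequestPlan(None, "XineramaInfo")


print(plan_requests((1, 0)))
print(plan_requests((1, 1)))
```

Comparing the version as a (major, minor) tuple rather than as a decimal number keeps a hypothetical later version such as 1.10 ordered after 1.9.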

A future project is planned to deliver the libXinerama library which provides the 
Xinerama API in XFree86/Xorg, but that is not included in this project.

Imported interfaces:

Interface:                           Classification  Defined in / Comments
----------                           --------------  ---------------------
XineramaGetInfo API & Protocol          Evolving        PSARC 2000/036
XineramaGetState API                    Evolving        PSARC 2000/036

Exported interfaces:

Interface:                           Classification  Defined in / Comments
----------                           --------------  ---------------------
Xinerama 1.1 Protocol                 Standard (*)    X11R6.8 implementation
XineramaInfo Request                  Obsolete

(*) Not yet formally adopted as a standard specification by the X.Org
    standards body, but in such widespread use in the open source 
    implementation that incompatible change has been rejected by the community.

Engineer working on Oracle Solaris and with the X.Org open source community.


The views expressed on this blog are my own and do not necessarily reflect the views of Oracle, the X.Org Foundation, or anyone else.
