Ramifications of the Solaris 10 kernel patch 137111
By Giri Mandalika on Nov 01, 2008
A recent code change in Solaris 10 inadvertently exposed a latent bug in some 32-bit applications that rely on their own memory allocators. As a result, some third-party applications that worked without KU 137111 may crash on Solaris 10/SPARC once KU 137111 (any revision) is installed.
Symptoms & the Cause
It was identified that the majority of such application failures are caused by the application's custom memory allocator, which incorrectly returns 4-byte aligned mutexes where 8-byte aligned mutexes are required. In Solaris, pthread_mutex_t structures have been defined to be aligned on an 8-byte boundary. These structures contain a upad64_t member, which is a double even for 32-bit applications. The natural alignment of a double is 8 bytes, and per the SPARC Compliance Definition 2.4, a structure must be aligned according to its strictest member. That is, applications that create 4-byte aligned mutexes are technically non-compliant on Solaris/SPARC (for simplicity, such code will be referred to as non-complying code for the remainder of this blog entry).
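Here is a minimal C11 sketch of the alignment rule. The structure `mutex_like` and the helper `is_aligned()` are hypothetical names used only for illustration. `mutex_like` is not the real Solaris `pthread_mutex_t` layout. It simply contains a `double` in the role that `upad64_t` plays in the real structures.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mutex structure. NOT the Solaris definition:
 * it only mimics small flag fields followed by a double-sized member,
 * which is the role upad64_t plays in the real structures. */
struct mutex_like {
    uint16_t flag1;     /* small fields need at most 4-byte alignment */
    uint8_t  flag2;
    uint8_t  ceiling;
    uint32_t lockword;
    double   data;      /* strictest member: 8-byte aligned on SPARC */
};

/* A structure inherits the alignment of its strictest member. */
static_assert(alignof(struct mutex_like) == alignof(double),
              "struct alignment follows its strictest member");

/* Returns nonzero if address p is a multiple of align (a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

On SPARC, `alignof(double)` is 8. The static assertion therefore implies that every `mutex_like` object must start on an 8-byte boundary. An address such as `buf + 4` inside an 8-byte aligned buffer would fail `is_aligned(p, 8)`.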
Due to a change in the implementation of userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not met, non-compliant applications on Solaris/SPARC may fail with SIGSEGV and a callstack similar to the following one, or with similar callstacks containing the function

    *_atomic_cas_64(0x141f2c, 0x0, 0xff000000, 0x1651, 0xff000000, 0x466d90)
    set_lock_byte64(0x0, 0x1651, 0xff000000, 0x0, 0xfec82a00, 0x0)
    fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)
    ...
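The failure mode can be illustrated with a bump-pointer arena, used here as a hypothetical stand-in for an application's custom allocator. The names `arena_alloc_4()` and `arena_alloc_aligned()` are made up for this sketch and are not taken from any vendor's code.

- The buggy variant pads every block to 4 bytes. A `pthread_mutex_t` that follows a small allocation therefore lands on a 4-byte boundary.
- The fixed variant rounds the start of each block up to the requested alignment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size arena. The base is maximally aligned, so a
 * block's offset within the arena determines its address alignment. */
#define ARENA_SIZE 4096
static alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

static void arena_reset(void) { arena_used = 0; }

/* Round n up to a multiple of a; a must be a power of two. */
static size_t round_up(size_t n, size_t a) { return (n + a - 1) & ~(a - 1); }

/* BUGGY: pads every block to 4 bytes, so blocks start only on 4-byte
 * boundaries. A mutex placed at offset 4, 12, 20, ... violates an
 * 8-byte alignment requirement. */
static void *arena_alloc_4(size_t n)
{
    size_t len = round_up(n, 4);
    if (len > ARENA_SIZE - arena_used)
        return NULL;
    void *p = &arena[arena_used];
    arena_used += len;
    return p;
}

/* FIXED: starts each block on an `align`-byte boundary (8 for mutexes,
 * per the requirement described above). align must be a power of two
 * no larger than alignof(max_align_t). */
static void *arena_alloc_aligned(size_t n, size_t align)
{
    size_t start = round_up(arena_used, align);
    if (start > ARENA_SIZE || n > ARENA_SIZE - start)
        return NULL;
    arena_used = start + n;
    return &arena[start];
}
```

The fixed variant aligns the start of each block rather than only padding its length. Padding every length to 8 would also keep blocks aligned, but only if every allocation in the arena followed that same rule.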
Patches & the Next Steps
Note that only non-compliant 32-bit applications are affected by KU 137111. All compliant 32-bit applications continue to run as expected with KU 137111 installed. Customers, partners, ISVs and other software vendors should therefore understand that this is not a Solaris issue. Customers who run into this issue must work with the respective software vendors to obtain a patch or fix. We suggest that ISVs and other software vendors proactively check their 32-bit native code for misaligned mutexes like the ones described in this blog entry.
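Vendors auditing their own code could add a debug-build check on every pointer handed out for mutex storage. The sketch below assumes the 8-byte requirement described above. `mutex_storage_ok()` and `CHECK_MUTEX_STORAGE()` are hypothetical helpers, not part of any Solaris API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed requirement for mutexes on Solaris/SPARC, as described above. */
#define MUTEX_REQUIRED_ALIGN 8u

/* Hypothetical helper: returns 1 if p satisfies the requirement, 0 otherwise,
 * and reports the misaligned address so the allocation site can be traced. */
static int mutex_storage_ok(const void *p, const char *where)
{
    uintptr_t misalign = (uintptr_t)p % MUTEX_REQUIRED_ALIGN;
    if (misalign != 0) {
        fprintf(stderr, "%s: mutex storage at %p is %u byte(s) past an %u-byte boundary\n",
                where, (void *)p, (unsigned)misalign, MUTEX_REQUIRED_ALIGN);
        return 0;
    }
    return 1;
}

/* Hypothetical debug hook: use it right before each pthread_mutex_init()
 * call, or inside the custom allocator, in test builds. */
#define CHECK_MUTEX_STORAGE(p) assert(mutex_storage_ok((p), __func__))
```

Place `CHECK_MUTEX_STORAGE(m)` immediately before each `pthread_mutex_init(m, ...)` call. A silent misalignment then becomes an assertion failure that names the calling function. Because the hook is built on `assert()`, it compiles away when `NDEBUG` is defined.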
In our testing of some enterprise applications, we have identified Oracle's Siebel CRM as one of the applications that are potentially vulnerable to KU 137111. IBM's Lotus Domino Server also appears to be prone to crashing on Solaris 10 with the same kernel patch. For these two known cases, Oracle/Siebel and IBM/Lotus Domino customers running Solaris should contact Oracle and IBM, respectively, rather than Sun Microsystems, for a proper fix.
As it may take some time for ISVs and software vendors to identify and fix the non-complying code in their applications, Sun is planning to provide an interim fix for the mutex alignment issue in the form of a Solaris kernel patch. As of this writing, we expect the fix to be integrated into KU 137137-07. The fix is already available in the latest update of Solaris, Solaris 10 10/08. Those who cannot upgrade to Solaris 10 10/08 from prior versions of Solaris 10 must wait for the patch KU [Updated 12/07/08]
Note that the fix in Solaris is an interim one that allows non-complying code to run on SPARC hardware for the time being. There is no guarantee that non-complying code will continue to run 'as is' with future Solaris kernel patches and/or major updates or releases of the Solaris operating system. The best long-term solution, therefore, is for software vendors to fix their non-compliant code before it is too late.
Acknowledgments
Steve S and Roger F of Sun Microsystems.