XCP 1070 Now Available, with SPARC64-VII Jupiter Support
By Bob Hueston on Apr 25, 2008
In XCP Version 1070, the following new feature is introduced:
- Support for SPARC64® VII processors
One interesting limitation noted in the product notes is:
For Solaris domains that include SPARC64 VII processors, a single domain of 256 threads or more might hang for an extended period of time under certain unusual situations. Upon recovery, the uptime command will show extremely high load averages.
The above limitation references Change Request CR6619224, "Tick accounting needs to be made scalable," which describes the problem in detail:
Solaris performs some accounting and bookkeeping activities every clock tick. To do this, a cyclic timer is created to go off every clock tick and call a clock handler (clock()). This handler performs, among other things, tick accounting for active threads.
Every tick, the tick accounting code in clock() goes around all the active CPUs in the system, determines if any user thread is running on a CPU and charges it with one tick. This is used to measure the number of ticks a user thread is using of CPU time. This also goes towards the time quantum used by a thread. Dispatching decisions are made using this. Finally, the LWP interval timers (virtual and profiling timers) are processed every tick, if they have been set.
As the number of CPUs increases, the tick accounting loop gets larger. Since only one CPU is engaged in doing this, this is also single-threaded. This makes tick accounting not scalable. On a busy system with many CPUs, the tick accounting loop alone can often take more than a tick to process if the locks it needs to acquire are busy. This causes the invocations of the clock() handler to drift in time. Consequently, the lbolt drifts. So, any timing based on the lbolt becomes inaccurate. Any computations based on the lbolt (such as load averages) also get skewed.
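To make the drift described above concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions, not Solaris code: the function name `simulate_lbolt`, the 10 ms tick, and the 50 microsecond per-CPU cost (standing in for lock contention) are all hypothetical values chosen for illustration. The model assumes one CPU walks every CPU each tick, and that the next handler run cannot start until the previous one finishes.

```python
def simulate_lbolt(ncpus, per_cpu_cost_us, tick_us=10_000, wall_ticks=1_000):
    """Toy model of single-threaded tick accounting (not Solaris code).

    Returns how many times the simulated clock handler ran (our stand-in
    for lbolt) during wall_ticks * tick_us microseconds of simulated time.
    All parameter values are illustrative assumptions.
    """
    now_us = 0                      # simulated wall-clock time
    lbolt = 0                       # ticks counted by the handler
    end_us = wall_ticks * tick_us   # total simulated wall-clock time
    # One CPU visits every CPU in turn, so cost grows linearly with ncpus.
    handler_cost_us = ncpus * per_cpu_cost_us

    while now_us < end_us:
        lbolt += 1                  # one handler invocation counts one tick
        # The next invocation starts at the next tick boundary, or later
        # if the accounting loop overran the tick interval.
        now_us += max(tick_us, handler_cost_us)
    return lbolt


if __name__ == "__main__":
    for ncpus in (32, 128, 256):
        counted = simulate_lbolt(ncpus, per_cpu_cost_us=50)
        print(f"{ncpus:4d} CPUs: handler counted {counted} of 1000 ticks")
```

With these made-up numbers, the 32-CPU and 128-CPU cases keep up with the clock, while the 256-CPU case needs 12.8 ms per pass and counts only 782 of the 1000 ticks that actually elapsed. In this model, that shortfall is what "lbolt drifts" means, and anything computed from the tick count inherits the error.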
The issue of tick scalability has been around for a while. Eric Saxe mentions it in his May 21, 2007 blog entry, "tick, tick, tick...". The change request says that the problem is already fixed in Solaris Nevada (OpenSolaris) build 81, so hopefully this limitation will be removed with an upcoming patch or release of Solaris 10.