Massive Solaris Scalability for the T5-8 and M5-32, Part 1
By Steve Sistare on Apr 02, 2013
How do you scale a general purpose operating system to handle a single system image with 1000's of CPUs and 10's of terabytes of memory? You start with the scalable Solaris foundation. You use superior tools such as Dtrace to expose issues, quantify them, and extrapolate to the future. You pay careful attention to computer science, data structures, and algorithms, when designing fixes. You implement fixes that automatically scale with system size, so that once exposed, an issue never recurs in future systems, and the set of issues you must fix in each larger generation steadily shrinks.
The T5-8 has 8 sockets, each containing 16 cores of 8 hardware strands each, which Solaris sees as 1024 CPUs to manage. The M5-32 has 1536 CPUs and 32 TB of memory. Both are many times larger than the previous generation of Oracle T-class and M-class servers. Solaris scales well on that generation, but every leap in size exposes previously benign O(N) and O(N^2) algorithms that explode into prominence on the larger system, consuming excessive CPU time, memory, and other resources, and limiting scalability. To find these, knowing what to look for helps. Most OS scaling issues can be categorized as CPU issues, memory issues, device issues, or resource shortage issues.
CPU scaling issues include:
- increased lock contention at higher thread counts
- O(NCPU) and worse algorithms
Memory scaling issues include:
- working sets that exceed VA translation caches
- unmapping translations in all CPUs that access a memory page
- O(memory) algorithms
- memory hotspots
Device scaling issues include:
- O(Ndevice) and worse algorithms
- system bandwidth limitations
- lock contention in interrupt threads and service threads
Resource shortages occur when too many CPUs compete for a finite set of resources. Sometimes the resource limit is artificial and defined by software, such as for the maximum process and thread count, in which case the fix is to scale the limit automatically with NCPU. Sometimes the limit is imposed by hardware, such as for the number of MMU contexts, and the fix requires more clever resource management in software.
Next time I will provide more details on new Solaris improvements in all of these areas that enable superior performance and scaling on T5 and M5 systems. Stay tuned.