Power Optimized Thread Placement
By esaxe on Mar 30, 2009
Indeed, the job of thread placement on modern systems has become quite interesting. Just about every modern processor on the market these days is (at least) multi-core, with many also presenting multiple hardware "threads", "strands", or "Hyper Threads" sharing instruction or floating point pipelines...and then there's shared caches, crypto accelerators, memory controllers... So there's a lot to consider when deciding where (on which logical CPUs) a given handful of threads should execute. Where possible we've tried to avoid having threads fight over shared system resources. If the load is light enough, and enough system resources exist that each thread can have it's own pipeline, cache (or even socket)...that's a pretty good strategy for mitigating potential resource contention.
All this good stuff is made possible by the kernel's Processor Group based CMT scheduling subsystem, which (at boot) enumerates all the "interesting" relationships that exist between the system's logical CPUs...which in turns allows the dispatcher to be smart about how it utilizes those CPUs to deliver great performance.
We (or at least I) didn't realize at the time, but all this work we were doing to make the dispatcher smarter about how it uses the CPUs, also turns out to be really useful for being smart about how you're \*not\* using the CPUs. This means that in addition to optimizing for performance, this same dispatcher awareness can be used to optimize for power efficiency.
As part of the Power Aware Dispatcher project, we extended the kernel's CMT scheduling subsystem to enumerate groups of logical CPUs representing active and Idle CPU Power Management Domains. On x86 systems, these domains are enumerated through ACPI. Being aware of these domains allows the dispatcher to place threads in ways that not only optimize performance for shared system resources, but also maximizes opportunities to power manage CPUs. For example, the dispatcher may try to coalesce light utilization on the system onto a smaller number of power domains (e.g. sockets), thus freeing up other CPU resources in the system to be power managed more deeply. On the Intel Xeon 5500 processor series based systems, this enables us to take better advantage of the processor's deep idle power management features, including deep C-states.
Also, consistent with our goals around the Tickless Kernel Architecture project, the Power Aware Dispatcher is an "event based" CPU power management architecture, which means that all CPU power state changes are driven entirely by utilization events triggered by the dispatcher as threads come and go from the CPUs. One clear benefit of this, is that when the system is idle, there no need to periodically wake up to check CPU utilization (which in itself is inefficient and wasteful). It also means that the kernel can be aggressive about adjusting resource power states (in near real-time) with respect to changes in utilization.
We really like thinking about Power Management as just another piece of Resource Management. By designing efficient resource utilization into the kernel subsystems that deal with power manageable hardware resources...we can be smart about how we utilize the system (for improved performance), and how we \*don't\* use the system (to leverage power management features). The power efficiency results we're seeing with PAD are impressive, and we're really looking forward to building on the PAD work we integrated into build 110 in the months ahead.
Technorati Tags: Solaris OpenSolaris