Wednesday Apr 01, 2009

Blogtalkradio: Sun and Intel talk about OpenSolaris Enhancements for the Xeon 5500

Darrin Johnson (Sun) and Bob Kasten (Intel) on Blogtalkradio talking about the OpenSolaris features and optimizations for the new Xeon 5500 Processor (code named Nehalem). Way to go Darrin and Bob!

Technorati Tags: OpenSolaris

Monday Mar 30, 2009

Power Optimized Thread Placement

In the past I've blogged about the work we've been doing over the last several years to optimize thread placement (that is, on which CPUs threads are scheduled to run) in the face of evolving system and processor architectures.

Indeed, the job of thread placement on modern systems has become quite interesting. Just about every modern processor on the market these days is (at least) multi-core, with many also presenting multiple hardware "threads", "strands", or "Hyper Threads" sharing instruction or floating point pipelines...and then there are shared caches, crypto accelerators, memory controllers... So there's a lot to consider when deciding where (on which logical CPUs) a given handful of threads should execute. Where possible we've tried to avoid having threads fight over shared system resources. If the load is light enough, and enough system resources exist that each thread can have its own pipeline, cache (or even socket)...that's a pretty good strategy for mitigating potential resource contention.

All this good stuff is made possible by the kernel's Processor Group based CMT scheduling subsystem, which (at boot) enumerates all the "interesting" relationships that exist between the system's logical CPUs...which in turn allows the dispatcher to be smart about how it utilizes those CPUs to deliver great performance.
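To make the idea concrete, here is a toy Python sketch (not Solaris kernel code) of grouping logical CPUs by the hardware they share and then placing threads so they spread across those groups. The topology table, the level names, and the functions `build_groups`, `pick_cpu`, and `place_threads` are all hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical topology: 2 sockets x 2 cores x 2 hardware threads.
# The ids are made up for illustration only.
TOPOLOGY = {
    0: {"socket": 0, "core": 0},
    1: {"socket": 0, "core": 0},
    2: {"socket": 0, "core": 1},
    3: {"socket": 0, "core": 1},
    4: {"socket": 1, "core": 2},
    5: {"socket": 1, "core": 2},
    6: {"socket": 1, "core": 3},
    7: {"socket": 1, "core": 3},
}


def build_groups(topology, level):
    """Group the logical CPUs that share the given piece of hardware."""
    groups = defaultdict(list)
    for cpu, attrs in sorted(topology.items()):
        groups[attrs[level]].append(cpu)
    return dict(groups)


def pick_cpu(topology, running, levels=("socket", "core")):
    """Pick an idle CPU whose sharing groups are least loaded.

    Levels are compared outermost first, so an empty socket wins over
    an empty core on an already busy socket.
    """
    candidates = [c for c in sorted(topology) if c not in running]
    if not candidates:
        return None

    def load(cpu, level):
        # Number of running threads that share this level with the candidate.
        return sum(1 for r in running
                   if topology[r][level] == topology[cpu][level])

    return min(candidates, key=lambda c: tuple(load(c, lvl) for lvl in levels))


def place_threads(topology, nthreads):
    """Place nthreads one at a time and return the chosen CPUs in order."""
    running = []
    for _ in range(nthreads):
        cpu = pick_cpu(topology, running)
        if cpu is None:
            break
        running.append(cpu)
    return running


print(build_groups(TOPOLOGY, "socket"))
print(place_threads(TOPOLOGY, 4))
```

With four threads on this made-up eight-CPU system, each thread ends up on its own core and the load is split evenly across the two sockets, which is the "avoid fighting over shared resources" behavior described above.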

We (or at least I) didn't realize it at the time, but all this work we were doing to make the dispatcher smarter about how it uses the CPUs also turns out to be really useful for being smart about how you're *not* using the CPUs. This means that in addition to optimizing for performance, this same dispatcher awareness can be used to optimize for power efficiency.

As part of the Power Aware Dispatcher project, we extended the kernel's CMT scheduling subsystem to enumerate groups of logical CPUs representing active and idle CPU Power Management Domains. On x86 systems, these domains are enumerated through ACPI. Being aware of these domains allows the dispatcher to place threads in ways that not only optimize performance for shared system resources, but also maximize opportunities to power manage CPUs. For example, the dispatcher may try to coalesce light utilization on the system onto a smaller number of power domains (e.g. sockets), thus freeing up other CPU resources in the system to be power managed more deeply. On the Intel Xeon 5500 processor series based systems, this enables us to take better advantage of the processor's deep idle power management features, including deep C-states.
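As a rough illustration of the coalescing idea, the toy Python sketch below prefers the busiest power domain that still has an idle CPU, so lightly loaded systems leave whole domains untouched. It is not how the dispatcher is actually implemented; the `DOMAINS` table and the functions `pick_cpu_for_power` and `idle_domains` are hypothetical, and the sketch ignores the performance side of the trade-off entirely.

```python
# Hypothetical power domains (say, one per socket), mapping a domain
# name to the logical CPUs it contains.
DOMAINS = {
    "socket0": [0, 1, 2, 3],
    "socket1": [4, 5, 6, 7],
}


def pick_cpu_for_power(domains, running):
    """Return an idle CPU from the busiest domain that still has room."""
    best = None  # (busy_count, cpu)
    for cpus in domains.values():
        idle = [c for c in cpus if c not in running]
        if not idle:
            continue  # domain is full, nothing to place here
        busy = len(cpus) - len(idle)
        if best is None or busy > best[0]:
            best = (busy, idle[0])
    return None if best is None else best[1]


def idle_domains(domains, running):
    """Domains with no running threads, i.e. candidates for deep idle."""
    return [name for name, cpus in domains.items()
            if not any(c in running for c in cpus)]


running = set()
for _ in range(3):
    running.add(pick_cpu_for_power(DOMAINS, running))

print(sorted(running))
print(idle_domains(DOMAINS, running))
```

Three threads all land on the first domain here, leaving the second one completely idle and therefore free to be power managed more deeply.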

Also, consistent with our goals around the Tickless Kernel Architecture project, the Power Aware Dispatcher is an "event based" CPU power management architecture, which means that all CPU power state changes are driven entirely by utilization events triggered by the dispatcher as threads come and go from the CPUs. One clear benefit of this is that when the system is idle, there is no need to periodically wake up to check CPU utilization (which in itself is inefficient and wasteful). It also means that the kernel can be aggressive about adjusting resource power states (in near real-time) with respect to changes in utilization.
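A minimal Python sketch of what "event based" means here, assuming a much simplified model: a power domain changes state only when it is told that a CPU became busy or idle, and nothing ever polls it. The `PowerDomain` class, its method names, and the two state names are hypothetical and chosen only for illustration.

```python
class PowerDomain:
    """Toy power domain driven purely by busy/idle events."""

    def __init__(self, name):
        self.name = name
        self.busy_cpus = 0
        self.state = "low-power"
        self.transitions = []  # record of every state change, for inspection

    def _set_state(self, state):
        if state != self.state:
            self.state = state
            self.transitions.append(state)

    def cpu_became_busy(self):
        """Called when a thread starts running on one of the domain's CPUs."""
        self.busy_cpus += 1
        if self.busy_cpus == 1:
            # First busy CPU in the domain: bring the domain up.
            self._set_state("active")

    def cpu_became_idle(self):
        """Called when one of the domain's CPUs runs out of work."""
        if self.busy_cpus == 0:
            raise ValueError("no busy CPUs in domain")
        self.busy_cpus -= 1
        if self.busy_cpus == 0:
            # Last busy CPU went idle: the whole domain can power down.
            self._set_state("low-power")


domain = PowerDomain("socket0")
domain.cpu_became_busy()
domain.cpu_became_busy()
domain.cpu_became_idle()
domain.cpu_became_idle()
print(domain.transitions)
```

The point of the sketch is that an idle domain does no work at all: the only code that runs is in response to a utilization change, so there is no periodic wakeup just to sample utilization.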

We really like thinking about Power Management as just another piece of Resource Management. By designing efficient resource utilization into the kernel subsystems that deal with power manageable hardware resources...we can be smart about how we utilize the system (for improved performance), and how we *don't* use the system (to leverage power management features). The power efficiency results we're seeing with PAD are impressive, and we're really looking forward to building on the PAD work we integrated into build 110 in the months ahead.

Technorati Tags: OpenSolaris

Tuesday Aug 21, 2007

Driving towards efficient performance

For the most part, and especially with respect to data center class server systems, driving the performance component of the price : performance ratio has been our focus. But the economics of the industry are shifting...even evolving, as systems' initial purchase price represents a decreasingly significant component of their "total cost of ownership" thanks to rising power and cooling costs. This trend, coupled with the realization that overall data center utilization remains low (15% or so), implies the opportunities in this space are enormous.

Although performance remains key, at what cost should that performance be delivered? We *must* engineer systems to deliver the performance that Sun / Solaris customers have come to expect while using no more resources than are necessary to do so. Beyond performance, we must deliver efficiency. Therein lies the challenge of Project Tesla.

Thursday Jul 26, 2007

OpenSolaris Scheduling and CPU Management at SVOSUG tonight

Tonight I'll be presenting at the Silicon Valley OpenSolaris User Group meeting. I'll be giving an overview of the OpenSolaris dispatcher, scheduling classes, processor abstractions and management tools, and debugging (whew).

Here is the slide deck I'll be using. The meeting will be at the Sun Santa Clara campus auditorium. Alan's blog has the details. Come heckle me if you like... :)

Sunday May 27, 2007

Making Solaris a better Solaris than Solaris

Ian Murdock was the speaker at this month's Silicon Valley OpenSolaris User Group meeting. I had heard from Alan last week that Ian would be speaking about the recently announced Project Indiana, and I wanted to go hear more. The first I had heard about it was from this Slashdot post, and from the flurry of ensuing discussion on the opensolaris-discuss mailing list. A colleague of mine distilled it particularly well when he said (paraphrasing) that the initial spectrum of reaction was such that some folks were realizing their greatest hopes...while others were realizing their greatest fears. :)

The "Making Solaris a better Linux than Linux" quote referenced in the Slashdot post seems to have elicited a wide range of responses from folks in the community. Some folks have expressed that they don't want to see Solaris "become a better Linux", out of concern that Solaris would lose some of its differentiating strengths (backward compatibility / stability being a frequently raised example). Others on the thread have pointed out examples of things in the Solaris environment that they feel represent barriers to adoption...which in turn has elicited more debate as to whether those barriers are really barriers, and then more debate still as to how best to deal with them. :)

At the SVOSUG meeting, Ian gave some background describing where he's coming from, why he decided to join Sun to advocate for OpenSolaris, and his vision for Project Indiana. The devil is in the details, and it's pretty clear there are many of them, but the motivation and idea behind Project Indiana (or at least my take on it) seems fairly simple: provide OpenSolaris with the features it needs to appeal to, and be welcoming of, Linux enthusiasts and/or folks who would otherwise reach for a Linux solution.

At the meeting, I said I felt that the goal shouldn't necessarily be to make Solaris a better Linux than Linux...but to make Solaris a better Solaris, such that it appeals to Linux enthusiasts more than Linux itself does. The difference is where you set your sights. I don't believe there's any shortage of opportunity. While OpenSolaris is superior in many ways, I believe it's deficient in others. I find myself carrying around a short mental list of things that (for me) are missing or deficient in OpenSolaris, and that I suspect could represent an adoption "show stopper" for someone else. My short list represents the feature gap that exists between where OpenSolaris is, and where (as a developer) I wish it would be.

I suspect that such a list would vary depending on who you ask. For Project Indiana, I would imagine that characterizing what this list would look like from the perspective of a Linux enthusiast, as well as someone who tried (and gave up on) OpenSolaris would be a useful start.

Technorati Tags: OpenSolaris

Sunday May 20, 2007

tick, tick, tick...

Looks like we've got some clock work ahead of us. Over the last year or so, I've been waking up at night, in a cold sweat thinking about how we have but one cyclic/thread firing on one CPU 100 times a second, that does accounting for all threads over all CPUs in the system (ok, not really, but it's something we've been thinking/talking about). As time marches on, we continue to see the logical CPU count (as seen via psrinfo(1M)) in systems grow (especially with the proliferation of multi-core/multi-threaded processors), so it's not surprising that the single threaded clock is (or eventually will be) a scaling issue. Implementing clock()'s responsibilities in a more distributed fashion will be an interesting, but important bit of work.

As part of the Tesla Project, we're going to be looking at providing a "scheduled" clock implementation. The clock cyclic currently fires 100 times a second somewhere in the system. From a power management perspective, it would be nice if the clock fired only when necessary (something is scheduled to timeout, scheduled accounting is due, etc). This would allow the CPU on which the clock cyclic fires to potentially remain quiescent much longer (on average), which in turn would mean that the CPU could remain longer (or go deeper) in a power saving state.
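To put rough numbers on the difference, here is a toy Python comparison of a fixed 100 Hz tick against a clock that fires only when something is actually due. This is a sketch of the general idea, not the Solaris cyclic subsystem; the function names and the example deadlines are hypothetical.

```python
import heapq

TICK_HZ = 100  # from the post: the clock cyclic fires 100 times a second


def periodic_wakeups(duration_s, hz=TICK_HZ):
    """Number of times a fixed-rate clock fires in the given window."""
    return int(duration_s * hz)


def scheduled_wakeups(deadlines_s, duration_s):
    """Times at which a scheduled clock fires in the given window.

    The clock fires once per distinct pending deadline; everything due
    at the same instant is handled by a single wakeup.
    """
    heap = list(deadlines_s)
    heapq.heapify(heap)
    wakeups = []
    while heap and heap[0] <= duration_s:
        now = heapq.heappop(heap)
        # Drain any other work due at exactly this instant.
        while heap and heap[0] == now:
            heapq.heappop(heap)
        wakeups.append(now)
    return wakeups


# Made-up pending work: two timeouts at 0.5s, one at 0.25s, one at 3.0s.
pending = [0.5, 0.25, 0.5, 3.0]
print(periodic_wakeups(1.0))
print(scheduled_wakeups(pending, 1.0))
```

Over one second with this made-up workload, the periodic clock wakes the CPU 100 times while the scheduled clock wakes it twice, which is where the longer (or deeper) stays in a power saving state would come from.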

It might be that the scaling issue becomes less so if the clock doesn't always have to fire. Then again, this may be one of those "elephant in the living room" type issues...we can pretend that it isn't there only so long... :)

Technorati Tags: OpenSolaris

Sunday May 06, 2007

The road ahead...

I'm semi-worried because I should be getting up in a few hours, but the spell of the espresso beans clearly has not worn off, so I'll keep going. :) Within the last few months, work has wrapped up on our "Processor Groups" project, otherwise known as Multi-Level CMT scheduling optimizations. The putback introduced (among other things) a platform independent abstraction for representing a group of logical CPUs with some physical or characteristic sharing relationship. As of Nevada build 57 (and Solaris 10 update 4), Solaris uses this abstraction to construct groupings of CPUs that have performance relevant hardware sharing relationships, including shared caches, pipelines, pipes to memory, etc. The kernel scheduler/dispatcher then implements load balancing and affinity policy against these groupings in an attempt to maximize throughput and improve cache utilization efficiency.

This infrastructural upgrade has replaced the kernel's legacy CMT scheduling framework (implemented in chip.c/chip.h), which provided for only a single level of optimization (physical processor). The PG infrastructure enables Solaris to optimize for N levels, which is needed in cases where multiple levels/types of hardware sharing exist between logical CPUs in the system. Longer term, we're interested in providing a PG based implementation for the scheduling side of the kernel's NUMA optimizations. In addition to simplifying the implementation, this would potentially get us to the point of having an affinity/load balancing policy implementation that spans both NUMA and CMT.

The road ahead is an exciting one

Over the next year or two, I'll be focusing my efforts on Solaris platform independent power management policy...which will entail bringing (or coordinating to bring) power management awareness to the platform independent kernel subsystems that deal with power manageable resources. We'll start with the dispatcher. :)

"Tesla" is the code name for the project, which will be run "in the open" via OpenSolaris. Over the last week, I've been working on the logistical aspects of the project (getting content on the page, setting up mail aliases, figuring out the Mercurial SCM, etc.). I'm hoping the project will go live either tomorrow (uh, today) or Tuesday.

Technorati Tags: OpenSolaris

Wednesday Dec 07, 2005

Do-it-yourself Kernel Development Preso

Russ Blaine gave this presentation to the ACM student chapter at Northeastern University. The first part of the presentation describes (in seat gripping detail) our adventures in root causing a *very* recent bug:

6348316 cyclic subsystem baffled by x86 softint behavior

It's a nice ride, with lots of sights along the way...the cyclic subsystem, device configuration, autovectored interrupts, and more... The second part of the presentation is essentially the same as the OpenSolaris presentation Steve presented at UCSD last week, and the last part talks about why the Solaris Kernel Group is such a great place to work, who should work for us, and why.

Technorati Tags: OpenSolaris


