Monday Nov 24, 2008

Zonestat v1.3

It's - already - time for a zonestat update. I was never happy with the method that zonestat used to discover the mappings of zones to resource pools, but wanted to get v1.1 "out the door" before I had a chance to improve on its use of zonecfg(1M). The obvious problem, which at least one person stumbled over, was the fact that you can re-configure a zone while it's running. After doing that, the configuration information doesn't match the current mapping of zone to pool, and zonestat became confused.

Anyway, I found the time to replace the code in zonestat which discovered zone-to-pool mappings with a more sophisticated method. The new method uses ps(1) to learn the PID that each zone's [z]sched process is. Then it uses "poolbind -q <PID>" to look up the pool for that process. The result is more accurate data, but the ps command does use more CPU cycles.

While performing surgery on zonestat, I also:

  • began using the kstat module for Perl, reducing CPU time consumed by zonestat
  • fixed some bugs in CPU reporting
  • limited zonename output to 8 characters to improve readability
  • made some small performance improvements
  • added a $DEBUG variable which can be used to watch the commands that zonestat is using
With all of that, zonestat v1.3 provides better data, is roughly 30% faster, and is even smaller than the previous version! :-) You can find it at http:://

Friday Sep 05, 2008

got (enough) memory?

DBAs are in for a rude awakening.

A database runs most efficiently when all of the data is held in RAM. Insufficient RAM causes some data to be sent to a disk drive for later retrieval. This process, called 'paging' can have a huge performance impact. This can be shown numerically by comparing the time to retrieve data from disk (about 10,000,000 nanoseconds) to the access time for RAM (about 20 ns).

Databases are the backbone of most Internet services. If a database does not perform well, no amount of improvement of the web servers or application servers will achieve good performance of the overall service. That explains the large amount of effort that is invested in tuning database software and database design.

These tasks are complicated by the difficulty of scaling a single database to many systems in the way that web servers and app servers can be replicated. Because of those challenges, most databases are implemented on one computer. But that single system must have enough RAM for the database to perform well.

Over the years, DBAs have come to expect systems to have lots of memory, either enough to hold the entire database or at least enough for all commonly accessed data. When implementing a database, the DBA is asked "how much memory does it need?" The answer is often padded to allow room for growth. That number is then increased to allow room for the operating system, monitoring tools, and other infrastructure software.

And everyone was happy.

But then server virtualization was (re-)invented to enable workload consolidation.

Server virtualization is largely about workload isolation - preventing the actions and requirements of one workload from affecting the others. This includes constraining the amount of resources consumed by each workload. Without such constraints, one workload could consume all of the resources of the system, preventing other workloads from functioning effectively. Most virtualization technologies include features to do this - to schedule time using the CPU(s), to limit use of network bandwidth... and to cap the amount of RAM a workload can use.

That's where DBAs get nervous.

I have participated in several virtualization architecture conversations which included:
Me: "...and you'll want to cap the amount of RAM that each workload can use."
DBA: "No, we can't limit database RAM."

Taken out of context, that statement sounds like "the database needs infinite RAM." (That's where the CFO gets nervous...)

I understand what the DBA is trying to say:
DBA: "If the database doesn't have sufficient RAM, its performance will be horrible, and so will the performance of the web and app servers that depend on it."

I completely agree with that statement.

The misunderstanding is that the database is not expected to use less memory than before. The "rude awakening" is modifying one's mind set to accept the notion that a RAM cap on a virtualized workload is the same as having a finite amount of RAM - just like a real server.

This also means that system architects must understand and respect the DBA's point of view, and that a virtual server must have available to it the same amount of RAM that it would need in a dedicated system. If a non-consolidated database needed 8GB of RAM to run well in a dedicated system, it will still need 8GB of RAM to run well in a consolidated environment.

If each workload has enough resources available to it, the system and all of its workloads will perform well.

And they all computed happily ever after.

P.S. Memory needs of consolidated systems require that a system running multiple workloads will need more memory than each of the unconsolidated systems had - but less than the aggregate amount they had.

Considering that need, and the fact that most single-workload systems were running at 10-15% CPU utilization, I advise people configuring virtual server platforms to focus more effort on ensuring that the computer has enough memory for all of its workloads, and less effort on achieving sufficient CPU performance. If the system is 'short' on CPU power by 10%, performance will be 10% less than expected. That rarely matters. But if the system is 'short' on memory by 10%, excessive paging can cause transaction times to increase by 10 times, 100 times, or more.

Tuesday Oct 16, 2007

Solaris Training at LISA'07

It's time for a "shameless plug"...

If you would like to develop deeper Solaris skills, LISA'07 offers some excellent opportunities. LISA is a conference organized by Usenix, and is intended for Large Installation System Administrators. This year, LISA will be held in Dallas, Texas, November 11-16. It includes vendor exhibits, training sessions and invited talks. This year the keynote address will be delivered by John Strassner, Motorola Fellow and Vice President, and is entitled "Autonomic Administration: HAL 9000 Meets Gene Roddenberry."

Many tutorials will be available, including four full-day sessions focusing on Solaris:

  • Solaris 10 Administration Workshop - by Peter Galvin and Marc Staveley
  • Solaris 10 Security Features Workshop - by Peter Gavlin
  • Solaris 10 Performance, Observability & Debugging - by Jim Mauro
  • Resource Management with Solaris Containers - Jeff Victor (yes, me)
My session covers the concepts, uses and administrative interfaces of Solaris Resource Management, focusing on Solaris Containers and Projects.

Early-bird registration ends this Friday, October 19 and saves $Hundreds compared to the Procrastinator's Rate .

Thursday May 10, 2007

Virtual DoS

No, not that DOS. I'm referring to Denial-of-Service.

A team at Clarkson University including a professor and several students recently performed some interesting experiments. They wanted to determine how server virtualization solutions handled a guest VM which performed a denial-of-service attack on the whole system. This knowledge could be useful when virtualizing guests that you don't trust. It gives you a chance to put away the good silver.

They tested VMware Workstation, Xen, OpenVZ, and Solaris Containers. (It's a shame that they didn't test VMware ESX. VMware Workstation and ESX are very different technologies. Therefore, it is not safe to assume that the paper's conclusions regarding VMware Workstation apply to ESX.) After reading the paper, my conclusion for Solaris Containers is "they have non-default resource management controls to contain DoS attacks, and it's important to enable those controls."

Fortunately, with the next update to Solaris 10 (due this summer) those controls are much easier to use. For example, the configuration parameters used in the paper, and shown below, limit a Container's use of physical memory, virtual memory, and amount of physical memory which can be locked so that it doesn't get paged out:

add capped-memory
 set physical=128M 
 set swap=512M 
 set locked=64M 
Further, the following parameters limit the number of execution threads that the Container can use, turn on the fair-share scheduler and assign a quantity of shares for this Container:
set max-lwps=175 
set scheduling-class=FSS 
set cpu-shares=10
All of those parameters are set using the zonecfg(1M) command. One benefit of the centralization of these control parameters is that they move with a Container when it is moved to another system.

I partly disagree with the authors' statement that these controls are complex to configure. The syntax is simple - and a significant improvement over previous versions - and an experienced Unix admin can determine appropriate values for them without too much effort. Also, a GUI is available for those who don't like commands: the Solaris Container Manager. On the other hand, managing these controls does require Solaris administration experience, and there are no default values. It is important to use these features in order to protect well-behaved workloads from misbehaving workloads.

It also is a shame that the hardware used for the tests was a desktop computer with limited physical resources. For example it had only one processor. Because multi-core processors are becoming the norm, it would be valuable to perform the same tests on a multi-core system. The virtualization software would be stressed in ways which were not demonstrated. I suspect that Containers would handle that situation very well, for two reasons:

  1. There is almost no overhead caused by the use of Containers - the workload itself does not execute any extra code just because it is in a Container. Hypervisor-like solutions like Xen and VMware have longer code paths for network and storage I/O than would occur without virtualization. The additional cores would naturally support more guests, but the extra code paths would limit scalability. Containers would not suffer from that limitation.
  2. Solaris has extraordinary scalability - efficiently running workloads that have hundreds or thousands of processes on 144 cores in a Sun Fire E25K. None of the other solutions tested for this paper have been shown to scale beyond 4-8 cores. I booted 1,000 Containers on an 8-socket system.

Also, the test system did not have multiple NICs. The version of Solaris that was used includes a new feature called IP Instances. This feature allows a Container to be given exclusive access to particular NIC. No process outside that Container can access that NIC. Of course, multiple NICs are required to use that feature...

The paper Quantifying the Performance Isolation Properties of Virtualization Systems will be delivered at the ACM's Workshop on Experimental Computer Science.


Jeff Victor writes this blog to help you understand Oracle's Solaris and virtualization technologies.

The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.


« February 2017