The Ten Percent Solution

How do you make a slow, expensive platform look like it's competitive with any other platform? This can be difficult, especially if you don't have (or don't dare have) public open benchmarks such as those at SPEC (Standard Performance Evaluation Corporation) or TPC (Transaction Processing Performance Council). This is the problem facing IBM with the announcement of their new System z10 mainframe. How do you make costly, proprietary hardware look like it can compete on a price/performance basis?

One way is to play tricky games with numbers. Unfortunately, that's what IBM seems to have done in its z10 announcement. If you look at http://www-03.ibm.com/press/us/en/pressrelease/23592.wss you will see the following text:

The z10 also is the equivalent of nearly 1,500 x86 servers, with up to an 85% smaller footprint, and up to 85% lower energy costs. The new z10 can consolidate x86 software licenses at up to a 30-to-1 ratio. \*3
and a pointer to footnote 3:
3 Source: On Line Transaction Processing Relative Processing Estimates (OLTP-RPEs): Derivation of 760 Sun X2100 2.8 Opteron processor cores with average OLTP-RPEs per Ideas International of 3,845 RPEs and available utilization of 10% and 20 RPEs equating to 1 MIPS compared to 26 z10 EC IFLs and an average utilization of 90%.

This even comes up on an IBM blog at http://www-128.ibm.com/developerworks/blogs/page/benchmarking?entry=back_to_the_future_with#comments, where a reader asks how the calculations work, and is told:

The 30 to 1 claim is based on 760 X2100 cores to 26 z10. The 760 to 26 is based on 3845 RPEs at 10% = 384.5 RPEs is approximately equivalent to the number of z10 RPEs at 90% when you use 20 RPEs equal to 1 MIP where MIPS are based on the LSPR curve for the z10.

Games People Play

When I saw this, I thought "what kind of nonsense is this?" The whole idea of benchmarks is to determine the capacity of a system under load - not to say "assume 100% of the capital and operational costs of the server, and then only use 1/10th of it". That's crazy. Or crazy like a fox. (Not to mention, there's no such thing as a "MIP". All true mainframe performance guys know that).

Let's illustrate how this works: they take a consultancy's benchmark (instead of an open one like those at SPEC and TPC), and then specify that the competitor's machines (that's ours, by the way) should be evaluated at 10% CPU busy, while their own machines run at 90% busy. With this "just because we say so" trick they use 9 times as much of their machine as they do of ours. Play games like this, and you can make yourself look a lot better than you are. On top of that, they base their figures on extrapolating an IBM-only benchmark (the "LSPR" referred to) to estimate the difference between the z9 and z10. That's a really shaky limb to go out on. There's not a lot of science here.

Work the numbers

Let's do some math and convert that into prices. Assume for the moment that the price of a z10 IFL is the same as the $125,000 per IFL on the z9. It's not easy to tell, since IBM isn't exactly transparent about pricing and I couldn't find it on the site. So, 26 times $125,000 is $3,250,000. They say 760 x2100 CPU cores, which I can price on sun.com at $1,189 for 2 cores and 1GB of RAM. I can add 500GB of disk for $359, but let's hold that for a moment. Now, using IBM's "ten percent solution", 760 cores require 380 servers at $1,189 each, for a cost of $451,820. Hey, that's not bad. Even with the bogus 10% trick, Sun still has a 7 to 1 price/performance advantage over the IBM z10 mainframe. Fantastic, huh? Even with that little deceptive trick, the z10 is trailing in the dust. You did notice that IBM didn't actually put prices on their web pages, even with the 10% trick, right?
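
As a sanity check, the arithmetic above can be reproduced in a few lines of Python. The prices are the list prices quoted in this post; the z10 IFL price is my assumption, carried over from the z9.

```python
# Reproducing the "ten percent solution" cost math from the paragraph above.
# Prices as quoted in the post; the z10 IFL price is an assumption
# (carried over from the z9's $125,000 per IFL).
z10_ifls = 26
z10_ifl_price = 125_000
z10_cost = z10_ifls * z10_ifl_price            # $3,250,000

x2100_price = 1_189                            # 2 cores, 1GB RAM, per sun.com
servers_needed = 760 // 2                      # 380 two-core servers
sun_cost = servers_needed * x2100_price        # $451,820

print(z10_cost, sun_cost, round(z10_cost / sun_cost, 1))  # 3250000 451820 7.2
```

Even with IBM's own 10% assumption, the ratio comes out to roughly 7:1 in Sun's favor.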

Now let's make it more realistic :-)

Database servers are just as likely to be 90% CPU busy on one platform as on another - that's just a question of not overprovisioning. So let's work the numbers fairly, so the x2100 systems are no more overprovisioned than the z10. Instead of 760 cores at 10% busy, that would be 84.4 cores at 90%, which I'll round up to 86. Two cores per server: 43 servers. A rack. So, a set of Sun x2100s provisioned for actual capacity is 43 servers times $1,189, for a total of $51,127. That leaves Sun with a 63 to 1 price/performance advantage over the z10. That's actually pretty consistent with mainframe-to-SPARC and mainframe-to-x64 conversions I've been involved in. Do you wonder why I got out of the mainframe side of computing after becoming an expert there? This is part of the reason.
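
The fair-utilization version of the same math, as a sketch (same assumed prices as above):

```python
# Equal-provisioning comparison: both platforms sized to run 90% busy.
cores_needed = 760 * 0.10 / 0.90    # 84.4 cores once utilization is equalized
servers = 43                        # round up to 86 cores = 43 two-core x2100s
sun_cost = servers * 1_189          # $51,127 for the rack
z10_cost = 26 * 125_000             # $3,250,000, as before
print(sun_cost, z10_cost // sun_cost)   # 51127 63 -> the ~63:1 advantage
```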

And it gets worse (for IBM, that is). I didn't include the disk drives, which will be much more expensive on the mainframe side. Nor did I include the operating systems: For the mainframe you will need 26 CPU licenses of z/VM, which will be about $20,000 per CPU, for a total of $520,000. There's a formula for this, but that's close enough for a quick estimate, and excludes subsequent maintenance. We provide Solaris "right to use" for free, and I assure you our software maintenance is a heck of a lot less. Plus, on the mainframe, you'll license Linux at a cost of $15,000 or more per CPU for a one-year license (see Novell and Red Hat pricing) for another $390,000 per year.

Your first-year cost for the IBM solution is $3,250,000 for the server (which, by the way, just gave you a single point of failure!), $520,000 for z/VM, and an annual $390,000 for Linux. Ours is nearly two orders of magnitude less expensive, starting with $51K for the servers, and continues that way.

Advance disclaimer

It's late and I may have made a few mistakes. This is all "back of the envelope" calculation. But when there are orders-of-magnitude differences in platform price, a few percent of error one way or the other really doesn't matter. If I made a mistake - it's happened! - I apologize for it in advance, and will correct as needed.

Integrity in what you say

Let me make this clear: I made my living for years as a mainframe system performance expert. I ran mission critical, performance-sensitive systems, consulted with major companies, wrote books, and taught classes on mainframe performance. I stay current. I've run mainframe Linux, and taught a performance class on that, too. I know what is a fair comparison and what is "benchmarketing" and trying to game the system. If the numbers worked out differently, I would report them and let the chips (no pun intended) fall where they may.

Which reminds me: I Am Not A Lawyer, but I have to imagine that IDEAS (the consultancy whose benchmark IBM used) has contracted terms and conditions stating how its benchmarks can be reported and used. I expect there's a contract stipulating that RPE is a proprietary metric, restricted in use. I would be unsurprised if IBM used it without IDEAS' permission, and may be in violation of IDEAS' copyright or usage terms.

The moral of the story

There are several:

  • Use open, standard benchmarks, such as those from SPEC and TPC.
  • Read and understand what they measure, instead of just accepting them uncritically.
  • Get the price-tag associated with the system used to run the benchmark.
  • Relate benchmarks to reality. Nobody buys computers to run Dhrystone.
  • Don't permit games like "assume the other guy's system is barely loaded while ours is maxed out". That distorts price/performance dishonestly.
  • Don't compare your brand-new machine to the competitor's 2-year-old machine.
  • Insist that your vendors provide open benchmarks and not just make stuff up.
  • Be suspicious!
Comments:

The statement says "up to a 30-to-1 ratio" - that's up to, not will always be 30-to-1. Many smaller scale servers are running at low - mid capacity, still requiring licenses but not utilizing all the hardware and are good candidates for consolidation.

If your x86 servers are at 90-100% capacity then obviously consolidation is not going to yield great benefit.

P.S. I am not in the hardware side of the business

Posted by David Bell on March 05, 2008 at 01:29 PM MST #

Excellent blog! How the hell does IBM get away with this???

May this rank in the top ten google searches!! ;-)

Posted by Phil on March 05, 2008 at 06:16 PM MST #

David, I appreciate your point, and if IBM had included your caveats, I would have little to complain about. As it is, I find the "up to 30-to-1" just to be weasel wording to make it technically not a falsehood. "Up to 30" includes "0" and "1".

The clear intention is to be deceptive: the press announcement repeatedly uses language like "A single z10 offers the computing power of 1,500 PC-style servers," which simply is not true. At the very least, IBM should prove this claim by submitting to real benchmarks, rather than extrapolating from benchmarks that are unrelated to one another. If IBM really believes their systems are that powerful, they've had the chance to prove it for years. They aren't shy about running those very same SPEC and TPC benchmarks on POWER or x86 when they have a competitive offering. It's only on their proprietary systems that they avoid this, and it's easy to see why.

The 10% busy factor is nonsense when describing OLTP servers during prime shift, and using it strikes me as dishonest. Average CPU utilization is completely meaningless (over what period? All 24 hours of the day? Peak hour? Weekends included?) - what you have to size for is peak load. Sizing systems based on unspecified averages is a recipe for disaster, and making advertising claims based on them is disingenuous. Why not just turn off the x86 servers and say the z is as powerful as an infinite number of them?

Fun fact: an old IBM rule of thumb for mainframe OLTP capacity planning (like CICS) was to provision for a peak-to-average ratio of 2.5:1 for the heaviest day of the month, and 2.5:1 for the heaviest hour of the day, and then leave headroom. If you plan for 90% CPU busy at the peak hour of your peak day, that means your average CPU% is 90% / (2.5 × 2.5) = 14.4%. That's right: your mainframe average CPU% would be under 15%. Few places followed this because it was so expensive to do so, but I saw it written as an old-school "best practice". (You weren't always lucky enough to have a low-priority CPU cycle-soaker to drive up the average.) Average CPU% is bunk.
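
That back-of-envelope rule works out exactly as stated (a sketch of the arithmetic, nothing more):

```python
# Old-school OLTP sizing: 90% busy at the peak hour of the peak day,
# with the assumed 2.5:1 peak-to-average ratios for both day and hour.
peak_util = 0.90
peak_day_ratio = peak_hour_ratio = 2.5
avg_util = peak_util / (peak_day_ratio * peak_hour_ratio)
print(f"{avg_util:.1%}")    # 14.4% -- average CPU busy well under 15%
```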

Phil: thanks for the kind words!

To both: thanks for the comments, whether we agree or not.

regards, Jeff

Posted by Jeff Savit on March 06, 2008 at 07:07 AM MST #

Personally, I think the z10 is a remarkable machine. A 4.4GHz superscalar, 960-million-transistor quad core is a pretty solid piece of technology by almost anyone's standard!

Can the z10 achieve all that the marketing message stated? Will your mileage vary? Can you run your business with benchmarks? You've made some sound points.

My job within IBM is to help my customers determine if and when a mainframe solution makes sound business sense.

When I engage with my customers we get very pragmatic and we do the hard work of discerning the value proposition of a z10 in comparison to (or in addition to) other server platforms.

The server consolidation and virtualization space is an exciting space to work in, and the z10 has made it even more so. For server consolidation workloads that are amenable to its architecture, the z10 is often a great fit, with vast consolidation potential via server virtualization (the marketing folks sometimes call it hyper-virtualization... but they're marketing people... that's their job... to get people thinking).

Is the z10 always a great fit for a customer? No. The z9 at 1.7GHz did a good job with server consolidations via virtualization given its respectable processor and strong internal bandwidth (throughput) characteristics. The z10 simply encompasses a greater number of good-candidate workload types for which overall economical virtualization to the z10 can be quantified and justified. Simply stated, the z10 processor is about twice as powerful as the z9... hence, more "candidates".

The "work" I mentioned earlier, your commentary touches on, but I'll elaborate a bit. To your point, the z10 is expensive hardware... no doubt. In the many dozens of deep customer engagements I've worked on, the mainframe has ALWAYS yielded the most expensive hardware cost. But, not to be flip... who cares?!?! A sound business decision rests on all the money leaving the company... the total cost of a solution!

When we do "the work" we consider the total hardware cost, but also the total cost of software and storage (which is the same unit cost and usually lower total cost on the mainframe in comparison to distributed servers) and energy and floorspace and disaster recovery and administration. We consider such cost factors not just for the production server(s), but for all the servers incorporated into the lifecycle support (dev, test, QA, prod, etc.) and architectural topology (presentation, app server, database tiers) of a project.

And once we get the blocking and tackling costing done... we start the hard stuff. Like the cost and value of high availability (or lack thereof). The cost and value of rapid provisioning. The cost and value of (rapid) technology refresh cycles. And whatever else the customer "values" in their technology decision making (i.e. what is their heritage, their skills, their predilections, their risk vs quality vs time tradeoff posture).

A "thoughtful" analysis of server consolidation and virtualization is an interesting "process". Not one that is particularly difficult, or hugely time consuming, but one that needs to be approached with an open mind and a good touch of common sense.

IBM offers more than one server type. A few other server vendors do the same. The reason we do so is that when a customer is driven (and more and more are) to the "Nth" degree of IT optimization, then matching a server's capabilities to an application workload's requirements becomes the very task at hand... i.e. the "job" of IT. When an IT shop engages me, I apply a methodology that usually yields not a one-box-fits-all mentality, but an optimized mixture of servers suiting that business's mixture of server workloads. Does the z10 sometimes get a share of that work? Yes!!!!

Hype is hype. Marketing messages are marketing messages... and I can defend, refute, and/or correct many (but certainly not all) of your criticisms of the IBM marketing messages at hand (you have gotten a bit dated), but I don't suspect that would yield your audience much value. In the end, we just gotta do the work.

Posted by Monte Bauman on March 11, 2008 at 01:34 AM MST #

Monte,

I'm glad to give you the opportunity to respond, to which I respond as well.

Using your expression "not to be flip": who cares how many transistors the chip contains? That's as close to "golly gee!" irrelevant as one can be. A distraction from what counts.

Note, IBM claims only a 50% faster CPU over the z9, despite a clock rate 2.6 times higher (1.7GHz to 4.4GHz). This proves that pushing clock speeds as IBM did is a performance dead end, adding costs for only diminishing returns. It gets the "2X" performance by using more CPUs than on the z9, and by referencing a proprietary LSPR benchmark for z/OS. Which is - guess what - irrelevant and misleading when talking about z/Linux!

And guess what - in the time since z9 came out, everybody else also came out with bigger and faster systems! They also came out with great virtualization technology, too.

Contrary to what you say, hardware costs do matter. So does software cost. So does actual performance, which IBM doesn't disclose in any transparent way. In all of those cases, Sun (and in fairness, IBM's other platforms) are far less expensive, and far less complicated. Look at http://www.itjungle.com/big/big110706-story01-fig01.html
It's a little old now, but it shows z9 having a cost 20 to 60 times higher than all other alternatives, including IBM's System i and System p. I am sure the economics are still the same. A few % here and there would be fine, but when it's "Z costs several times as much", then it's a real problem.

That's a reality IBM prefers people not be aware of, nor the incredible margins IBM makes on monopoly priced hardware.

Also, let's stipulate: a fair comparison in competitive situations is between contemporary equipment managed equally well. Anybody's 2008 computer beats up anybody else's 2002 computer! It's fair to use that for cost justification of a tech refresh project, but it is deceptive when comparing to competitor technology of the same time period. No - fair is "my 2008 systems, managed well compared to your 2008 systems, managed well".

That's why IBM has to resort to "10% solution" tricks to make their numbers look artificially good. I wish you had touched on that, since I think it a shame that IBM tries to use that tactic.

IBM also would prefer that customers not be aware that mainframes are no longer the biggest iron for any workload. Sun's high-end servers are much more powerful than the z10, and so are IBM's biggest System p. That includes I/O, which I'll touch on in a future article.

What really hurts is that z/VM is no longer the only virtualization technology around. Ten years ago, IBM had an exclusive (which they neglected for years). Now there's VMware, Xen, Sun xVM, LPARs, dynamic LPARs, vPars, Logical Domains, Solaris Containers and more (to name products from VMware, IBM, HP, and Sun), several of which have leapfrogged the original VM in functionality.

And let's be serious: the simplest, lowest risk consolidation path for low-utilization Unix, Linux, and Windows servers is to consolidate them using the virtualization capabilities native to that platform, rather than do a brain transplant porting them to expensive iron.

So, while the massive consolidation you speak of is valuable, there's absolutely no reason to pay a premium price on what you admit is the most expensive hardware out there. Customers can get all the benefits you mention (license consolidation, space, and environmentals) from a variety of competing technologies, for a fraction of the cost. For example, I can snap off a newly provisioned Logical Domain, or a new virtual machine in VMware or Sun xVM, in seconds. It's easy, and I don't have to spend a fortune to get it done. For another example: everybody has some form of "inter-guest" high-speed networking. It's not unique to IBM "hipersockets" (sic), so why pay extra for it? And so on...

The bottom line: customers don't need to spend extra for massive consolidation. They can save money using robust, open technologies (from multiple vendors!) without resorting to the most expensive platform out there.

That's the reality.

Posted by Jeff Savit on March 11, 2008 at 08:29 AM MST #

Gross generalizations are a disservice to people who seek to understand, hence a discussion thread like this one is a hard place to have a good debate. But I will aspire to answer your questions (and I'll do so without putting words in your mouth, as you've done to me ;-).

LSPR (large systems performance reference) is not a benchmark, but a family of benchmarks. The information output by LSPR is provided to help mainframe customers with capacity planning efforts. LSPR of old was an MVS (now z/OS) oriented "set" of benchmarks. The new LSPR includes z/VM and Linux and WAS and other benchmarks reflective of the workloads our customers run (and need to do capacity planning for) every day. Your dislike for LSPR may well be misplaced, as it is likely the best example of a family of benchmarks that provides its users with accurate and tailorable information for the (sometimes) complex capacity planning work that they need to do. Per LSPR being "proprietary" ... well sorta, the LSPR workloads are "mainframe binaries" (to simplify things) so unless you have a mainframe then the LSPR workloads aren't going to run there. That is, LSPR in the past was run on HDS and Amdahl and any other "mainframe-compatible" servers when they existed. Thus, LSPR is perfectly relevant and is not the least bit misleading for Linux on System z capacity planning efforts.

It's not like SUN has not done the same sort of thing (a SUN-only performance metric for comparisons amidst only other SUN servers).

You might ask why System z does not publish any of the atomic benchmarks that are popular in the distributed server space. The reason is pretty simple ... none are reflective of what mainframe customers do with mainframes (is it so different for distributed servers?). Further, in the mainframe space, we do not worry so much about "performance" as we do "throughput". We focus on optimizing transactions per second (while maintaining appropriate and consistent seconds per transaction) against "total cost of ownership". The mainframe (particularly z/OS) is uniquely capable of managing a system to that end.

Per "dead ends" ... I would not worry about performance "dead ends" on IBM servers. The $5B+ invested in R&D annually and the many 1000s of patents yielded are a good measure of our continuing ability to address technology constraints that we encounter. Further, the deep integration of our System p and System z engineering teams should yield high-end server customers remarkable choices, starting now with the z10 and moving forward with the high-end System p POWER6 servers coming out soon.

Per the message that a bunch of low utilization servers might be consolidated onto a single high utilization server ... what's the big deal? You said it yourself, even SUN can do that.

I'm not sure why you are so upset at the notion and the marketing message being stated. One of the biggest IT optimization opportunities at hand is addressing the server sprawl from past years of distributed physical server deployments. The marketing message at hand is simply that many, many of those servers are running at low utilization rates (10% is a nice round number selected to be indicative of such) and as such those low-utilization servers are candidates for virtualization to "high-utilization-capable" centralized hosting servers... for example, an IBM z10 Enterprise Class mainframe (yes, there are other examples).

You are right, every server vendor has noted that marketplace movement towards simplification. Every vendor now has server virtualization solutions. Some now on their 2nd or 3rd try.

You are right, the mainframe has been exercising virtualization technology since (at least) 1967. That technology continues in the current iteration of the z/VM operating system and the deep and wide integration in the z10's virtualization hardware services (aka SIE and MIF and others). I suppose that is why our virtualization value proposition on "z" is so different from the rest of the industry. For we too can "snap" a virtual machine very rapidly (6 seconds or so for a completely configured virtual server and total OS and middleware software stack). And at that snap rate we can snap away for several minutes straight before shopping for another server. Said another way, we regularly see virtualization implementations of 100's to many 100's of virtual machines doing real work on a single (EXTREMELY EXPENSIVE ;-) mainframe from the (HIGHLY AFFORDABLE!) z9 family of "open" systems servers (z10's are just now beginning to ship).

And you're right, a BILLION transistors on a chip does not really matter (or does it?)

Posted by guest on March 12, 2008 at 05:57 AM MST #


Hey Jeff, Great article...

Are you, by any chance, the same Jeff that is Senior Architect at Sun?

Posted by Cathrine on March 17, 2008 at 07:33 PM MST #

Point-by-point deconstruction... my paragraphs are prefaced by "J:", since (alas) I can't use different colors for fonts.

> Your dislike for LSPR may well be misplaced,

J: You wound me, sir! :-) I'm very fond of LSPR; a vast improvement over guesstimating MIPS. Remember, I was a mainframe guy with a performance orientation for most of my career; this was an important tool for me. The inaccurate "bad science" we often had to do using MIPS (Meaningless Indicator of Performance, as it was sometimes called) often led to poor capacity estimates when moving from one IBM model to another. What I object to is it being used as a marketing tool in an official IBM announcement to extrapolate performance for comparison to a completely different platform, based on a workload that isn't even the same as the LSPR benchmark workload. That is definitely deceptive.

> Per LSPR being "proprietary" ... well sorta, the LSPR workloads are "mainframe binaries" (to simplify things) so unless you have a mainframe then the LSPR workloads aren't going to run there. That is, LSPR in the past was run on HDS and Amdahl and any other "mainframe-compatible" servers when they existed.

J: Of course they are proprietary, and that's very much to the point. Their contents are a mystery to the outside world (describing a workload as a COBOL+CICS batch application without publishing the actual code, so everyone can see what it is, is exactly the "gross generalization" form of disservice you deplore). More important: other vendors can't use them for comparison or validation, so a FLEX-ES vendor, or Platform Technologies, which can actually run z/OS or z/Linux, can't post their performance results on the same workloads - using the very same binaries. Or Hercules. Nor can Clerity, Inc. show their results running the same workloads ported to UniKix. Now that HDS and Amdahl machines are safely dead and buried, IBM publishes their aged results for LSPR, while preventing living competitors from using it.

> Thus, LSPR is perfectly relevant and is not the least bit misleading for Linux on System z capacity planning efforts.

J: Only when doing sizing from one Linux on System z platform to another, and only for a workload that looks like the WASDB workload used for z/Linux in LSPR. It is quite misleading everywhere else - including other applications on z/Linux - but especially for competitive comparisons. It's absolutely misleading to use as a predictor of "OLTP RPEs" (whatever they may happen to be), since that's a different workload.

J: Let's be clear: IBM took an "LSPR curve" for z10 (without saying for which benchmark, or what an "LSPR curve" even is), applied that to a MIPS converter (bad science), and applied that to a completely unrelated benchmark (no science at all). Two levels of extrapolation without basis, used to make a marketing claim. And then they applied the 10% solution to make it look more competitive than it really was compared to a competing product.

> Its not like SUN has not done the same sort of thing (a SUN-only performance metric for comparisons admidst only other SUN servers).

J: Indeed we do, and for the same \*legitimate\* purpose IBM has for LSPR: same-platform-family capacity planning. Anything else should be marked with disclaimers that admit "this is only an estimate". For cross-platform comparisons, there are the standard open benchmarks, which IBM refuses to publish for System z. That's the big difference: we actually participate in those.

> You might ask why System z does not publish any of the atomic benchmarks that are popular in the distributed server space. The reason is pretty simply ... none are reflective of what mainframe customers do with mainframes (is it so different for distributed servers?).

J: That's a very revealing remark. If mainframe is so different that standard, open benchmarks are unusable, then the (valid for mainframe) LSPR is as unusable for describing performance of distributed applications! (if mainframe performance is unlike distributed performance, and LSPR describes mainframe performance, then LSPR does not describe distributed performance. QED). In other words, you just contradicted yourself: LSPR is definitely misleading for distributed workloads.

J: If performance characteristics are so different, then why is IBM trying so hard to convince customers to move "distributed workloads" to the mainframe - without using the only accepted way to measure "distributed" applications? Because they are "different from how mainframe customers use mainframes"? This is as close to an admission that "IBM mainframes are unsuitable for 'distributed workloads'" as we're likely to see.

J: Further: you're simply wrong. The LSPR workload description states directly that the WebSphere workload on z/Linux is "basically the same as the WASDB workload on z/OS" (see http://www-03.ibm.com/servers/eserver/zseries/lspr/lsprwork.html). IBM once published SPECweb figures (let's just say they showed you would pay a massive dollar premium to serve web pages on z/OS - something I already knew, because I once operated what was arguably the biggest mainframe website in the world at a previous employer). IBM has +projected+ SPECjbb and published the +projected+ (and pathetic) results in public, and has actually published IOzone figures (about which I will blog later). There is nothing particularly "distributed" about the database and transaction processing benchmarks on tpc.org - unless you want to say that OLTP and database are now distributed applications and not mainframe ones. Fine by me, I guess - they do scale better on open systems, after all.

> Further, in the mainframe space, we do not worry so much about "performance" as we do "throughput".

J: That's a non sequitur: "throughput" is an aspect of "performance". They are not separate disciplines. How the devil does one "worry" about throughput without worrying about performance? That makes no sense. And then, why does IBM avoid benchmarks that measure throughput, eh?

> We focus on optimizing transactions per second (while maintaining appropriate and consistent seconds per transaction) against "total cost of ownership". The mainframe (particularly z/OS) is uniquely capable of managing a system to that end.

J: In that case, you should be perfectly willing to prove that claim using the same framework everyone else uses (including IBM, when it has a competitive product). Until that happens, the "uniquely capable" claim is unsubstantiated by evidence.

J: And, if the solution is based on z/Linux, then z/OS capabilities (and its liabilities) simply don't enter into it. z/Linux is Linux, and doesn't have any of the beneficial aspects of z/OS.

> Per "dead ends" ... I would not worry about performance "dead ends" on IBM servers. The $5B+ invested in R&D annually and the many 1000s of patents yielded are good measure of our continuing ability to address technology constraints that we encounter. Further, the deep integration of our System p and System z engineering teams should yield high end server customers remarkable choices starting now w/the z10 and moving forward with the highend System p POWER6 servers coming out soon.

J: In that case IBM will no longer have a reason to avoid posting performance results for System z, since it posts results on System p. Using the open standards benchmarks.

> Per the message that a bunch of low utilization servers might be consolidated onto a single high utilization server ... what's the big deal? You said it yourself, even SUN can do that.

J: The big deal is that IBM claims this capability by hacking together bogus numbers. Nobody runs loaded database servers at 10% busy during peak (and averages are meaningless, as I said earlier. Ask anyone who does capacity planning). The extrapolations used are baloney. IBM doesn't show the price tag, which even according to IBM's own figures would be about 7X more expensive. Also, there's a little issue of service level (taught to me by retired IBM great Walter Doherty), which I'll relate in a future blog. Yes, Sun can do it too - massive consolidation for great savings in TCO - and unlike on Z, we can do it at an attractive price, and we actually provide meaningful performance metrics for our systems.

> I'm not sure why you are so upset at the notion and the marketing message being stated. One of the biggest IT optimization opportunities at hand is addressing the server sprawl from past year's of distributed physical server depolyments. The marketing message at hand is the simply one that many many of those servers are running at low utilization rates (10% is a nice round number selected to be indicative of such)and as such those low utilization servers are candidates for virtualization to "high-utilization-capable" centralized hosting servers... for example, an IBM z10 Enterprise Class mainframe (yes, there are other examples).

J: It has nothing to do with being "upset". Here's how it works: Vendor A makes a product announcement, and the marketing guys insert what we in the business call a "takeout scenario" naming products from Vendor B (as if the scenario wouldn't be equally valid, or not, with IBM's own Intel servers, eh?). Then a subject matter expert from Vendor B (that's me, in case you're following this far) points out all the exaggerations and distortions in Vendor A's scenario - showing that even under the rosiest conditions, the customer pays 7X more if misguided enough to adopt Vendor A's product. That's only appropriate. Why are you so upset about my calling IBM out on this?

> You are right, every server vendor has noted that marketplace movement towards simplification. Every vendor now has server virtualization solutions. Some now on their 2nd or 3rd try.

J: And therefore it no longer makes any sense to pay an exorbitant premium (7X higher cost, even using IBM's numbers) for virtualization, now that it's available on every vendor's servers. My point exactly.

Posted by Jeff Savit on March 18, 2008 at 09:58 AM MST #

Hi Cathrine,

It must be me - the title is slightly different (do titles matter?) but that's close enough. I'm glad you like the article!

regards, Jeff

Posted by Jeff Savit on March 18, 2008 at 10:23 AM MST #

Your choice of words and "embedded feelings" reveals that you really don't like z or the numbers IBM gives you.
You could get figures from real cases instead.
I have seen calculations from the real world that included all costs, and guess what? z hardware was cheaper than x86 blade hardware! And man-hours were of course lower for z, but I guess you already know that.
Yes, this was a calculation with more than 200 Linux servers - of course you have to have volume in a big machine.
You also need to already have a mainframe.
But when these two things are there, you get these numbers.

Posted by Tore on June 23, 2009 at 12:46 AM MST #

It has nothing to do with what I like - I like z just fine, bearing in mind that I work for a competitor :-) I spent many years happily working on that platform as a systems, performance, and applications guy.

The problem is the obviously deceptive numbers in IBM's claim that a single z10 is "the equivalent of nearly 1,500 x86" servers. This was based on bogus substantiation that was later removed and then replaced with other bogus "evidence". Just look at the blog entries that follow, which detail how they changed their story and airbrushed out what they previously said.

Now let's talk to your point on TCO. I've seen those types of numbers, and they're also misleading. Sure, if I have 200 separate, non-virtualized servers, each with costs for network connections, rack space, power, cooling, and software licenses, I can save a lot of money if I put them on consolidated, virtualized servers. But you can easily virtualize on Intel, AMD, or SPARC - you don't need z to do that! It's completely wrong to compare virtualized z to non-virtualized Intel as if there were no virtualization choices.

Besides, it's an incomplete analysis. You don't get these benefits just by having a mainframe. You must also factor in the risk and cost of conversion (including opportunity and development costs), the need to retrain or hire employees or consultants (the only good thing about z/Linux is that it creates jobs for VM systems programmers!), finding alternative applications if the ones you want don't exist on z, and creation of a single point of failure where none existed before.

It's also incomplete because you can get the same benefits without expensive proprietary systems. System z isn't the only game in town for virtualization and consolidation. Instead of paying monopoly prices for mainframe hardware and hypervisor, you could do the same thing at a fraction of the cost on commodity-priced systems, with a choice of vendors for hardware and virtualization. VMware, Sun VirtualBox, Xen, Oracle VM, Solaris Containers, Sun Logical Domains - there are a lot of choices out there now. Choice is good.

Even 3+ years ago, there were customers running 100+ virtual machines in production on a single Sun X4600 AMD server. Similar benefits can be achieved using Sun's Logical Domains or Solaris Containers - which we provide at no extra cost. So all the savings in licenses, rack space, power, etc. touted for z can be achieved at lower cost and better performance - and without the risk and cost of having to replace the applications. Existing open systems applications can be moved into virtual machines, domains, or Solaris Containers without replacing the application stack, and achieve the same consolidation benefits. That makes much more sense than a "rip-and-replace": no porting, no rewrites, and it costs less.

Finally, we're making a mistake if we say the only answer is to consolidate at the virtual machine level. If it's a mistake to have 200 separate Linux servers (and I think it frequently is...), then maybe the right answer is to use an operating system that is robust, scales, and has a resource manager. You don't need to run in a virtual machine (and let the VM system allocate resources) if you choose an OS where it's easy to host multiple separate applications at high loads - such as Solaris (and, just to show I can be fair: z/OS. That's why z/OS doesn't need z/VM the way z/Linux does).

All too often, virtual machine systems are used to compensate for the defects of the guest OS. The right answer is to run a better OS in the first place. Instead of running 200 Linux virtual machines - run one copy of Solaris with 200 Solaris applications (preferably in Containers to separate them from one another) and consolidate as much as you want. All the benefits of consolidation and virtualization, at native performance, without added cost and complexity. There is a better way.

Posted by Jeffrey Savit on June 23, 2009 at 11:32 AM MST #

Hello Jeff,
Great blog!

I am a German IT architect, currently doing some investigation into system platform differences. I hope my English is sufficient to communicate.
By "platforms" I mean mainframe, midrange (RISC), and x86/64.
My company runs lots of platforms, mainframe (z9) included, and we are looking at new strategies for the future. Because of this we talked to analysts, searched the web, and asked internal staff.
Most of the time I ran into religious battles about platforms, usually with rude words, discrediting the other side. It's mostly an exchange of claims that are never proven. From the "mainframe defenders" I saw not a single fact-based statement - always "your brain isn't large enough to imagine", "believe me", or "I saw" statements. That's why I call them religious battles.

I found/made a few interpretations (or maybe facts) and assumptions which I like to share with you to get your view about this.

-Fact: IBM doesn't publish any benchmark which allows you to compare performance between platforms (based on finding no such articles). If IBM has a superior platform, as they claim, the chance this makes sense is about zero.
-Rule of thumb: I have heard performance comparisons like this: 1 MIPS = 4 MHz (Intel); a dual-core Xeon (2 GHz) compares to 150-400 MIPS (based on internet articles). How do you size with that?
-Assumption: RISC platforms are currently the highest-scaling platforms (a few times higher than mainframe), with x86/64 (as high as mainframe) closing the gap. x86/64 is the platform advancing at the highest speed, closely followed by RISC (based on the already mentioned "Big Iron" benchmark on itjungle, and the fact that x86/64 has the strongest competition, the most system integrators, and the most platform vendors).
-Assumption: Mainframe performance depends on the workload. Simplified: lots of calculation is bad, lots of small I/O is good. But this doesn't mean good I/O performance is exclusive to mainframe (based on my own experience). If you migrate these loads, you will benefit most on the loads which are bad for mainframe.
-Interpretation: Hardware and maintenance costs are roughly 20-60 times higher than on other platforms. You mentioned this, but it can't be repeated enough - that's what counts at the end of the day. Even Red Hat Linux on mainframe costs over 10 times as much as on other platforms; performance-wise maybe even more (based on the "Big Iron" benchmark and the Red Hat price list).

So with all this in mind, why stick with the mainframe? The answer is: legacy applications. We have lots of IMS and Assembler based applications and frameworks which are hard to migrate, and the transition costs are barely calculable - maybe incalculable - so economically it doesn't make sense. It might (does?) make sense from a strategic standpoint, raising flexibility and lowering skill risks. So, until some vendor has the solution to this migration issue (some are close), we can only slowly remove platform dependencies and live with the mainframe a few more years.
But if we stay with the mainframe, does it make sense to consolidate onto the mainframe and push more load onto it? I can't see any reason. As I see it, I pay more for extra load than I would on other platforms.
Also, the IBM example (1,500 [almost dead] servers) could be solved by virtualization on any platform, could it not? You answered this already.
I also don't buy into the zLinux story - but I see (virtualized) Linux (on x86/64) on a steep rise, digging into RISC territory, and I would use (virtualized) Windows (x86/64) as well (it does a good job from W2K3 Server and above) - I am very pragmatic about choosing OSes. Sorry for not mentioning SPARC or Solaris :-)

So personally I don't see a bright future for the mainframe. The standard server system (general purpose, cost efficient), in my opinion, will be x86/64-based. The last problem with this platform is scalability (scale up - scale out isn't a problem), which will be fixed in the near future.

Regards,
Oliver

Posted by Oliver on June 29, 2009 at 10:34 PM MST #

Hi Jeff,

I want to thank you for your article on the z10. I think it's a must-read for every IT shop that is being approached by IBM.

I wanted to correct a mistake you made regarding IBM memory pricing. I know it's not in this article, but the one where you published it didn't have comments.

You calculated it at $6K per GB. That's not entirely correct. The way they calculate is on a per-core basis.

If you have 1 core, you can buy 16GB of RAM at $2K per GB. After the initial 16GB, any additional GB will be at $6K.

I hope this helps.

Thanks
Ogan

Posted by Ogan Sayek on June 30, 2009 at 06:53 AM MST #

Oliver, Ogan - thanks very much for your comments.

Oliver: I generally agree with your points, with only the exception that I think RISC processors will continue to be best for a number of workloads: scale, reliability features, and also power efficiency on machines like Sun's CMT servers. But that's a minor point: by and large I think you are quite correct (and your English is fine!)

As you say, it is necessary to take "religion" and emotional argument out of the discussion and make decisions based on data. Vendors publish benchmark data when they have good numbers; if a vendor publishes nothing, it's because they have no good numbers. The few numbers that do leak out are embarrassing for the z10 - especially considering the high prices you mention.

Rules of thumb like the 1 MIPS == 4MHz (originally called "Barton's Rule") are at best crude estimates and should not be considered permanent (that rule was devised long before dual and quad-core x86, and way before processors like Nehalem!) Current Intel and AMD processors do more per clock cycle. The best thing is to measure actual performance, and to insist that vendors back up claims with evidence.
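To illustrate just how crude that rule is, here's a quick sketch applying it to the figures you quoted; the function and numbers are only for illustration, not an endorsement of the rule:

```python
# The "1 MIPS per 4 MHz" heuristic, applied to Oliver's example chip only
# to show how wide the error bars are. The function name is mine.

def mips_from_mhz(total_mhz, mhz_per_mips=4.0):
    """Crude MIPS estimate from aggregate x86 clock speed."""
    return total_mhz / mhz_per_mips

estimate = mips_from_mhz(2 * 2000)   # dual-core Xeon at 2 GHz
print(estimate)                      # rule of thumb says 1000.0 MIPS

# Oliver's quoted internet range for the same chip was 150-400 MIPS,
# so the heuristic disagrees with those figures by a factor of 2.5 to ~6.7.
print(estimate / 400, estimate / 150)
```

Two "accepted" sizing methods for the same chip, differing by several multiples: that's exactly why measurement beats folklore.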

I also agree that the only use for mainframes is legacy applications, because of the high cost of converting from proprietary APIs (though services like Clerity's UniKix can reduce those costs). It makes no sense at all to use the mainframe as a consolidation platform for x86 Linux or Windows. A better job can be done using virtualization like VMware vSphere, Xen, or VirtualBox, at a better price and with functions that mainframe virtualization does not provide. Also, you can then simply continue to run the x86 OS and binaries without having to port to the mainframe.

Ogan: I didn't know those details, and I thank you for providing them. I have found it hard to get public pricing information on z10 hardware and software. That's part of the lack of transparency I have complained about.

Again, I thank both of you for your comments.

Posted by Jeffrey Savit on June 30, 2009 at 03:11 PM MST #

A few additional points about IBM's claim to be able to run thousands of VMs with z/VM:

1. They are running two virtualization solutions on top of each other: a hardware hypervisor to create LPARs, and a software-based hypervisor (z/VM) to create z/Linux instances, with a 7% utilization penalty. So each VM will use 7% more resources.
2. Yes, I'm sure they can install thousands of z/Linux instances, but nowadays any OS will require 1GB of memory plus whatever the application needs. Let's say each VM needs 3GB of memory, which is fairly low given that applications like WebLogic or an Oracle DB will take a LOT more. The z10 can scale up to 1.2TB of memory: 1.2TB / 3GB = 400 VMs.
3. I will give you another fact: the z10 chip is single-threaded. You will not get good performance when you stack up a lot of multi-threaded applications.

Posted by Ogan Sayek on July 01, 2009 at 05:22 AM MST #

Hello Ogan, and thanks again for your comments. I really appreciate them. I have some comments on them as well, and I want to be very fair and careful.

First, I'm not sure how much added overhead there is when running z/VM under LPARs. The number might be 7%, but could be lower or higher. I expect that it's VERY dependent on the workload. I don't really focus on that because the z has several HUNDRED percent worse price/performance than other platforms, so another few percent one way or the other won't make any difference. If the idea is to consolidate mostly idle physical machines, it might not even be an issue - except for the fact that it could be done on much less expensive systems, and without conversion effort.

The second point is more important: Memory is a bigger problem in a virtual machine environment than CPU. I've blogged on this a bit already, but the key thing is that an OS has poor locality of reference, and is always touching pages. Operating systems also don't go truly idle, so it's hard to push their pages out to disk: they're constantly being touched.

Because of that, it's usually best practice to run guest virtual memories as small as possible. This is essential on z/VM and z/Linux, where memory is so expensive and where you need to try to run so many guests to compensate for the very expensive z systems. So, in z/Linux, the thing to do is cut virtual machine RAM sizes to the smallest possible value, and even force them to swap. This does terrible things for performance, but the alternative is even worse. So, in z/Linux, don't expect to see lots of guests with memory sizes of 3GB! It's simply unaffordable. Imagine squeezing large databases into tiny virtual machines. This is a serious problem for z/Linux, as you figured out.
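To put rough numbers on that memory ceiling, here's a quick guest-density sketch using the 1.2TB figure Ogan mentioned; the 5% hypervisor reserve is my own illustrative assumption, not a measured z/VM value:

```python
# Guest-density ceiling on a memory-constrained host. The ~1.2 TB total
# comes from Ogan's comment; the 5% reserve for the hypervisor is an
# illustrative assumption on my part.

host_ram_gb = 1200               # ~1.2 TB of host memory
hypervisor_reserve = 0.05        # assumed memory held back for z/VM
usable_gb = host_ram_gb * (1 - hypervisor_reserve)

for guest_gb in (0.5, 1, 3):     # candidate per-guest memory sizes
    guests = int(usable_gb // guest_gb)
    print(f"{guest_gb} GB/guest -> at most {guests} guests")
# At 3 GB per guest the ceiling is ~380 guests, before CPU is even
# considered - hence the pressure to shrink z/Linux guests so far.
```

The only way to claim "thousands of guests" on such a box is to make each guest tiny, which is exactly the performance trade-off described above.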

Last, I don't know that a z10 chip is truly single-threaded. To the best of my knowledge, a z10 chip is a quad-core chip, with up to 5 of them in a "book". The LSPR numbers seem to indicate that z10 chips scale better than they would if a chip were single-threaded, but as I've indicated elsewhere, LSPR shows that z10 machines fail to scale linearly as you add CPUs - despite the horrible expense.

Again, thanks for the comment!

regards, Jeff

Posted by Jeff Savit on July 02, 2009 at 10:10 AM MST #

Jeff,

Absolutely, I want to be fair as well, and I try not to guess or overestimate things.

The 7% overhead figure was told to me by an IBM instructor at a z/Linux course that I attended in NYC. Of course it will vary from load to load. However, if IBM itself is claiming 7 percent (which I think is already pretty high), it's safe to use, because in reality it might be much more.

As far as the z10 chip being single-threaded: I wasn't really referring to the whole chip but to the core. IBM seems to almost never mention that it's a quad-core chip, and they base all of their assumptions on a per-core basis.

So as far as cores go, they are single-threaded. I believe IBM is working on a multi-threaded chip, but I'm not sure when it will become available.

Posted by Ogan Sayek on July 06, 2009 at 02:59 AM MST #

My company believed IBM and decided on z/Linux. We are currently consolidating old, not-well-managed systems onto z/Linux. But it isn't working out. Everything is so much slower, and the capacity runs out fast.

In our case, simple Spring-based Java servers take 30 seconds to start, while on our 5-year-old AIX they start in 7 seconds. And the first IFL is already 100% utilized by a few "quick-win" applications with almost no users. Of course you can buy a lot of additional hardware, but there is no business case at all.

Our senior managers never allowed the decision to be questioned - even before anything had really been decided. They considered themselves brave for taking such a huge step forward. At least all our software will be re-packaged and updated, so the next migration will be much simpler.

Posted by Alex on February 21, 2010 at 10:11 PM MST #

Alex,

Thanks for the comment. It's consistent with everything I've seen with z/Linux. A 5 year old AIX server outperforming z doesn't surprise me at all, especially with Java. I don't have any reason to take sides in your case :-) since it's one IBM product replacing another one, but this is a very good example.

regards, Jeff

Posted by Jeffrey Savit on February 21, 2010 at 11:48 PM MST #
