Sunday Apr 19, 2009

SPECjAppServer2004 2925.18 JOPS, Glassfish and MySQL raise the bar


SPECjAppServer2004 2925.18 JOPS: Sun continues its performance and price/performance leadership

Good News:

Sun has just published our latest SPECjAppServer2004 result of 2925.18 SPECjAppServer2004 JOPS@Standard, using an all open source stack, i.e. GlassFish, MySQL and OpenSolaris, on the new Sun/Intel Nehalem based servers (using ZFS).

Highlights:

  • The Nehalem/GlassFish/MySQL combination brings substantial price savings to users of typical web based applications, all with commercial grade support

  • The tested configuration consists of one Sun Fire X2270 (Intel Nehalem based) server for the GlassFish application server and one Sun Fire X4170 (Intel Nehalem based) server for the MySQL 5.1 database. For more details see the result at http://www.spec.org/jAppServer2004 or look at http://blogs.sun.com/kevink for a good overview.

  • MySQL's efficient logging system requires only 12 spindles to achieve the I/O throughput needed to support this load, yet most of the published proprietary software results require at least double that!

  • First industry standard benchmark result featuring the database running on a ZFS file system (which made running multiple tests very efficient; this is a big deal, so I will give some more details on it shortly)

  • Total software and hardware purchase price of $US 38,880, based on the “Bill of Materials” in the benchmark report and online pricing from sun.com.

  • Significant performance gains: the previous best all open source result was Sun's 1197.10 JOPS result (see http://blogs.sun.com/tomdaly/entry/sun_compelling_price_performance for details), but this new result on Sun/Intel Nehalem servers is more than double that.

  • Despite achieving more than double the score of the previous best all open source result, this configuration still improves on its price/performance: $US 38,880 / 2925.18 = $US 13.29 per JOPS@Standard, versus $US 13.46 per JOPS@Standard for the previous best (see http://blogs.sun.com/JenniferGlore/ for pricing details).

  • To give some idea of the actual performance, this configuration of two 8-core servers and open source software supports a concurrent load of more than 22,750 web application virtual users and uses more than 800 connections to MySQL 5.1.30 via the GlassFish connection pools! (A small sketch of how application code borrows connections from such a pool follows this list.)
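
As an illustration of what "using the GlassFish connection pools" means at the application level, here is a minimal sketch of the standard Java EE pattern for borrowing a pooled connection via JNDI. It is not taken from the benchmark kit: the JNDI name and table name are hypothetical, and the real pool settings live in the GlassFish domain configuration.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

/*
 * Minimal sketch of application code using a container-managed
 * (GlassFish) JDBC connection pool. Intended to run inside the
 * application server, e.g. from a servlet or session bean.
 */
public class PooledQuery {

    // Hypothetical JNDI name; the real name is defined in the GlassFish domain config.
    private static final String POOL_JNDI_NAME = "jdbc/benchmarkDB";

    public int countOrderLines(long orderId) throws Exception {
        // The lookup returns a DataSource backed by the connection pool.
        DataSource ds = (DataSource) new InitialContext().lookup(POOL_JNDI_NAME);

        // getConnection() borrows one of the pooled connections; close()
        // returns it to the pool rather than tearing down the MySQL session.
        Connection con = ds.getConnection();
        try {
            PreparedStatement ps = con.prepareStatement(
                    "SELECT COUNT(*) FROM order_line WHERE order_id = ?"); // hypothetical table
            ps.setLong(1, orderId);
            ResultSet rs = ps.executeQuery();
            rs.next();
            return rs.getInt(1);
        } finally {
            con.close(); // closing the connection also releases the statement and result set
        }
    }
}

Under load, many request threads execute code like this concurrently, which is how the benchmark configuration comes to hold more than 800 pooled connections to MySQL.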

The Test rig


Conclusions:

  • If you are a software developer of web based applications, perhaps an ISV, or perhaps a contributor to an open source project, then you really ought to consider the price/performance and enterprise level support advantages of the Sun open source stack and start considering certifying or deploying your application on this platform.

  • If you are an end user customer of web based applications, such as e-commerce or similar applications, you should perhaps ask your software supplier when they can start moving their applications to the Sun open source stack, so as to start saving you and them money (especially in these tight times).

  • If your hardware supplier is not helping to drive your costs down like Sun is, then perhaps start asking them why!

Final thought:

These results using MySQL 5.1.30 don't yet include the performance improvements from the marvellous work that the combined Sun/MySQL/community performance team has been doing, so watch this space; the good news is not finished yet.

Pricing is based on the Bill of Materials in the Sun GlassFish Enterprise Server 2925.18 SPECjAppServer2004 JOPS@Standard result at http://www.spec.org/jAppServer2004.
Required disclosure: SPEC and SPECjAppServer are registered trademarks of Standard Performance Evaluation Corporation.
Sun GlassFish Enterprise Server v2.1 on Sun Fire X2270 with MySQL 5.1 on OpenSolaris 2008.11: 1 x X2270 (8 cores, 2 chips) used for the application server and 1 x X4170 (8 cores, 2 chips) used for the database server.

Thursday Nov 06, 2008

Sun Demonstrates Compelling Price Performance

Sun demonstrates compelling price/performance advantage of enterprise supported Open Source.

Using GlassFish V2 U2, MySQL 5.0 and OpenSolaris on commodity Sun Intel hardware (i.e. Sun Fire X4150 Intel based servers), Sun has achieved a result of 1197.10 SPECjAppServer2004 JOPS@Standard. You can see a lot more detail on this result on the Sun benchmarks page, in a nice write-up by the bmseer, and also check out the Sun ISV blog here.

The Summary:

[Summary table and graphs: price/performance comparison of the Sun, HP and Dell results.]

Why this is interesting/important to organisations running web applications.

  1. Demonstrates the huge cost savings available from using Open Source software for the entire application stack!

    The table and graphs above highlight an important fact: the Sun solution's total hardware and software cost is less than one CPU's worth of the database license used in the Dell or HP results, and this is true not just of the results above but in fact of most of the SPECjAppServer2004 submissions that use proprietary (application server or database) software. You can check out the details of the pricing here.

  2. Demonstrates Open Source solution competing directly against proprietary solutions

    One very useful and important feature of the SPECjAppServer2004 benchmark is that it is written in Java, which makes it possible to run the same code unchanged on a variety of different operating systems and hardware platforms; in fact, there is a benchmark rule that prohibits any changes to the SPECjAppServer2004 Java code. This means that no matter what platform, J2EE application server, operating system or database the application is run on, the application itself is unchanged. This characteristic makes SPECjAppServer2004 an ideal workload for making comparisons between different environments, and Sun is using this fact to compare the proprietary software based Dell and HP results with an all open source based result (a small sketch of this portability appears after this list). In addition to running the same Java code and hence accessing the same database data, the open source results must fulfil the benchmark requirements for enterprise products, such as correctness, support, durability and ACID compliance.

    Of course there are limitations with all benchmarks, and comparisons must be made carefully; however, this result clearly demonstrates open source software doing the same job as proprietary software at a fraction of the cost. This, coupled with the continuing performance improvements I mention in point 3 below, seems to add weight to Allan Packer's thoughts on the future of proprietary and open source databases.

  3. Demonstrates continuing and outstanding performance improvements in the Open Source stack and in the MySQL DB in particular

    In fact it turns out that the Sun result of 1197.10 JOPS is the best published “commodity hardware” result for per core database performance. The Sun benchmark uses just 4 Intel cores on one chip in the X4150 database server, which means that the combination of OpenSolaris, MySQL 5 and the X4150 delivers 299.28 JOPS per database core. The only results that demonstrate superior per core performance for the database use some IBM Power architecture chips (which are definitely not commodity priced). If you want to verify this claim, please download the spreadsheet of results available via the search feature at http://www.spec.org/jAppServer2004/results/

    It also turns out that the MySQL database requires far fewer disk resources than competing products: only 6 of the X4150's internal disks, in a RAID 0+1 volume, were needed to support the database throughput. If you look through the SPECjAppServer2004 results you will see that most vendors need expensive disk arrays to achieve the required disk performance (and most likely expensive extra database options).

  4. Just the start of performance improvements for MySQL, GlassFish and OpenSolaris

    This result is based on the first release of OpenSolaris and uses MySQL version 5.0, and the performance of both of these platforms is going to improve in the near future. With MySQL 5.0 we simply have not yet seen the results of the impressive gains that the Sun performance technology team, in conjunction with the open source community and open source focussed companies such as Google, is bringing to MySQL. We will see these performance gains start to turn up with MySQL 5.1 and beyond (see Allan Packer's article on this). Also, stay tuned, as the performance of GlassFish is going to continue to impress, especially on commodity 4 and 8 core servers used in conjunction with MySQL. In fact I can't wait to start testing GlassFish V3.

  5. Because this benchmark result can be a pointer to how your organisation can start to save “serious” money by using enterprise supported Open Source software

    Basically, you might want to start investigating which web applications in your organisation could run on an enterprise supported open source stack such as the GlassFish/MySQL/OpenSolaris stack used for this benchmark result.
    There are some steps you can start taking now, and I am going to detail a strategy for doing just this in an upcoming blog.
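
To make point 2 above concrete, here is a tiny, hypothetical sketch of the portability being relied on: application code written purely against the standard JDBC API, with the database chosen only by external configuration. The property file name, keys and table name are illustrative and are not taken from the SPECjAppServer2004 kit.

import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

/*
 * The same Java code runs unchanged whether the configured database is
 * MySQL or a proprietary product; only the externally supplied JDBC
 * driver, URL and credentials differ.
 */
public class PortableQuery {
    public static void main(String[] args) throws Exception {
        Properties cfg = new Properties();
        // e.g. url=jdbc:mysql://dbhost:3306/specdb  (or a proprietary vendor's JDBC URL)
        try (FileInputStream in = new FileInputStream("db.properties")) {
            cfg.load(in);
        }

        // Older JDBC drivers may additionally need Class.forName(driverClass).
        try (Connection con = DriverManager.getConnection(
                     cfg.getProperty("url"), cfg.getProperty("user"), cfg.getProperty("password"));
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM orders")) { // hypothetical table
            rs.next();
            System.out.println("orders: " + rs.getLong(1));
        }
    }
}

Swapping a proprietary stack for GlassFish/MySQL/OpenSolaris then becomes a configuration and deployment exercise rather than a code change, which is exactly the property the benchmark rule enforces.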


Required disclosure: SPEC and SPECjAppServer are registered trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of 11/05/2008. 2 x Sun Fire X4150 (8 cores, 2 chips) and 1 x Sun Fire X4150 (4 cores, 1 chip): 1197.10 SPECjAppServer2004 JOPS@Standard. Best result with 8 cores in the application tier of the benchmark: 1 x HP BL460c (8 cores, 2 chips) and 1 x HP BL480c (8 cores, 2 chips): 2056.27 SPECjAppServer2004 JOPS@Standard. Best result with 2 systems in the application tier of the benchmark: 2 x Dell PowerEdge 2950 (8 cores, 2 chips) and 1 x Dell PowerEdge R900 (24 cores, 4 chips): 4,794.33 SPECjAppServer2004 JOPS@Standard.



Monday Aug 11, 2008

Real Value in Benchmarks

Benchmarks, Workloads, Micro benchmarks and In-House Performance Testing

As a contributor to the SPEC performance organisation on behalf of Sun, I tend to notice and read comments, both negative and positive, on the benchmarks SPEC creates and administers, and I read with particular interest articles on the SPECjAppServer benchmarks that I am involved in. A few days ago I was forwarded a post in which the author offers the opinion that the SPECjAppServer2004 results provide no value, and also offers a pretty negative view of industry standard benchmarks in general. I certainly don't believe that the SPEC benchmarks, or any benchmarks, are perfect, nor do I think that they are the only valuable source of performance information, but to claim that the results have no value seems... well, absurd. So I thought it might be useful to offer some background and observations regarding performance measurement; in the discussion below I try to categorise the main sources of performance information and to highlight the main benefits and shortcomings of each.

(formal) Benchmark


A benchmark comprises a performance testing workload (application) or workload definition, plus a set of run rules and procedures that define how the workload will be run, plus a process for ensuring that published results conform to the run rules and to prescribed "fair use" rules about how comparisons may be made between results.

The workload is generally a (very) complex application and includes the user simulations, data models (in code or specification form) and all the information necessary to run repeatable performance tests over a (potentially) wide variety of computing environments. The run rules and procedures define how the workload will be run, what constitutes fair and reasonable tuning, what the requirements are for the products being tested, and the format and length of test runs and the reporting requirements. The benchmark usage (fair use) rules outline how one benchmark result can be compared to others and effectively constrain the claims that can be made about any particular result, hence (ideally) increasing the value of the published results to end user consumers.

Industry standard benchmark organisations such as SPEC or TPC are made up of (IT) companies and interested individuals who contribute time and/or money to the organisation to develop (complex) benchmarks and to help manage them. These organisations exist to create benchmarks and performance data that are credible, relevant and useful to end user consumers of that data. There are many benefits to the contributing vendors in creating and running benchmarks; having a forum in which to prove performance or price/performance gains in their products is certainly a big motivation, but not the only one. Many of the benchmarks defined and created by SPEC, for example, are used by hardware and software vendors to improve their products long before a result is ever published on the competitive public site. So there are very sound engineering as well as marketing reasons for vendors to contribute to the goal of creating credible, relevant and useful performance benchmarks.

Another valuable source of benchmark and performance data is vendor benchmarks. The Oracle applications benchmarks and the SAP benchmarks are good and well known examples: the workload, run rules and usage rules are defined by the vendor company and then made available to third parties or hardware partners who want to run and tune these workloads in their environments. These benchmarks have much in common with the industry standard benchmarks, but their scope is generally limited to just the products offered by the vendor. They are very useful for potential customers of these systems, who can use the performance information to size implementations of the products and hence build confidence in the performance capacity of the system prior to purchase or implementation.

Benefits

  • Extremely cheap for end users: the large IT vendors have normally done most of the work and published the results, so end users need only look at those results and decide if and how they apply to their business and what comparisons they can make based on the published data. End users can use the numbers with a degree of confidence, knowing that the results have been audited or peer reviewed to ensure compliance with the benchmark rules.

  • There is a lot of tuning information and real value in the benchmark results themselves; for instance, consider the SPECjAppServer2004 results. In each result you will find the .html result page, which is the full disclosure report (FDR). The FDR contains not just the final and repeat run scores but also a wealth of tuning information, covering the database, the application server, the hardware, the Java virtual machine, the JDBC driver and the operating system, everything another user might need to reproduce the result. The FDR also includes the full disclosure archive (FDA), which contains the scripts, database schema, deployment information and instructions on how the environment was established. The SPECjAppServer2004 FDR and FDA are valuable resources I use all the time on customer sites as a reference for how to tune and configure their production and test systems.

  • Again with reference to the FDR and FDA, much of the raw data and data rate information is useful; examples include the number of concurrent web tier transactions, the network traffic, or the size of the database supported by the database hardware. These data rates, speeds and feeds can be used to assess the capabilities of particular parts of the system under test and can be useful for sizing aspects of similar applications.

  • Hardware and software vendors use the benchmarks as tools to improve their products, which in turn flows through to end users. A good example of this at SPEC was the decision to use BigDecimal in the web tier of SPECjAppServer2004: even before the SPECjAppServer2004 workload was released, it was obvious to the Java virtual machine vendors participating at SPEC that this was an opportunity to optimise BigDecimal processing in their JVMs. So before the first SPECjAppServer2004 results were released the JVMs were already providing optimisations for BigDecimal, and SPECjAppServer2004 was helping quantify the performance gains from those optimisations. The benefits flowed to all users of Java BigDecimal who could move to the later JVMs (a small illustrative sketch follows after this list).

  • Competition. Industry standard benchmarks are one way a vendor can show performance improvements in its products and performance leadership over its competition, and perhaps gain a marketing advantage, so there is a fiercely competitive aspect to industry standard and vendor benchmarking. This competition is generally good for end users, as it commonly produces tunings and optimisations in the vendors' products that benefit a wide range of applications using their technology, and indeed this is the situation that the run rules and fair use rules generally strive to promote.
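
To make the BigDecimal example above concrete, here is a small, purely illustrative sketch of the kind of web-tier currency arithmetic that leans on BigDecimal; it is not code from the SPECjAppServer2004 workload, and the tax rate, prices and loop count are made up.

import java.math.BigDecimal;
import java.math.RoundingMode;

/*
 * Illustrative only: the sort of order-total calculation whose
 * BigDecimal multiply/add/setScale operations JVM vendors optimised
 * once SPECjAppServer2004 made them part of a widely run workload.
 */
public class OrderTotals {

    private static final BigDecimal TAX_RATE = new BigDecimal("0.0825"); // made-up rate

    // Price * quantity plus tax, rounded to cents.
    static BigDecimal lineTotal(BigDecimal unitPrice, int quantity) {
        BigDecimal subtotal = unitPrice.multiply(BigDecimal.valueOf(quantity));
        BigDecimal tax = subtotal.multiply(TAX_RATE);
        return subtotal.add(tax).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Executed millions of times per benchmark run, so small gains in
        // BigDecimal arithmetic show up directly in the JOPS score.
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < 1000000; i++) {
            total = total.add(lineTotal(new BigDecimal("19.99"), (i % 5) + 1));
        }
        System.out.println("Accumulated total: " + total);
    }
}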

Limitations

  • Inappropriate comparisons, or extrapolation of results. Care must be taken to make selective and reasonable judgements based on the information provided in the results or benchmark reports. It makes no sense, for instance, to use SPECjAppServer2004 results (a transactional benchmark) to size a system for data warehousing or business intelligence; a TPC-H benchmark would be the place to go for that information. Also, taking the transaction rate disclosed in a benchmark report and extrapolating it upwards is risky, as performance is not a continuous function: it can have many discrete jumps, and tested configurations may have hard ceilings such as memory capacity or bus bandwidth. For example, it would not be safe to predict the performance of a single GlassFish application server instance on a 64 core machine based on the JOPS per GlassFish instance measured on a 4 core machine.

  • There will be limitations on how closely industry standard benchmarks model your chosen or developed application. In the case of SPECjAppServer2004, the developers and participants, companies like IBM, Intel, Sun and Oracle, have looked at our customer bases and tried to model the web applications we have seen our customers developing, or have based our modelling decisions on what they told us they were going to develop.

  • Industry standard benchmarks trail the technology curve and hence will often use an older version of infrastructure or technology than the market would like. This is because the benchmark can't come out until there is an established set of products to run it on, and because it takes time for companies to run and scale the benchmarks and to build a set of results that is useful for end users. For example, development of the SPECjAppServer2004 workload was well underway before there were many (or any) J2EE 1.4 products available, but it wasn't released until after most of the major J2EE application server companies had released their products. Similarly, work is underway on the new version of SPECjAppServer, but it trails the availability of the application servers that implement the Java Enterprise Edition 5.0 specification.

Workload

A performance workload is similar to the benchmark described above, but it lacks the run rules, process and oversight. This means that end users can't (in general) have high levels of confidence in the performance claims made by vendors publishing results based on these workloads. End users reading benchmark reports and performance claims based on workloads without the process of a formal benchmark have much more work to do to decide which comparisons make sense and which may in fact be misleading. For example, a vendor could use a workload like DBT2 to publish test results comparing, say, the largest server hardware running database "A" against a tiny single CPU server running database "B", and then, without disclosing the different hardware platforms, offer this as data suggesting that database "A" performs better than database "B". Sure, this is an exaggeration, but it serves to demonstrate the value of the process and disclosure rules of the formal benchmarks.

Benefits

  • Workloads are often easy to run, easily understood and readily available. This makes them very useful to run in-house and therefore end users can make their own comparisons without having to rely on external vendors.

  • Workloads don't have the restrictions of the process imposed by the industry standard benchmark bodies, and as such it is much easier to just run and report results from them. For example, in the open source database world the SysBench workload is a very valuable tool and is commonly used for performance testing of code changes to the MySQL database. Results of these tests are widely and openly reported and used as the basis for further performance improvement. One key point here is that in this situation the workload is being used collaboratively for investigation, not primarily competitively to sell something.

  • Workloads can be designed by individuals, so development cycles may be shorter than those of the industry standard benchmarks.

Limitations

  • Risk of error, especially in tuning. Even though running performance workloads in-house can be relatively straightforward, there is still the risk of getting the wrong answer. Consider trying to determine which of databases "A" and "B" performs best by running workload based performance tests on both: the user running the tests and comparing the results has to have the expertise to tune both "A" and "B" to the point where each makes optimal use of the hardware and operating system resources, otherwise the results may be misleading.

  • There is a cost to running performance investigations in-house; though running performance workloads may be relatively cheap, it is potentially more expensive than using published industry standard or vendor benchmark figures where they are available.

Micro Benchmark

A micro benchmark is usually a small, generally simple workload that tests only a limited number of system or user functions.
In fact, most often a micro benchmark has no process such as reporting rules, nor any basis for comparing results, so I believe a better term would really be micro workload. A minimal sketch of one follows below.
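
As a hedged illustration (not any particular SPEC or vendor tool), here is a minimal sketch of what such a micro workload often looks like: a warm-up phase, a timed loop around one narrow operation, and a crude throughput figure.

/*
 * Minimal micro benchmark sketch: times one narrow operation
 * (string building). Illustrative only; a serious harness handles
 * warm-up, dead-code elimination and statistics far more carefully
 * than this.
 */
public class ConcatMicroBenchmark {

    // The narrow function under test.
    static String buildRow(int id) {
        return new StringBuilder(64)
                .append("item-").append(id)
                .append(",qty=").append(id % 7)
                .toString();
    }

    static long run(int iterations) {
        long sink = 0; // consume results so the JIT cannot discard the work
        for (int i = 0; i < iterations; i++) {
            sink += buildRow(i).length();
        }
        return sink;
    }

    public static void main(String[] args) {
        run(200000); // warm-up: let the JIT compile the hot path
        int iterations = 2000000;
        long start = System.nanoTime();
        long sink = run(iterations);
        double elapsedMs = (System.nanoTime() - start) / 1e6;
        System.out.printf("%d ops in %.1f ms (%.0f ops/ms), sink=%d%n",
                iterations, elapsedMs, iterations / elapsedMs, sink);
    }
}

Something this small is useful for comparing one narrow operation across JVMs or platforms, but, as the limitation below notes, it is not a basis for predicting whole-application performance.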

Benefits

  • Generally free or cheap to download or develop.

  • Very easy to run and report results on

  • Potentially a very powerful tool for diagnosing low-level performance problems

  • Because micro benchmarks are generally fairly simple and measure only a very small set of performance attributes, comparisons may in fact be valid across platforms.

Limitations

  • It is generally not possible, and rarely a good idea, to predict larger system or application performance from micro benchmarking. By their nature micro benchmarks test and consider only a small subset of the performance of the system being tested, so it is quite likely that other factors beyond their scope will affect total system response and throughput.

In House Application Performance Testing (customer benchmarking)

This is where an end user, customer or software developer creates a purpose built stress test (workload) for their application or hardware and tests what they intend to run in production.
This is arguably the most reliable way for an end user or application developer to understand their code and environment, as it involves running the actual code planned for production. There is no need to map the results and resource utilisation of some other performance test (benchmark or workload) onto your situation, because it is your own code that is being run and directly observed. (A very small sketch of such a purpose built load driver follows below.)
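
As a very small, hypothetical sketch of such a purpose built test (the URL, user count and duration are invented), the usual shape is some number of virtual users driving the real application for a fixed period while throughput and errors are recorded:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/*
 * Very small load-driver sketch: N virtual users hitting one
 * application URL for a fixed duration, reporting rough throughput.
 * Real in-house harnesses add think times, realistic user journeys,
 * ramp-up and response-time percentiles.
 */
public class SimpleLoadDriver {

    // Hypothetical values; point these at your own test deployment.
    private static final String TARGET_URL = "http://testserver:8080/myapp/checkout";
    private static final int VIRTUAL_USERS = 50;
    private static final int DURATION_SECONDS = 60;

    public static void main(String[] args) throws Exception {
        AtomicLong requests = new AtomicLong();
        AtomicLong errors = new AtomicLong();
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(DURATION_SECONDS);

        ExecutorService pool = Executors.newFixedThreadPool(VIRTUAL_USERS);
        for (int i = 0; i < VIRTUAL_USERS; i++) {
            pool.submit(() -> {
                while (System.nanoTime() < deadline) {
                    try {
                        HttpURLConnection con =
                                (HttpURLConnection) new URL(TARGET_URL).openConnection();
                        try (InputStream in = con.getInputStream()) {
                            while (in.read() != -1) { /* drain the response */ }
                        }
                        requests.incrementAndGet();
                    } catch (Exception e) {
                        errors.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(DURATION_SECONDS + 30, TimeUnit.SECONDS);

        System.out.printf("%d requests, %d errors, ~%.1f req/s%n",
                requests.get(), errors.get(), requests.get() / (double) DURATION_SECONDS);
    }
}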

Benefits

  • This is by far the most accurate way to determine real application and / or system performance i.e. to actually test in-house what you intend to run in production. This way no guesses have to be made as to how applicable the test environment is to the intended production environment.

Limitations

  • This is by far the most expensive of all the options, but for a company or individual potentially spending a lot of money on hardware or software it may well be the best one, and the potentially substantial costs of developing a test harness and simulation, determining the test parameters and running the performance tests may be well worth it. One caveat is that as software costs fall with the increasing enterprise use of open source software, the cost of running "in house" performance tests may start to look large versus the software purchase cost.

  • Running in-house application performance benchmarks is still not without risk: a wide variety of skills is required to create the simulation and to confirm that it covers the expected usage pattern for the application. Different skills are needed to deploy and tune the application and any middleware it requires, DBA skills will be required, as will general performance tuning skills... the variables really start to add up.

Summary

I hope to have provided a very high level overview and a useful categorisation of the main sources of performance data available today. In my opinion each of these sources or approaches has great value; however, because performance analysis is always a contentious and often a more subjective topic than it should be, I don't expect to settle too many debates. Hopefully, offering a broader perspective on the value of benchmarking than I have seen in (some) other forums will be useful to those who need or rely on this data.
