Tuesday Sep 27, 2011

Talend's new data processing engine on Sun Blade X6270

Having the chance to test the brand new Sun Blade X6270 server based on the Intel Xeon X5500 series processors, I asked one of our ISV partners, Talend, an open source ETL (Extract, Transform & Load) solution provider, if they were willing to do some benchmarking with me.

The timing was perfect, since Talend had just rewritten parts of their ETL engine to make better use of the multi-threading capabilities of modern CPUs. The rewritten engine will be included in the upcoming version.

During development they had benchmarked their application on a two-socket Xeon 5320, and were very interested in seeing how the new Intel Xeon 5500 would perform.

Test descriptions

We used DBGEN v2.8.0, a database population program that generates files to be loaded into database tables. In our case, we generate moderately large to very large files and process them directly as simple flat files, without using a database system. We only use the file called “lineitem.tbl”, which represents a list of order item lines with the following structure:

DBGen Structure

For each benchmark run we perform three tests, each applying a different type of processing on the file:

  • Sort:
    We will sort the entire file by date, on the 11th column (L_SHIPDATE: see above in red)

  • Count:
    Count the number of order lines by shipment mode (L_SHIPMODE: see blue column above) and the year of the shipment date (L_SHIPDATE: see above in bold red)

  • Average:
    Average discount (L_DISCOUNT) for each item (L_PARTKEY)
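To make the three tests concrete, here is a minimal in-memory sketch in Python. It is not Talend's engine or job design, only an illustration of what each test computes. The column positions assume the standard TPC-H lineitem layout, the sample lines are made up, and all function names are hypothetical. The real files are far larger than the available RAM, so an actual implementation would need streaming or an external sort rather than loading everything into a list.

```python
from collections import Counter, defaultdict

# 0-based field positions, assuming the standard TPC-H lineitem column order
# (L_SHIPDATE is the 11th column, L_SHIPMODE the 15th).
L_PARTKEY, L_DISCOUNT, L_SHIPDATE, L_SHIPMODE = 1, 6, 10, 14

# Made-up sample lines in the '|'-separated layout written by DBGEN.
SAMPLE = [
    "1|155|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|sample|",
    "1|67|7311|2|36|45983.16|0.08|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|sample|",
    "2|155|1191|1|38|44694.46|0.00|0.05|N|O|1997-01-28|1997-01-14|1997-02-02|TAKE BACK RETURN|RAIL|sample|",
    "3|67|1798|1|45|54058.05|0.10|0.00|R|F|1994-02-02|1994-01-04|1994-02-23|NONE|MAIL|sample|",
]


def parse(lines):
    """Split each non-empty line into its '|'-separated fields."""
    return [line.rstrip("\n").split("|") for line in lines if line.strip()]


def sort_by_shipdate(rows):
    """Sort test: order rows by L_SHIPDATE (ISO dates sort as plain strings)."""
    return sorted(rows, key=lambda r: r[L_SHIPDATE])


def count_by_mode_and_year(rows):
    """Count test: number of lines per (shipment mode, shipment year)."""
    return Counter((r[L_SHIPMODE], r[L_SHIPDATE][:4]) for r in rows)


def average_discount_by_part(rows):
    """Average test: mean L_DISCOUNT for each L_PARTKEY."""
    totals = defaultdict(lambda: [0.0, 0])  # part key -> [sum, count]
    for r in rows:
        acc = totals[r[L_PARTKEY]]
        acc[0] += float(r[L_DISCOUNT])
        acc[1] += 1
    return {part: total / n for part, (total, n) in totals.items()}


if __name__ == "__main__":
    rows = parse(SAMPLE)
    print([r[L_SHIPDATE] for r in sort_by_shipdate(rows)])
    print(dict(count_by_mode_and_year(rows)))
    print(average_discount_by_part(rows))
```

The Sort test has to move every line of the file, while Count and Average only keep one small accumulator per group, which is one reason the three tests stress the disks and the CPU differently.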

DBGEN uses a scaling factor representing the total size of all the tables generated. For this test we only use the file named “lineitem.tbl”. The table below shows the size and number of lines of the “lineitem.tbl” file for each scaling factor.

As you can see, we start quite small, processing a file with (only!) 6 million lines, and go all the way to 3.3 billion lines in a single file.



Scale | Number of entries | Size
------|-------------------|-------
1     | 6 million         | 740 MB
10    | 60 million        | 7.4 GB
100   | 600 million       | 74 GB
300   | 1.8 billion       | 225 GB
550   | 3.3 billion       | 415 GB
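As a rough cross-check of the table, the line count and file size grow about linearly with the scaling factor. The sketch below assumes the scale 1 figures from the table (about 6 million lines and 740 MB) and decimal units; the constant and function names are hypothetical. The linear estimate comes out slightly below the reported sizes at the largest scales (222 GB vs. 225 GB at scale 300, 407 GB vs. 415 GB at scale 550), so treat it as an approximation only.

```python
LINES_PER_SCALE = 6_000_000  # approximate lines at scale 1, from the table
MB_PER_SCALE = 740           # approximate file size in MB at scale 1, from the table


def estimate(scale):
    """Return (estimated line count, estimated size in GB) for a scaling factor."""
    lines = scale * LINES_PER_SCALE
    size_gb = scale * MB_PER_SCALE / 1000  # decimal GB assumed
    return lines, size_gb


if __name__ == "__main__":
    for scale in (1, 10, 100, 300, 550):
        lines, size_gb = estimate(scale)
        print(f"scale {scale}: ~{lines:,} lines, ~{size_gb:g} GB")
```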


Hardware Configurations

The following table shows the hardware configuration used for the tests (referred to as X6270), and also the vanilla Xeon-based box used by Talend (referred to as Bi-Xeon).

Server           | X6270                                                                  | Bi-Xeon
-----------------|------------------------------------------------------------------------|--------
CPU              | 2 x Xeon 5520 quad core with HyperThreading & Turbo mode on (2.26 GHz) | 2 x Intel Xeon 5320 quad core (1.86 GHz)
RAM              | 24 GB DDR3                                                             | 4 GB DDR2
Internal storage | 1 x 136 GB 15K rpm                                                     | 3 x 250 GB and 2 x 320 GB Seagate 7200 rpm (all on ext3): 1 x 250 GB for system and temporary files; 1 x 320 GB for input files; 1 x 320 GB for output files
External storage | 3 volumes of 4 disks using RAID 0 (striping), 544 GB each; a ZFS pool for each group | None
Operating System | Solaris 10 update 6 (a.k.a. 10/08)                                     | Debian GNU/Linux Etch with Linux 2.6.18 (i686)


With respect to the CPU, the X6270 configuration is obviously much more powerful, especially given the amount of RAM and the external storage. However, the tests proved to be more CPU and IO bound than memory bound. Even though the amount of memory obviously does make a difference, the tests still give us some indication of the extra performance brought by the Xeon 5500.

In order to get closer to the Bi-Xeon configuration, we also ran two sets of tests on the X6270: with the external storage (referred to as X6270-Ext) and without it (referred to as X6270-Int).

In the second case, we are in an even less favorable position than the Bi-Xeon, which uses 3 disks vs. a single disk for the X6270.

Results

The table below presents the final results of the tests done on the three configurations. It's interesting to note a couple of things:

  • Processing a file requires at least three times its size in disk space. For this reason, we could only process a 7.4 GB file on the X6270-Int (single internal 136 GB disk in the server).

  • Given the much higher processing time needed on the Bi-Xeon, we didn't even try going further than 74 GB.

  • We pushed the X6270-Ext up to processing a 415 GB file, and could reasonably have gone all the way to 1 TB if we had not been limited by disk space.
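The disk space constraint can be checked with simple arithmetic. The sketch below takes the "three times the file size" rule stated above at face value and ignores the space used by the operating system; the function and constant names are hypothetical. For the external storage it treats the three 544 GB volumes as a single 1632 GB budget, which is a simplification, since each group had its own ZFS pool.

```python
SPACE_FACTOR = 3  # "at least three times" the file size, per the text above


def max_input_gb(capacity_gb):
    """Largest input file (in GB) that fits in the given capacity."""
    return capacity_gb / SPACE_FACTOR


def fits(file_gb, capacity_gb):
    """True if a file of file_gb can be processed within capacity_gb."""
    return file_gb * SPACE_FACTOR <= capacity_gb


if __name__ == "__main__":
    internal_gb = 136       # single internal disk of the X6270
    external_gb = 3 * 544   # assumption: the three volumes seen as one budget
    print(fits(7.4, internal_gb))     # 22.2 GB needed
    print(fits(74, internal_gb))      # 222 GB needed
    print(fits(415, external_gb))     # 1245 GB needed
    print(fits(1000, external_gb))    # 3000 GB needed for a 1 TB file
    print(max_input_gb(internal_gb))  # upper bound for the internal disk
```

Under these assumptions the scale 10 file is the largest one from the test set that fits on the internal disk, and a 1 TB file would need more than the external storage that was available.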

Result Table











Conclusions

On the CPU-bound tests (Average test), we can clearly see a 32% to 60% performance boost on the new Intel Xeon 5500 compared to the older generation, depending on the size of the file.

Of course the processor matters, and we saw that it has a great impact on the more CPU-bound processing. But what we can also see, and that's not new, is that data-hungry processors need to be fed with data, well and fast. In that respect, the speed of the IO subsystem is very important. Obviously, working with files over 400 GB puts a lot of pressure on the IO, and plugging in a professional external storage device makes a huge difference (in our case anyway).

As you can see, on the SORT test (scale 10) we get a 290% boost with the Intel Xeon 5500. Once we use the external storage, that boost skyrockets to 1075% (more than 10x the performance)!

We could of course go on for a long time analyzing all the figures for the different file sizes, but even without pushing the analysis very far, it's plain to see the performance gain we get with this new processor alone, not to mention the gain when we also take care of the IO subsystem.

The Intel Xeon 5500 based Sun servers, such as the Sun Blade X6270 we just tested, enhanced with an external storage device such as the Sun StorageTek 2540, seem to be a killer combination for large data processing.

Wednesday Mar 23, 2011

Traffix scales on Solaris Sparc

Traffix Systems is leading the control plane market, with a range of next-generation network Diameter products and solutions --Diameter is an authentication, authorization and accounting protocol for telco networks, and a successor to RADIUS.

The amount of Diameter signaling in LTE and 4G networks is unlike anything telecom operators have seen or been confronted with in the past. It is estimated that there will be up to 25x more signaling per subscriber compared to legacy and IN networks. As a result, network operators moving to LTE are finding it progressively more difficult to manage their core network architecture and Diameter signaling as it becomes increasingly complex to maintain, manage and scale.

With these challenges in mind, and as part of the on-going engineering collaboration between Traffix Systems and Oracle's ISV Engineering, we investigated which Oracle technologies could help decrease and manage the complexity. The first thing we looked at was the SPARC Enterprise T-Series systems…


Tuesday Mar 01, 2011

Talend Integration Suite optimized on Solaris

Continuing with the spirit of the Tunathon program --an innovative engineering program to study and tune application performance on Solaris, run at Sun Microsystems in the early 2000's--, we at ISV Engineering are still running "Tunathon" projects with our partners today, i.e. tuning their applications on Solaris --we have about 5 in flight right now. Tunathon efforts are in fact more and more relevant as computers are becoming more complex, scalable and heterogeneous --think e.g. of a 4-socket quad-core dual-thread system with extra GPU engines. Developers have the impossible job of releasing new business logic in their code, faster and faster, while keeping it fully optimized and scalable on systems that they never get their hands on to test scalability in the first place. And the programming frameworks, good for developer productivity and code quality, come as additional layers that can make debugging and optimization a real nightmare.

Recently, Talend, a fast growing ISV positioned by Gartner in the “Visionaries” quadrant of the “Magic Quadrant for Data Integration Tools”, contacted us to report a serious performance issue at one of their customers, a large bank, using the Talend Integration Suite application on a large 32-way quad-core SPARC M-Series server. Although fully multi-threaded, the software simply did not scale on such a large system. We got on it right away, set up a 128-thread Sun T5140 system in our Lab to reproduce the problem, and took a closer look at the Java code…


Thursday Feb 03, 2011

Why Solaris Zones?

Our recent validation exercise of SAP NetWeaver Master Data Management on Oracle Solaris Containers reminded us how great a virtualization technology Solaris Zones are. Why am I saying this? Read on.

Introduced with Solaris 10, Solaris Zones (a.k.a. Containers) are an operating system level virtualization technology that provides a complete, light-weight and secure run-time environment for applications. Compared to other virtualization solutions, Zones do not use a hypervisor --which in fact is another layer of operating system that translates / gets in the way of your system calls to the hardware--, rather Zones are an integral part of a running Solaris instance. Still, Zones meet the same business objective of server consolidation through Virtual Machines (VM), isolated from each other and designed to provide fine-grained control over the hardware resources.

The following benefits are often put forward about Solaris Zones…


Monday Sep 20, 2010

Avaloq runs on Oracle Solaris x86

Financial Services is probably the vertical industry with the most successful early adopters of Solaris 10 x86 --like Murex e.g., a leading Risk Management vendor, with several customers in production with Solaris x86 today-- and remains a strong area for Solaris x86 adoption. It is now the turn of Avaloq, the Swiss market leader in integrated banking software, to announce that it has released its Avaloq Banking System on Oracle Solaris 10 x86-64.

"Oracle Solaris 10 x86-64 as an enterprise class Operating System is a decisive advantage for banks."
Klaus Rausch, CTO, Avaloq Evolution Ltd

If you are a Solaris Sparc ISV today, Solaris x86 is your safest and quickest path to leverage commodity x86 hardware and its price-performance advantages, where it makes sense --the traditional RISC architecture, the novel CMT architecture and the standard x86 architecture indeed all have different design points and each has the best price-performance when applied to the appropriate workload.

Safest because you retain all of the Solaris enterprise-class features that your customers love, notably the Solaris binary compatibility --the Oracle Solaris 10 Binary Application Guarantee Program still accepts submissions until May 2011 by the way. Quickest because you maintain a single source code, i.e. no porting is needed --check out the Oracle Solaris 10 Source Application Guarantee Program, also valid until May 2011.

If you are a Windows or Linux ISV today, Solaris x86 is probably your best bet to differentiate yourself. Whether it is with respect to virtualization with Containers, security with the ZFS filesystem, availability with SMF, Oracle Solaris 10 has an award-winning technology that can help you better meet the needs of enterprise customers. The Avaloq press release points particularly to the Solaris "optimisations for the new Intel processors [that makes Solaris 10] a secure, energy-efficient and highly scalable base for mission-critical IT systems which reduces a bank's TCO significantly."

Avaloq's Architect Martin Büchi is in fact talking at Oracle Open World 2010 today at 1PM PST, to discuss database application development. Still, if you happen to be there, ask him about Solaris x86.

About

How open innovation and technology adoption translates to business value, with stories from our developer support work at Oracle's ISV Engineering.
