Talend's new data processing engine on Sun Blade X6270

Having had the chance to test the brand new Sun Blade X6270 server, based on Intel Xeon 5500 series processors, I asked one of our ISV partners, Talend, an open source ETL (Extract, Transform & Load) solution provider, if they were willing to do some benchmarking with me.

The timing was perfect: Talend had just rewritten parts of their ETL engine to make better use of the multithreading capabilities of modern CPUs. The rewritten engine will be included in the upcoming version.

During development they had benchmarked their application on a two-socket Xeon 5320, and were very interested in seeing how the new Intel Xeon 5500 would perform.

Test descriptions

We used DBGEN v2.8.0, a database population program that generates files to be loaded into database tables. In our case we generate moderately large to very large files and process them directly as simple flat files, without using a database system. We only use the file called “lineitem.tbl”, which represents a list of order item lines with the following structure:

DBGen Structure

For each benchmark run we perform three tests, each applying a different type of processing on the file:

  • Sort:
    We will sort the entire file by date, on the 11th column (L_SHIPDATE: see above in red)

  • Count:
    Count the number of order lines by shipment mode (L_SHIPMODE: see blue column above) and by year of the shipment date (L_SHIPDATE: see above in bold red).

  • Average:
    Compute the average discount (L_DISCOUNT) for each item (L_PARTKEY).
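To make the three tests concrete, here is a minimal Python sketch of what each one computes. It is not Talend's engine and says nothing about how Talend implements these operations. It assumes the pipe-delimited format DBGEN writes and the standard TPC-H column order for lineitem (which matches L_SHIPDATE being the 11th column); the sample rows and all function names are made up for illustration.

```python
from collections import Counter, defaultdict

# 0-based column positions, assuming the standard TPC-H lineitem layout.
L_PARTKEY, L_DISCOUNT, L_SHIPDATE, L_SHIPMODE = 1, 6, 10, 14

# Made-up sample rows in the pipe-delimited style of lineitem.tbl.
SAMPLE = """\
1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|sample|
1|67310|7311|2|36|45983.16|0.09|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|sample|
2|155190|1191|1|38|44694.46|0.00|0.05|N|O|1997-01-28|1997-01-14|1997-02-02|TAKE BACK RETURN|RAIL|sample|
3|4297|1798|1|45|54058.05|0.06|0.00|R|F|1994-02-02|1994-01-04|1994-02-23|NONE|AIR|sample|
3|67310|6540|2|49|46796.47|0.01|0.00|R|F|1996-01-30|1996-01-05|1996-02-10|NONE|MAIL|sample|
"""


def parse(text):
    """Split each non-empty line into its pipe-separated fields."""
    return [line.split("|") for line in text.splitlines() if line]


def sort_by_shipdate(rows):
    """Sort test: ISO dates (YYYY-MM-DD) order correctly as plain strings."""
    return sorted(rows, key=lambda r: r[L_SHIPDATE])


def count_by_mode_and_year(rows):
    """Count test: number of lines per (shipment mode, shipment year)."""
    return Counter((r[L_SHIPMODE], r[L_SHIPDATE][:4]) for r in rows)


def average_discount_by_part(rows):
    """Average test: mean discount per part key."""
    totals = defaultdict(lambda: [0.0, 0])  # part key -> [sum, count]
    for r in rows:
        acc = totals[r[L_PARTKEY]]
        acc[0] += float(r[L_DISCOUNT])
        acc[1] += 1
    return {part: s / n for part, (s, n) in totals.items()}


rows = parse(SAMPLE)
print([r[L_SHIPDATE] for r in sort_by_shipdate(rows)])
print(count_by_mode_and_year(rows))
print(average_discount_by_part(rows))
```

The sketch holds everything in memory, which is only workable for tiny inputs. The files in this benchmark are far larger than the RAM of either machine, so a real engine has to spill to disk, in particular for the sort.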

DBGEN uses a scaling factor that determines the total size of all the tables generated. For this test we only use the file named «lineitem.tbl». The table below shows the size and number of lines of the «lineitem.tbl» file for each scaling factor.

As you can see, we start quite small, by processing a file with (only!) 6 million lines, and go all the way up to 3.3 billion lines in a single file.



Scale    Number of entries    Size
1        6 million            740 MB
10       60 million           7.4 GB
100      600 million          74 GB
300      1.8 billion          225 GB
550      3.3 billion          415 GB
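The figures above grow roughly linearly with the scaling factor. As a rough sketch, the following Python snippet extrapolates from the scale 1 row. The two constants are read off the table, and the function name is hypothetical.

```python
LINES_PER_SCALE = 6_000_000  # from the table: scale 1 is about 6 million lines
MB_PER_SCALE = 740           # from the table: scale 1 is about 740 MB


def estimate_lineitem(scale):
    """Return (approximate line count, approximate size in GB) for a scale factor."""
    lines = scale * LINES_PER_SCALE
    size_gb = scale * MB_PER_SCALE / 1000
    return lines, size_gb


for scale in (1, 10, 100, 300, 550):
    lines, size_gb = estimate_lineitem(scale)
    print(f"scale {scale}: ~{lines:,} lines, ~{size_gb:g} GB")
```

This linear estimate slightly undershoots the measured sizes at the largest scales (about 222 GB against 225 GB at scale 300, and about 407 GB against 415 GB at scale 550), so treat it as an order-of-magnitude guide only.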


Hardware Configurations

The hardware configurations are listed below: the server used for the tests (referred to as X6270) and the vanilla Xeon-based box used by Talend (referred to as Bi-Xeon).

CPU

  • X6270: 2 x Xeon 5520 quad core with HyperThreading & Turbo mode on (2.26 GHz)

  • Bi-Xeon: 2 x Intel Xeon 5320 quad core (1.86 GHz)

RAM

  • X6270: 24 GB DDR3

  • Bi-Xeon: 4 GB DDR2

Internal storage

  • X6270: 1 x 136 GB, 15K rpm

  • Bi-Xeon: 3 x 250 GB and 2 x 320 GB Seagate, 7200 rpm (all on ext3), used as follows:

      • 1 x 250 GB for system and temporary files

      • 1 x 320 GB for input files

      • 1 x 320 GB for output files

External storage

  • X6270: 3 volumes of 4 disks using RAID 0 (striping), 544 GB each, with a ZFS pool for each group

  • Bi-Xeon: None

Operating System

  • X6270: Solaris 10 update 6 (aka 10/08)

  • Bi-Xeon: Debian GNU/Linux Etch with Linux 2.6.18 (i686)


In terms of CPU, RAM and external storage, the X6270 configuration is obviously much more powerful. However, the tests proved to be more CPU and I/O bound than memory bound. Even though the amount of memory obviously does make a difference, the tests still give us some indication of the extra performance brought by the Xeon 5500.

To get closer to the Bi-Xeon configuration, we also ran two sets of tests on the X6270: with the external storage (referred to as X6270-Ext) and without it (referred to as X6270-Int).

In the second case, we are even in a less favorable position than the Bi-Xeon, which uses 3 disks versus a single disk for the X6270.

Results

The table below presents the final results of the tests done on the three configurations. It's interesting to note a couple of things:

  • Processing a file requires at least three times its size in disk space. For this reason, the largest file we could process on the X6270-Int (a single internal 136 GB disk in the server) was 7.4 GB.

  • Given the much higher processing time needed on the Bi-Xeon, we didn't even try going further than 74 GB.

  • We pushed the X6270-Ext up to processing a 415 GB file, and could reasonably have gone all the way to 1 TB if we had not been limited by disk space.
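The disk space rule in the first point can be checked with a little arithmetic. The sketch below is an illustration, not part of the benchmark. It assumes the three external volumes of 544 GB can be counted together as one pool of capacity, and it ignores the space taken by the operating system; the names are hypothetical.

```python
SPACE_FACTOR = 3  # "at least three times the disk space", as stated above

INTERNAL_GB = 136      # single internal disk of the X6270
EXTERNAL_GB = 3 * 544  # assumption: the three external volumes counted together


def required_space_gb(file_gb):
    """Disk space needed to process a file of the given size."""
    return SPACE_FACTOR * file_gb


def fits(file_gb, disk_gb):
    """True if the file can be processed within the available disk space."""
    return required_space_gb(file_gb) <= disk_gb


for file_gb in (7.4, 74, 415, 1000):
    print(file_gb, "GB:",
          "internal" if fits(file_gb, INTERNAL_GB) else "-",
          "external" if fits(file_gb, EXTERNAL_GB) else "-")
```

Under these assumptions the 74 GB file already needs 222 GB and so does not fit on the internal disk, while the 415 GB file needs 1245 GB and fits on the external storage; a 1 TB file would need about 3 TB, more than the external storage offers.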

Result Table











Conclusions

On the CPU-bound test (Average), we can clearly see a 32% to 60% performance boost with the new Intel Xeon 5500 compared to the older generation, depending on the size of the file.

Of course the processor matters, and we saw that it has a great impact on the more CPU-bound processing. But what we can also see, and that's not new, is that data-hungry processors need to be fed with data quickly. In that respect, the speed of the I/O subsystem is very important. Working with files over 400 GB obviously puts a lot of pressure on the I/O, and plugging in a professional external storage device makes a huge difference (in our case, anyway).

As you can see, on the Sort test (scale 10) we get a 290% boost with the Intel Xeon 5500. Once we use the external storage, that figure skyrockets to 1075% (more than 10x the performance)!

We could of course go on for a long time analyzing all the figures for the different file sizes. But even without pushing the analysis very far, it's plain to see the performance gain we get from the new processor alone, and even more so once we also take care of the I/O subsystem.

The Intel Xeon 5500 based Sun servers, such as the Sun Blade X6270 we just tested, enhanced with an external storage device such as the Sun StorageTek 2540, seem to be a killer combination for large data processing.
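A percentage boost can be read in two ways, and the text does not say which convention the figures follow. The short Python sketch below converts a percentage into a speedup factor under each reading; the function names are hypothetical.

```python
def factor_if_gain(percent):
    """Read N% as 'N% faster': new speed = old speed * (1 + N/100)."""
    return 1 + percent / 100


def factor_if_ratio(percent):
    """Read N% as 'N% of the old speed': new speed = old speed * N/100."""
    return percent / 100


for percent in (290, 1075):
    print(f"{percent}%: x{factor_if_gain(percent):.2f} as a gain, "
          f"x{factor_if_ratio(percent):.2f} as a ratio")
```

Under either reading, the 1075% figure corresponds to more than 10 times the original performance, which is consistent with the remark above.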
