Talend Enterprise Data Integration overperforms on Oracle SPARC T4

The SPARC T microprocessor, released in 2005 by Sun Microsystems and now continued at Oracle, has a good track record in parallel execution and multi-threaded performance. However, it was less suited to purely single-threaded workloads. The new SPARC T4 processor fills that gap by offering 5x better single-thread performance than previous generations.

Following our long-term relationship with Talend, a fast-growing ISV positioned by Gartner in the “Visionaries” quadrant of the “Magic Quadrant for Data Integration Tools”, we decided to test some of their integration components on the T4 chip, more precisely on a T4-1 system, in order to verify first hand whether this new processor lives up to its promises.

Several tests were performed, mainly focused on:

  • Single-thread performance of the new SPARC T4 processor compared to an older SPARC T2+ processor
  • Overall throughput of the SPARC T4-1 server using multiple threads

The tests consisted of reading large amounts of data (tens of gigabytes), processing them, and writing them back to a file or an Oracle 11gR2 database table. They are CPU, memory and IO bound tests. Given the main focus of this project (CPU performance), bottlenecks on the memory and IO sub-systems were removed as much as possible. For instance, the data to process was put into the ZFS filesystem cache whenever possible. In addition, two external storage devices were directly attached to the servers under test, each one divided into two ZFS pools for read and write operations.

Test Configuration

Multi-thread: Testing throughput on the Oracle T4-1

The tests were performed with different numbers of simultaneous threads (1, 2, 4, 8, 12, 16, 32, 48 and 64) and with different storage devices: Flash, Fibre Channel storage, two striped internal disks and a single internal disk. All storage devices used ZFS for both the filesystem and volume management.

Each thread read a dedicated 1 GB file containing 12.5M lines with the following structure:

customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT
1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008
2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008
3;Andrew;Madison;S Rustle St;Santa Fe;Arkansas;75677;A;29-04-2005;09-02-2008
4;Dwight;Adams;South Roosevelt Drive;Baton Rouge;Vermont;75677;A;15-02-2004;26-01-2007
[…]
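The measurement pattern described above (N simultaneous workers, each reading its own semicolon-delimited file, with throughput reported in lines per second) can be sketched roughly as follows. This is only an illustration in Python, not the Talend jobs used in the benchmark: the helper names `write_sample_file`, `count_rows` and `run` are hypothetical, the files are tiny generated stand-ins for the 1 GB inputs, and CPython threads share a global interpreter lock, so the sketch shows the shape of the measurement rather than real CPU scaling.

```python
import csv
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, List, Tuple

# Header taken from the sample above; the data row is repeated as filler.
HEADER = ("customerID;FirstName;LastName;StreetAddress;City;State;Zip;"
          "Cust_Status;Since_DT;Status_DT")
ROW = "{};Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008"


def write_sample_file(path: Path, n_rows: int) -> None:
    """Create a small stand-in for one of the dedicated input files."""
    with path.open("w", newline="") as fh:
        fh.write(HEADER + "\n")
        for i in range(1, n_rows + 1):
            fh.write(ROW.format(i) + "\n")


def count_rows(path: Path) -> int:
    """Parse one file and return its number of data rows (header excluded)."""
    with path.open(newline="") as fh:
        return sum(1 for _ in csv.DictReader(fh, delimiter=";"))


def run(paths: List[Path], workers: int) -> Tuple[int, float]:
    """Read one file per worker and return (total rows, rows per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(count_rows, paths))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed


results: Dict[int, Tuple[int, float]] = {}
with tempfile.TemporaryDirectory() as tmp:
    files = [Path(tmp) / f"customers_{i}.csv" for i in range(4)]
    for f in files:
        write_sample_file(f, 1000)
    for workers in (1, 2, 4):  # the benchmark swept 1 to 64 threads
        results[workers] = run(files[:workers], workers)
        print(workers, "workers:", round(results[workers][1]), "lines/s")
```

Reading each file once before timing it would approximate the "hot cache" condition the tests relied on.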

The following graphs present the results of our tests:

Results 1

Unsurprisingly, up to 16 threads all files fit in the ZFS cache (the ARC): once the cache is hot, performance does not depend on the underlying storage. Beyond 16 threads, however, IO clearly becomes a bottleneck, so a good IO subsystem is key. Single-disk performance collapses, whereas the Sun F5100 and ST6180 arrays allow the T4-1 to scale quite seamlessly. From 32 to 64 threads, performance is almost constant, with just a slow decline.

For the database load tests, only the best IO configuration (external storage devices) was used, hosting the Oracle tablespaces and redo log files.

Results 2

Using the Sun Storage F5100 array allows the T4-1 server to scale up to 48 parallel JVM processes before saturating the CPU. The final result is a staggering insertion rate of 646K lines per second into an Oracle table using 48 parallel threads.

Single-thread: Testing the single thread performance

Seven different tests were performed on both servers (the SPARC T4-1 and the T2+ based T5140). Since only one thread was running, and therefore only one file was read, no IO bottleneck was involved: all data was served from the ZFS cache.

  • Read File → Filter → Write File: Read file, filter data, write the filtered data to a new file. The filter is set on the “Cust_Status” column: only lines with status set to “A” are selected. This limits each output file to about 500 MB.
  • Read File → Load Database Table: Read file, insert into a single Oracle table.
  • Average: Read file, compute the average of a numeric column, write the result in a new file.
  • Division & Square Root: Read file, perform a division and square root on a numeric column, write the result data in a new file.
  • Oracle DB Dump: Dump the content of an Oracle table (12.5M rows) into a CSV file.
  • Transform: Read file, transform, write the result data in a new file. The transformations applied are: set the address column to upper case and add an extra column at the end, which is the concatenation of two columns.
  • Sort: Read file, sort on a numeric and an alphanumeric column, write the result data in a new file.
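To make two of these jobs concrete, here is a minimal Python sketch of the Read → Filter → Write and Transform tests. It is not the Talend implementation. The function names are hypothetical, the choice of `FirstName` and `LastName` as the two concatenated columns is an assumption (the text does not say which columns were used), the extra column name `FullName` is invented, and the third sample row with status "I" is made up so that the filter has something to drop.

```python
import csv
import io
from typing import List, TextIO

SAMPLE = """\
customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT
1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008
2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008
3;Andrew;Madison;S Rustle St;Santa Fe;Arkansas;75677;I;29-04-2005;09-02-2008
"""


def _writer(dst: TextIO, fieldnames: List[str]) -> csv.DictWriter:
    """Build a writer that emits the same semicolon-delimited layout."""
    return csv.DictWriter(dst, fieldnames=fieldnames, delimiter=";",
                          lineterminator="\n")


def filter_by_status(src: TextIO, dst: TextIO, status: str = "A") -> int:
    """Copy only the rows whose Cust_Status equals `status`; return the count."""
    reader = csv.DictReader(src, delimiter=";")
    writer = _writer(dst, list(reader.fieldnames))
    writer.writeheader()
    kept = 0
    for row in reader:
        if row["Cust_Status"] == status:
            writer.writerow(row)
            kept += 1
    return kept


def transform(src: TextIO, dst: TextIO) -> int:
    """Upper-case the address and append a concatenated column (assumed names)."""
    reader = csv.DictReader(src, delimiter=";")
    writer = _writer(dst, list(reader.fieldnames) + ["FullName"])
    writer.writeheader()
    count = 0
    for row in reader:
        row["StreetAddress"] = row["StreetAddress"].upper()
        row["FullName"] = row["FirstName"] + " " + row["LastName"]
        writer.writerow(row)
        count += 1
    return count


filtered = io.StringIO()
kept = filter_by_status(io.StringIO(SAMPLE), filtered)

transformed = io.StringIO()
processed = transform(io.StringIO(SAMPLE), transformed)
```

Both functions stream row by row, so memory use stays flat regardless of file size, which matches the single-pass nature of these tests.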

The following table and graph present the final results of the tests:

  • Throughput is expressed in thousands of lines processed per second (K lines/second).
  • Improvement is the T5140 run time divided by the T4-1 run time, expressed as a percentage (645% means the T4-1 finished 6.45 times faster).

| Test | T4-1 (Time s.) | T5140 (Time s.) | Improvement | T4-1 (Throughput) | T5140 (Throughput) |
|---|---|---|---|---|---|
| Read/Filter/Write | 125 | 806 | 645% | 100 | 16 |
| Read/Load Database | 195 | 1111 | 570% | 64 | 11 |
| Average | 96 | 557 | 580% | 130 | 22 |
| Division & Square Root | 161 | 1054 | 655% | 78 | 12 |
| Oracle DB Dump | 164 | 945 | 576% | 76 | 13 |
| Transform | 159 | 1124 | 707% | 79 | 11 |
| Sort | 251 | 1336 | 532% | 50 | 9 |
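The throughput and improvement figures can be re-derived from the run times alone. The short Python check below assumes that every test processed the 12.5M lines mentioned earlier and that "improvement" is the ratio of the two run times; both assumptions reproduce the published numbers after rounding. The names `LINES`, `TIMES`, `throughput_k` and `improvement_pct` are ours.

```python
# Lines processed per test: 12.5M, as stated for the input file and the table dump.
LINES = 12_500_000

# Run times in seconds, copied from the results: (T4-1, T5140).
TIMES = {
    "Read/Filter/Write": (125, 806),
    "Read/Load Database": (195, 1111),
    "Average": (96, 557),
    "Division & Square Root": (161, 1054),
    "Oracle DB Dump": (164, 945),
    "Transform": (159, 1124),
    "Sort": (251, 1336),
}


def throughput_k(seconds: int) -> int:
    """Thousands of lines processed per second, rounded to an integer."""
    return round(LINES / seconds / 1000)


def improvement_pct(t4_seconds: int, t5140_seconds: int) -> int:
    """T5140 run time relative to T4-1 run time, as a percentage."""
    return round(t5140_seconds / t4_seconds * 100)


for name, (t4, t5140) in TIMES.items():
    print(f"{name:24s} {improvement_pct(t4, t5140):4d}%  "
          f"{throughput_k(t4):4d}  {throughput_k(t5140):3d}")

speedups = [t5140 / t4 for t4, t5140 in TIMES.values()]
print(f"speedup range: {min(speedups):.1f}x to {max(speedups):.1f}x")
```

The smallest ratio comes from the Sort test and the largest from the Transform test.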

Results 3

The improvement in single-thread performance is quite dramatic: depending on the test, the T4 is between 5.3 and 7.1 times faster than the T2+. It seems clear that the SPARC T4 processor has gone a long way toward filling the gap in single-thread performance without sacrificing its multi-threaded capability, as it still scales very impressively on heavy-duty multi-threaded jobs.

Finally, as always at Oracle ISV Engineering, we are happy to help our ISV partners test their own applications on our platforms, so don't hesitate to contact us and let's see what the SPARC T4-based systems can do for your application!

"As describe in this benchmark, Talend Enterprise Data Integration has overperformed on T4. I was generally happy to see that the T4 gave scaling opportunities for many scenarios like complex aggregations. Row by row insertion in Oracle DB is faster with more than 650,000 rows per seconds without using any bulk Oracle capabilities !"

Cedric Carbone, Talend CTO.

Comments:

Nice writeup. Would be interesting to see how one of the M-series servers performs for a comparison.

Posted by Jukka on March 30, 2012 at 06:31 AM CEST

It would be more beneficial to compare the T4 with an Intel/AMD chip released in the same time frame (a similar generation). Comparing the T4 to the T2 only proves that the T4 can outperform the T2, but it's more like saying my 2012 Accord has better mileage than a 2006 Accord.

Posted by martin francis k on April 01, 2012 at 11:11 AM CEST

Hi Martin

I don't fully agree with you when you say that this only proves that the T4 outperforms the T2+ (or even a T3) as a natural fact. Obviously, newer chips always tend to outperform older generations. However, the main point here was to verify that the T4 has much better SINGLE-thread performance than the previous generation: we are not talking about a 30% or 50% improvement but at least 5x better, which is quite a breakthrough.
The other point was to check that the improvement in single-thread performance did not break the primary strength of the T chips: multi-threaded workload processing.

The tests were therefore focused on those two points, and I do believe that they answer both questions.

However, it would be interesting, as you said, to compare the performance of the T4 with an Intel/AMD chip, but this was not the aim of this particular test.

Posted by Amir Javanshir on April 02, 2012 at 05:57 AM CEST

I agree with your point. Given the purpose of the test, it proves the point very well.
It's also interesting to know that the FC drive scales similarly to a flash array in sequential reads.

Posted by martin francis k on April 02, 2012 at 09:31 AM CEST
