Thursday May 31, 2012

Gemalto Mobile Payment Platform on Oracle T4

Gemalto is the world leader in digital security, at the heart of our rapidly evolving digital society. Billions of people worldwide increasingly want the freedom to communicate, travel, shop, bank, entertain and work – anytime, everywhere – in ways that are convenient, enjoyable and secure. Gemalto delivers on their expanding needs for personal mobile services, payment security, identity protection, authenticated online services, cloud computing access, eHealthcare and eGovernment services, modern transportation solutions, and M2M communication.

Gemalto’s solutions for Mobile Financial Services are deployed at over 70 customers worldwide, transforming the way people shop, pay and manage personal finance. In developing markets, Gemalto Mobile Money solutions are helping to remove the barriers to financial access for the unbanked and under-served, by turning any mobile device into a payment and banking instrument.

In recent benchmarks by our Oracle ISVe Labs, the Gemalto Mobile Payment Platform demonstrated outstanding performance and scalability using the new T4-based Oracle Sun machines running Solaris 11.

Using a clustered environment on a mid-range 2x2.85GHz T4-2 server (16 cores total, 128GB memory) for the application tier, and an additional dedicated Intel-based (2x3.2GHz Intel Xeon X4200) Oracle database server, the platform processed more than 1,000 transactions per second, limited only by database capacity; higher throughput was easily achievable with a stronger database server. Near-linear scalability was observed as the number of application software components in the cluster increased. These results show an increase of nearly 300% in processing power and capacity on the new T4-based servers relative to the previous generation of Oracle Sun CMT servers, at a comparable price.

T4-2

In the fast-evolving Mobile Payment market, it is crucial that the underlying technology seamlessly supports Service Providers as the customer-base ramps up, use cases evolve and new services are launched. These benchmark results demonstrate that the Gemalto Mobile Payment Platform is designed to meet the needs of any deployment scale, whether targeting 5 or 100 million subscribers.

Oracle Solaris 11 DTrace technology helped pinpoint performance issues and tune the system accordingly to make optimal use of compute resources.

Thursday Mar 29, 2012

Talend Enterprise Data Integration overperforms on Oracle SPARC T4

The SPARC T microprocessor, released in 2005 by Sun Microsystems and now continued at Oracle, has a good track record in parallel execution and multi-threaded performance. However, it was less suited to pure single-threaded workloads. The new SPARC T4 processor now fills that gap by offering 5x better single-thread performance than previous generations.

Following our long-term relationship with Talend, a fast-growing ISV positioned by Gartner in the “Visionaries” quadrant of the “Magic Quadrant for Data Integration Tools”, we decided to test some of their integration components on the T4 chip, more precisely on a T4-1 system, to verify first hand whether this new processor lives up to its promises.

Several tests were performed, mainly focused on:

  • Single-thread performance of the new SPARC T4 processor compared to an older SPARC T2+ processor
  • Overall throughput of the SPARC T4-1 server using multiple threads

The tests consisted of reading large amounts of data (tens of gigabytes), processing them, and writing them back to a file or an Oracle 11gR2 database table. They are CPU, memory and IO bound tests. Given that the main focus of this project was CPU performance, bottlenecks in the memory and IO subsystems were removed as much as possible. For instance, the data to process was put into the ZFS filesystem cache whenever possible. Also, two external storage devices were directly attached to the servers under test, each one divided into two ZFS pools for read and write operations.

Test Configuration

Multi-thread: Testing throughput on the Oracle T4-1

The tests were performed with different numbers of simultaneous threads (1, 2, 4, 8, 12, 16, 32, 48 and 64) and using different storage devices: Flash, Fibre Channel storage, two striped internal disks and one single internal disk. All storage devices used ZFS for filesystem and volume management.

Each thread read a dedicated 1 GB file containing 12.5M lines with the following structure:

customerID;FirstName;LastName;StreetAddress;City;State;Zip;Cust_Status;Since_DT;Status_DT
1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008
2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;A;10-05-2009;27-05-2008
3;Andrew;Madison;S Rustle St;Santa Fe;Arkansas;75677;A;29-04-2005;09-02-2008
4;Dwight;Adams;South Roosevelt Drive;Baton Rouge;Vermont;75677;A;15-02-2004;26-01-2007
[…]

The following graphs present the results of our tests:

Results 1

Unsurprisingly, up to 16 threads all files fit in the ZFS cache (the ARC): once the cache is hot, performance no longer depends on the underlying storage. From 16 threads upwards, however, IO clearly becomes a bottleneck, so a good IO subsystem is key. Single-disk performance collapses, whereas the Sun F5100 and ST6180 arrays allow the T4-1 to scale quite seamlessly. From 32 to 64 threads, performance is almost constant, with just a slow decline.

For the database load tests, only the best IO configuration (using external storage devices) was used, hosting the Oracle tablespaces and redo log files.

Results 2

Using the Sun Storage F5100 array allows the T4-1 server to scale up to 48 parallel JVM processes before saturating the CPU. The final result is a staggering 646K lines per second inserted into an Oracle table using 48 parallel threads.

Single-thread: Testing the single thread performance

Seven different tests were performed on both servers. Since only one thread was used, and thus only one file was read, no IO bottleneck was involved: all data was served from the ZFS cache.

  • Read File → Filter → Write File: Read file, filter data, write the filtered data in a new file. The filter is set on the “Status” column: only lines with status set to “A” are selected. This limits each output file to about 500 MB.
  • Read File → Load Database Table: Read file, insert into a single Oracle table.
  • Average: Read file, compute the average of a numeric column, write the result in a new file.
  • Division & Square Root: Read file, perform a division and square root on a numeric column, write the result data in a new file.
  • Oracle DB Dump: Dump the content of an Oracle table (12.5M rows) into a CSV file.
  • Transform: Read file, transform, write the result data in a new file. The transformations applied are: set the address column to upper case and add an extra column at the end, which is the concatenation of two columns.
  • Sort: Read file, sort on a numeric and an alphanumeric column, write the result data in a new file.
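
To make the first test concrete, here is a minimal Python sketch of the Read File → Filter → Write File logic. It is not Talend's generated code (the actual jobs ran in the JVM); it only assumes the semicolon-delimited layout shown earlier and that the “Status” filter refers to the Cust_Status column. The function name and the second sample row's status value ("I") are made up for illustration.

```python
import csv
import io


def filter_active(src, dst, status_column="Cust_Status", wanted="A"):
    """Copy the header and every row whose status equals `wanted`.

    Returns a (rows_read, rows_written) tuple, header excluded.
    """
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=";", lineterminator="\n")
    header = next(reader)
    status_idx = header.index(status_column)
    writer.writerow(header)
    rows_read = rows_written = 0
    for row in reader:
        rows_read += 1
        if row[status_idx] == wanted:
            writer.writerow(row)
            rows_written += 1
    return rows_read, rows_written


# Tiny in-memory sample; the "I" status on the second row is hypothetical.
sample = io.StringIO(
    "customerID;FirstName;LastName;StreetAddress;City;State;Zip;"
    "Cust_Status;Since_DT;Status_DT\n"
    "1;Ronald;Reagan;South Highway;Santa Fe;Montana;98756;A;04-06-2006;09-08-2008\n"
    "2;Theodore;Roosevelt;Timberlane Drive;Columbus;Louisiana;75677;I;10-05-2009;27-05-2008\n"
)

if __name__ == "__main__":
    out = io.StringIO()
    print(filter_active(sample, out))
    print(out.getvalue(), end="")
```

On real files the same function can be called with two file objects opened with newline="" instead of the in-memory buffers.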

The following table and graph present the final results of the tests:

  • Throughput unit is thousand lines per second processed (K lines/second).
  • Improvement is the T5140 elapsed time expressed as a percentage of the T4-1 elapsed time.

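| Test                   | T4-1 time (s) | T5140 time (s) | Improvement | T4-1 throughput (K lines/s) | T5140 throughput (K lines/s) |
|------------------------|---------------|----------------|-------------|-----------------------------|------------------------------|
| Read/Filter/Write      | 125           | 806            | 645%        | 100                         | 16                           |
| Read/Load Database     | 195           | 1111           | 570%        | 64                          | 11                           |
| Average                | 96            | 557            | 580%        | 130                         | 22                           |
| Division & Square Root | 161           | 1054           | 655%        | 78                          | 12                           |
| Oracle DB Dump         | 164           | 945            | 576%        | 76                          | 13                           |
| Transform              | 159           | 1124           | 707%        | 79                          | 11                           |
| Sort                   | 251           | 1336           | 532%        | 50                          | 9                            |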
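
The throughput and improvement figures can be re-derived from the elapsed times. The short Python sketch below assumes that every test processed the 12.5M-line data set described above and that the improvement is the ratio of the two elapsed times; the function names are ours, and the times are copied from the table.

```python
LINES = 12_500_000  # lines (or table rows) processed per test, as stated above

# (test name, T4-1 seconds, T5140 seconds), copied from the results table
TIMES = [
    ("Read/Filter/Write", 125, 806),
    ("Read/Load Database", 195, 1111),
    ("Average", 96, 557),
    ("Division & Square Root", 161, 1054),
    ("Oracle DB Dump", 164, 945),
    ("Transform", 159, 1124),
    ("Sort", 251, 1336),
]


def throughput_k(seconds, lines=LINES):
    """Throughput in thousands of lines per second."""
    return lines / seconds / 1000


def improvement_pct(t4_seconds, t5140_seconds):
    """T5140 elapsed time as a percentage of the T4-1 elapsed time."""
    return 100 * t5140_seconds / t4_seconds


if __name__ == "__main__":
    for name, t4, t5140 in TIMES:
        print(
            f"{name:24s} {throughput_k(t4):6.1f} {throughput_k(t5140):6.1f} "
            f"{improvement_pct(t4, t5140):6.1f}%"
        )
```

Rounded to whole numbers, the values match the table, with the improvement ranging from about 5.3x (Sort) to about 7.1x (Transform).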

Results 3

The improvement in single-thread performance is quite dramatic: depending on the test, the T4 is between 5.3 and 7.1 times faster than the T2+. It seems clear that the SPARC T4 processor has gone a long way towards filling the gap in single-thread performance, without sacrificing multi-threaded capability, as it still shows very impressive scaling on heavy-duty multi-threaded jobs.

Finally, as always at Oracle ISV Engineering, we are happy to help our ISV partners test their own applications on our platforms, so don't hesitate to contact us and let's see what the SPARC T4-based systems can do for your application!

"As describe in this benchmark, Talend Enterprise Data Integration has overperformed on T4. I was generally happy to see that the T4 gave scaling opportunities for many scenarios like complex aggregations. Row by row insertion in Oracle DB is faster with more than 650,000 rows per seconds without using any bulk Oracle capabilities !"

Cedric Carbone, Talend CTO.

Monday Jan 09, 2012

InfoVista VistaInsight for Networks shows 3.7x performance on Oracle

System management vendor InfoVista markets the VistaInsight for Networks® application to help telco operators, service providers and large enterprises effectively meet the performance and service level agreements of converged and next-generation communication networks. As part of our on-going technology partnership, InfoVista and Oracle ISV Engineering together ran a performance test campaign of VistaInsight for Networks® on Oracle Solaris and Sun CMT hardware. The two companies shared many common objectives when starting this project.

The most obvious was to improve the scalability and performance of VistaInsight for Networks® on Oracle's SPARC T-Series systems and thereby provide customers with a better price/performance ratio and a better ROI. From the outset, virtualization was considered a promising technology to improve scalability, so testing VistaInsight for Networks® in the context of Oracle Solaris Zones was also a major milestone.

Second, InfoVista was interested in setting new limits in terms of the workload that its application can sustain, in response to the evolving needs of its customers.

Lastly, as the first improvements on computing scalability were delivered, it became obvious that the storage was the next critical component for the performance of the entire solution. A decision was then made to test the Oracle Solaris ZFS file system, the Sun ZFS Storage Appliance, and the SSD technology from Oracle to move to the next level of performance.

The result of this performance test campaign is a new Reference Architecture whitepaper that provides detailed information about the configuration tested, the tests executed and the results obtained. It clearly shows that VistaInsight for Networks® takes full advantage of the server, storage and virtualization technology provided by Oracle. By leveraging Oracle Solaris Zones, Oracle Solaris ZFS, SSD and Sun ZFS Storage Appliance storage, InfoVista increased throughput by more than 370%, meeting the highest expectations in terms of workload and performance while keeping the cost in a very attractive range.

Learn all the details about this new Reference Architecture published on OTN.

Monday Nov 14, 2011

Latency Matters

A lot of interest in low latencies has been expressed within the financial services segment, especially in stock trading applications, where every millisecond directly influences the profitability of the trader. These days, much of the trading is executed by software applications which are trained to respond to each other almost instantaneously. In fact, you could say that we are in an arms race where traders are using any and all options to cut down on the delay in executing transactions, even by moving physically closer to the trading venue.

The Solaris OS network stack has traditionally been engineered for high throughput, at the expense of higher latencies. Knowing which tuning parameters redress the imbalance is critical for latency-sensitive applications. In this post we show how to further configure a default Oracle Solaris 10 installation to reduce network latency.


Tuesday Sep 27, 2011

Talend's new data processing engine on Sun Blade X6270

Having the chance to test the brand new Sun Blade X6270 server based on the Intel Xeon X5500 series processors, I asked one of our ISV partners, Talend, an open source ETL (Extract, Transform & Load) solution provider, if they were willing to do some benchmarking with me.

The timing was perfect, since Talend had just rewritten some parts of their ETL engine, to be included in the upcoming version, in order to make better use of the multi-threading capabilities of modern CPUs.

During development they had benchmarked their application on a two-socket Xeon 5320, and were very interested in seeing how the new Intel Xeon 5500 would perform.

Test descriptions

We used DBGEN v2.8.0, a database population program that generates files to be loaded into database tables. In our case we generate moderately to very large files and process them directly as simple flat files (no database system is used). Also, we only use the file called “lineitem.tbl”, which represents a list of order item lines with the following structure:

DBGen Structure

For each benchmark run we perform three tests, each applying a different type of processing on the file:

  • Sort:
    We will sort the entire file by date, on the 11th column (L_SHIPDATE: see above in red)

  • Count:
    Count the number of order lines by shipment mode (L_SHIPMODE: see blue column above) and the year of the shipment date (L_SHIPDATE: see above in bold red)

  • Average:
    Average discount (L_DISCOUNT) for each item (L_PARTKEY)
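
For readers who want to see the three operations spelled out, here is a minimal Python sketch of their logic. It is not Talend's engine: it works in memory on a few rows, whereas the real tests need an approach that scales to hundreds of gigabytes. It assumes the usual DBGEN output format (pipe-delimited fields with a trailing pipe) and the standard TPC-H column order for lineitem; the sample rows and function names are made up for illustration.

```python
import io
from collections import Counter, defaultdict

# 0-based field positions, assuming the standard TPC-H lineitem column order
L_PARTKEY, L_DISCOUNT, L_SHIPDATE, L_SHIPMODE = 1, 6, 10, 14

# Hypothetical rows in DBGEN's pipe-delimited format (trailing "|" included)
SAMPLE = (
    "1|155190|7706|1|17|21168.23|0.04|0.02|N|O|1996-03-13|1996-02-12|1996-03-22|DELIVER IN PERSON|TRUCK|a|\n"
    "1|67310|7311|2|36|45983.16|0.09|0.06|N|O|1996-04-12|1996-02-28|1996-04-20|TAKE BACK RETURN|MAIL|b|\n"
    "2|155190|1|1|38|44694.46|0.00|0.05|N|O|1995-01-28|1995-01-14|1995-02-02|TAKE BACK RETURN|TRUCK|c|\n"
)


def parse(lines):
    """Yield each non-empty line as a list of string fields."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split("|")


def sort_by_shipdate(rows):
    """Sort: YYYY-MM-DD dates order correctly as plain strings."""
    return sorted(rows, key=lambda r: r[L_SHIPDATE])


def count_by_mode_and_year(rows):
    """Count: number of lines per (shipment mode, shipment year)."""
    return Counter((r[L_SHIPMODE], r[L_SHIPDATE][:4]) for r in rows)


def average_discount_by_part(rows):
    """Average: mean L_DISCOUNT for each L_PARTKEY."""
    totals = defaultdict(lambda: [0.0, 0])
    for r in rows:
        acc = totals[r[L_PARTKEY]]
        acc[0] += float(r[L_DISCOUNT])
        acc[1] += 1
    return {part: total / n for part, (total, n) in totals.items()}


if __name__ == "__main__":
    rows = list(parse(io.StringIO(SAMPLE)))
    print([r[L_SHIPDATE] for r in sort_by_shipdate(rows)])
    print(dict(count_by_mode_and_year(rows)))
    print(average_discount_by_part(rows))
```

Of the three, only the sort has to hold or spill the whole file, which may be part of why it turns out to be the most IO-sensitive test.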

DBGEN uses a scaling factor representing the total size of all the tables generated. For this test we only use the file named «lineitem.tbl». The table below shows the size and number of lines in the «lineitem.tbl» file for each scaling factor.

As you can see, we start quite small, by processing a file with 6 million lines (only!), and go all the way up to 3.3 billion lines in a single file.



| Scale | Number of entries | Size   |
|-------|-------------------|--------|
| 1     | 6 million         | 740 MB |
| 10    | 60 million        | 7.4 GB |
| 100   | 600 million       | 74 GB  |
| 300   | 1.8 billion       | 225 GB |
| 550   | 3.3 billion       | 415 GB |


Hardware Configurations

The following table shows the hardware configuration used for the tests (referred to as X6270), and also the vanilla Xeon-based box used by Talend (referred to as Bi-Xeon).

| Server           | X6270 | Bi-Xeon |
|------------------|-------|---------|
| CPU              | 2 x Xeon 5520 quad core with HyperThreading and Turbo mode on (2.26 GHz) | 2 x Intel Xeon 5320 quad core (1.86 GHz) |
| RAM              | 24 GB DDR3 | 4 GB DDR2 |
| Internal storage | 1 x 136 GB, 15K rpm | 3 x 250 GB and 2 x 320 GB Seagate, 7200 rpm (all on ext3): 1 x 250 GB for system and temporary files; 1 x 320 GB for input files; 1 x 320 GB for output files |
| External storage | 3 volumes of 4 disks using RAID 0 (striping), 544 GB each; a ZFS pool for each group | None |
| Operating System | Solaris 10 update 6 (a.k.a. 10/08) | Debian GNU/Linux Etch with Linux 2.6.18 (i686) |


With respect to the CPU, the X6270 configuration is obviously much more powerful, especially given the amount of RAM and the external storage. However, the tests proved to be more CPU and IO bound than memory bound. Even though the amount of memory obviously does make a difference, the tests will give us some indication of the extra performance brought by the Xeon 5500.

In order to get closer to the Bi-Xeon configuration, we also ran two sets of tests on the X6270: with the external storage (referred to as X6270-Ext) and without it (referred to as X6270-Int).

In the second case, we are even in a less favorable position than the Bi-Xeon, which uses 3 disks vs. a single disk for the X6270.

Results

The table below presents the final results of the tests done on the three configurations. It's interesting to note a couple of things:

  • When processing a file, at least three times its size is needed in disk space. For this reason, we could only process files up to 7.4 GB on the X6270-Int (a single internal 136 GB disk in the server).

  • Given the much higher processing time needed on the Bi-Xeon, we didn't even try going further than 74 GB.

  • We pushed the X6270-Ext up to processing a 415 GB file, and could reasonably have gone all the way to 1 TB if we had not been limited by disk space.
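
The “three times the disk space” rule of thumb is easy to check against the file sizes listed earlier. The Python sketch below is only a back-of-the-envelope model: it assumes the factor of 3 applies to the raw file size, and it treats the three external 544 GB volumes as one pooled capacity, which is a simplification on our part. The function name is ours.

```python
# File size in GB for each DBGEN scale factor, from the table above
FILE_SIZES_GB = {1: 0.74, 10: 7.4, 100: 74, 300: 225, 550: 415}

SPACE_FACTOR = 3  # "at least three times" the file size (rule of thumb above)


def largest_scale(capacity_gb, sizes=FILE_SIZES_GB, factor=SPACE_FACTOR):
    """Return the largest scale factor whose working set fits, or None."""
    fitting = [scale for scale, size in sizes.items() if size * factor <= capacity_gb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    print("X6270-Int (136 GB):", largest_scale(136))
    # Assumption: the three external volumes are counted as a single capacity
    print("X6270-Ext (3 x 544 GB):", largest_scale(3 * 544))
```

Under these assumptions the single internal disk tops out at scale 10 (the 7.4 GB file), while the external storage accommodates scale 550 (the 415 GB file) but would not fit a 1 TB file.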

Result Table











Conclusions

On the CPU bound tests (Average test) we can clearly see a 32% to 60% boost of performance on the new Intel Xeon 5500 compared to the older generation (depending on the size of the file).

Of course the processor matters, and we saw that it has a great impact on the more CPU bound processing. But what we can also see, and that's not new, is that data-hungry processors need to be fed with data, well and fast. In that respect, the speed of the IO subsystem is very important. Obviously, working with files over 400 GB puts a lot of pressure on the IO, and plugging in a professional external storage device makes a huge difference (in our case anyway).

As you can see on the Sort test (scale 10), we get a 290% boost with the Intel Xeon 5500. Once we use the external storage, that performance skyrockets to 1075% (more than 10x the performance)!

We could of course go on for a long time analyzing all the figures, with different file sizes, but even without pushing the analysis very far, it's plain to see the performance gain we get with this new processor alone, not to mention the gain when we also take care of the IO subsystem.

The Intel Xeon 5500 based Sun servers, such as the Sun Blade X6270 we just tested, enhanced with an external storage device such as the Sun StorageTek 2540, seem to be a killer combination for large data processing.

About

How open innovation and technology adoption translates to business value, with stories from our developer support work at Oracle's ISV Engineering.
