
Hadoop TeraSort: SPARC T7-4 Top Per-Chip Performance

Brian Whitney
Principal Software Engineer

Oracle's SPARC T7-4 server, using virtualization, delivered an outstanding single-server result on the Hadoop TeraSort benchmark. The SPARC T7-4 server was run both with and without security. Even the secure runs on the SPARC M7 processor-based server were much faster per chip than competitive unsecure results.

  • The SPARC T7-4 server is 4.7x faster per chip than an IBM POWER8-based cluster on the 10 TB Hadoop TeraSort benchmark.

  • The SPARC T7-4 server running with ZFS encryption enabled on the 10 TB Hadoop TeraSort benchmark is 3.6x faster per chip than an unsecure x86 v2 cluster (32.2 vs 8.9 GB/min per chip).

  • The SPARC T7-4 server running with ZFS encryption (AES-256-GCM) enabled on the 10 TB Hadoop TeraSort benchmark is 4.3x faster per chip than an unsecure (plain-text) IBM POWER8 cluster.

  • The SPARC T7-4 server ran the 10 TB Hadoop TeraSort benchmark in 4,259 seconds.

Performance Landscape

The following table presents results for the 10 TB Hadoop TeraSort benchmark. The rate results are determined by dividing the dataset size (10^13 bytes = 10,000 GB) by the elapsed time in minutes. These rates are then normalized by the number of nodes or chips used to obtain the result.
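As a back-of-the-envelope check, the normalization works like this (a minimal Python sketch; the input figures are taken from the table below):

    # Sort rate = dataset size (GB) / elapsed time (min),
    # then normalized by node or chip count.
    DATASET_GB = 10**13 / 10**9  # 10 TB, decimal: 10,000 GB

    def sort_rates(seconds, nodes, chips):
        """Return (per-node, per-chip) sort rates in GB/min."""
        total_rate = DATASET_GB / (seconds / 60.0)
        return total_rate / nodes, total_rate / chips

    # SPARC T7-4, unsecure: 4,259 seconds on 1 node with 4 chips
    per_node, per_chip = sort_rates(4259, nodes=1, chips=4)
    print(f"{per_node:.1f} GB/min per node, {per_chip:.1f} GB/min per chip")
    # -> 140.9 GB/min per node, 35.2 GB/min per chip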

 
10 TB Hadoop TeraSort Performance Landscape

System                             Security     Nodes  Total  Time   Sort Rate (GB/min)
                                                       Chips  (sec)  Per Node  Per Chip
SPARC T7-4                         unsecure         1      4  4,259     140.9      35.2
  SPARC M7 (4.13 GHz)
SPARC T7-4                         AES-256-GCM      1      4  4,657     128.8      32.2
  SPARC M7 (4.13 GHz)
IBM Power System S822L             unsecure         8     32  2,490      30.1       7.5
  POWER8 (3.0 GHz)
Dell R720xd/VMware                 unsecure        32     64  1,054      17.8       8.9
  Intel Xeon E5-2680 v2 (2.8 GHz)
Cisco UCS CPA C240 M3              unsecure        16     32  3,112      12.0       6.0
  Intel Xeon E5-2665 (2.4 GHz)
 

Configuration Summary

Server:

SPARC T7-4
4 x SPARC M7 processors (4.13 GHz)
2 TB memory (64 x 32 GB)
6 x 600 GB 10K RPM SAS-2 HDD
10 GbE
Oracle Solaris 11.3 (11.3.0.29)
Oracle Solaris Studio 12.4
Java SE Runtime Environment (build 1.7.0_85-b33)
Hadoop 1.2.1
 

External Storage (COMSTAR, the Common Multiprotocol SCSI Target framework, enables a system to be seen as a SCSI target device):

16 x Sun Server X3-2L
2 x Intel Xeon E5-2609 (2.4 GHz)
16 GB memory (2 x 8 GB)
2 x 600 GB SAS-2 HDD
12 x 3 TB SAS-1 HDD
4 x Sun Flash Accelerator F40 PCIe Card
Oracle Solaris 11.1 (11.1.16.5.0)

Please note: these systems were used only as storage. No Hadoop processes ran on the COMSTAR storage nodes, and no compression or encryption was performed on them.
 

Benchmark Description

The Hadoop TeraSort benchmark sorts 100-byte records by an embedded 10-byte random key. Hadoop TeraSort is characterized by high I/O bandwidth between each compute/data node of a Hadoop cluster and the disk drives attached to that node.

Note: benchmark size is measured in power-of-ten, not power-of-two, bytes; a 1 TB sort sorts 10^12 bytes = 10 billion 100-byte rows using an embedded 10-byte key field of random characters, a 100 GB sort sorts 10^11 bytes = 1 billion 100-byte rows, and so on.
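To make the record format concrete, here is a minimal, illustrative Python sketch of sorting such records (the key-plus-filler layout is simplified; real TeraGen records also encode additional fields such as a row id):

    import os

    RECORD_LEN, KEY_LEN = 100, 10

    def make_records(n):
        # Illustrative only: 100-byte records led by a 10-byte random key.
        return [os.urandom(KEY_LEN) + b"x" * (RECORD_LEN - KEY_LEN) for _ in range(n)]

    records = make_records(1000)
    records.sort(key=lambda r: r[:KEY_LEN])  # sort on the embedded key, as TeraSort does
    assert all(a[:KEY_LEN] <= b[:KEY_LEN] for a, b in zip(records, records[1:]))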

Key Points and Best Practices

  • The SPARC T7-4 server was configured with 15 Oracle Solaris Zones. Each zone ran one Hadoop data node, with HDFS layered on an Oracle Solaris ZFS volume.

  • Hadoop uses a distributed, shared-nothing, batch-processing framework that employs divide-and-conquer serial Map and Reduce JVM tasks, with performance coming from scale-out concurrency (i.e., more tasks) rather than parallelism. Only one job scheduler and one task manager can be configured per data/compute node, and both have inherent scaling limitations (the Hadoop design target being small compute nodes, deployed in the hundreds or even thousands).

  • Multiple data nodes significantly improve overall system utilization: HDFS becomes more distributed, with more processes servicing file system operations, and more task trackers managing the MapReduce work.

  • On large-node systems, virtualization is required to improve utilization by increasing the number of independent data/compute nodes, each running its own Hadoop processes.

  • I/O bandwidth to the local disk drives and network communication bandwidth are the primary determinants of Hadoop performance. Typically, Hadoop reads input data files from HDFS during the Map phase of computation and stores intermediate files back to disk; during the subsequent Reduce phase, it reads the intermediate files and writes the final result. The Map and Reduce phases are executed concurrently by multiple Map tasks and Reduce tasks. Tasks are purpose-built, stand-alone serial applications, often written in Java (but they can be written in any programming language or script), as the sketch below illustrates.
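A minimal, hypothetical sketch of such a task, in the style of a Hadoop Streaming script (this is not the actual benchmark job, which uses the stock Java TeraSort classes):

    #!/usr/bin/env python
    # Hypothetical Hadoop Streaming-style tasks: the mapper reads records on
    # stdin and emits key<TAB>value lines; the framework's shuffle phase sorts
    # them by key before the reducer sees them, so an identity reduce suffices.
    import sys

    def mapper():
        for line in sys.stdin:
            record = line.rstrip("\n")
            key, value = record[:10], record[10:]  # first 10 bytes are the sort key
            sys.stdout.write(key + "\t" + value + "\n")

    def reducer():
        for line in sys.stdin:  # keys arrive already sorted from the shuffle
            sys.stdout.write(line)

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()

With Hadoop 1.2.1, such scripts would be submitted through the bundled streaming jar; performance then comes from running many of these serial tasks concurrently across the data nodes.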


Disclosure Statement

Copyright 2015, Oracle and/or its affiliates. All rights reserved.  Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.  Results as of 25 October 2015.

Competitive results found at: Dell R720xd/VMware, IBM S822L, Cisco C240 M3
