Oracle's SPARC T7-4 server using virtualization delivered an outstanding single server result running the Hadoop TeraSort benchmark. The SPARC T7-4 server was run with and without security. Even the secure runs on the SPARC M7 processor based server performed much faster per chip compared to competitive unsecure results.
The SPARC T7-4 server on a per chip basis is 4.7x faster than an IBM POWER8 based cluster on the 10 TB Hadoop TeraSort benchmark.
The SPARC T7-4 server running with ZFS encryption enabled on the 10 TB Hadoop TeraSort benchmark is 4.6x faster than an unsecure x86 v2 cluster on a per chip basis.
The SPARC T7-4 server running with ZFS encryption (AES-256-GCM) enabled on the 10 TB Hadoop TeraSort benchmark is 4.3x faster than an unsecure (plain-text) IBM POWER8 cluster on a per chip basis.
The SPARC T7-4 server ran the 10 TB Hadoop TeraSort benchmark in 4,259 seconds.
The following table presents results for the 10 TB Hadoop TeraSort benchmark. The rate results are determined by taking the dataset size (10**13) and dividing by the time (in minutes). These rates are further normalized by the number of systems or chips used in obtaining the results.
|10 TB Hadoop TeraSort Performance Landscape|
|Sort Rate (GB/min)|
|Per Node||Per Chip|
SPARC M7 (4.13 GHz)
SPARC M7 (4.13 GHz)
|IBM Power System S822L
POWER8 (3.0 GHz)
Intel Xeon E5-2680 v2 (2.8 GHz)
|Cisco UCS CPA C240 M3
Intel Xeon E5-2665 (2.4 GHz)
External Storage (Common Multiprotocol SCSI TARget, or COMSTAR enables system to be seen as a SCSI target device):
The Hadoop TeraSort benchmark sorts 100-byte records by a contained 10-byte random key. Hadoop TeraSort is characterized by high I/O bandwidth between each compute/data node of a Hadoop cluster and the disk drives that are attached to that node.
Note: benchmark size is measured by power-of-ten not power-of-two bytes; 1 TB sort is sorting 10^12 Bytes = 10 billion 100-byte rows using an embedded 10-Byte key field of random characters, 100 GB sort is sorting 10^11 Bytes = 1 billion 100-byte rows, etc.
The SPARC T7-4 server was configured with 15 Oracle Solaris Zones. Each Zone was running one Hadoop data-node with HDFS layered on an Oracle Solaris ZFS volume.
Hadoop uses a distributed, shared nothing, batch processing framework employing divide-conquer serial Map and Reduce JVM tasks with performance coming from scale-out concurrency (e.g. more tasks) rather than parallelism. Only one job scheduler and task manager can be configured per data/compute-node and both (job scheduler and task manager) have inherent scaling limitations (the hadoop design target being small compute-nodes and hundreds or even thousands of them).
Multiple data-nodes significantly help improve overall system utilization – HDFS becomes more distributed with more processes servicing file system operations, and more task-trackers are managing all the MapReduce work.
On large node systems virtualization is required to improve utilization by increasing the number of independent data/compute nodes each running their own hadoop processes.
I/O bandwidth to the local disk drives and network communication bandwidth are the primary determinants of Hadoop performance. Typically, Hadoop reads input data files from HDFS during the Map phase of computation, and stores intermediate file back to HDFS. Then during the subsequent Reduce phase of computation, Hadoop reads the intermediate files, and outputs the final result. The Map and Reduce phases are executed concurrently by multiple Map tasks and Reduce tasks. Tasks are purpose-built stand-alone serial applications often written in Java (but can be written in any programming language or script).
Copyright 2015, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of 25 October 2015.