Computational graph algorithms are used in many big data and analytics workloads. The algorithm kernels inherently have a large degree of parallelism and a very large number of random accesses to memory. Oracle's SPARC M7 processor based system provides better performance than an x86 server with Intel Xeon Processor E7-8895 v3.
Oracle's SPARC T7-4 server with four SPARC M7 processors evaluating PageRank on large graphs was able to deliver 1.2x to 1.5x better per core performance than a four chip x86 server with Intel Xeon Processors E7-8895 v3.
The SPARC T7-4 server with four SPARC M7 processors computing Single-Source Shortest Path (SSSP) using the Bellman-Ford algorithm on large graphs was able to deliver 1.4x to 1.5x better per core performance than a four-chip x86 server with Intel Xeon Processor E7-8895 v3.
Fifty PageRank iterations were run. SSSP repeated the computation from 3 different source vertices. The Graph 500 Scale 32 graph has 1,650 million vertices, 68.7 billion edges and requires 4 TB of memory to analyze. The Graph 500 Scale 30 graph has 448 million vertices, 17.2 billion edges and requires a computer with 1 TB of memory to analyze. The Graph 500 Scale 29 graph has 233 million vertices, 8.6 billion edges and requires 500 GB of memory to analyze.
The graphs are identified by "Scale".
Graph Algorithm | Graph 500 Scale |
x86 4-chip (sec) |
SPARC T7-4 4-chip (sec) |
SPARC Per Chip Advantage |
SPARC Per Core Advantage |
---|---|---|---|---|---|
PageRank | 30 | 136.7 | 62.6 | 2.1x | 1.2x |
29 | 72.1 | 27.6 | 2.6x | 1.5x | |
SSSP Bellman-Ford | 30 | 39.2 | 14.7 | 2.7x | 1.5x |
29 | 21.3 | 8.5 | 2.5x | 1.4x |
Both graph algorithms were also run on a 12 processor SPARC M7-16 server with Graph 500 Scale 32.
Graph Algorithm | Graph 500 Scale |
SPARC M7-16 using 1 chip (sec) |
SPARC M7-16 using 12 chips (sec) |
---|---|---|---|
PageRank | 32 | 1,600.9 | 98.4 |
SSSP Bellman-Ford | 32 | 405.9 | 31.5 |
Systems Under Test:
Computational graphs are a core part of many analytics workloads and are very data intensive and stress computers. Each algorithm typically traverses the entire graph multiple times, while doing certain arithmetic operations during the traversal. These benchmarks generate a combination of sequential memory access patterns (streaming through the structure of the graph) and random accesses to the targets of graph edges
Two computational graph algorithms were run, PageRank and the Bellman-Ford algorithm for Single-Source Shortest Path (SSSP), using the Graph 500 graphs at scale 29, 30 and 32.
The mathematics of PageRank are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. The PageRank algorithm counts the number and quality of links to a page to determine a rough estimate of the importance. Graph vertices and edges are represented as 64 bit integers. Double precision floating point arithmetic is used for PageRank values. The benchmark runs 50 PageRank iterations.
The Bellman-Ford algorithm for SSSP, Single-Source Shortest Path, finds the shortest paths from a source vertex to all other vertices in the graph. Often used to find the shortest distance between 2 points, e.g. Google Maps. It's also used in operations research and "six degrees of separation." Graph vertices and edges are represented as 64 bit integers. Double precision floating point distances are held on each graph edge. The result provides the shortest distance from a root vertex to every other vertex, and each vertex's predecessor along this shortest path. The benchmark repeats this computation from 3 different source vertices.
Characteristics of the Graphs | |||
---|---|---|---|
Graph 500 Scale |
Connected Vertices |
Edges | Runtime Memory Required |
32 | 1,650 M | 6.87 B | 4 TB |
30 | 448 M | 17.2 B | 1 TB |
29 | 233 M | 8.6 B | 500 GB |
Note: M is 10**6, B is 10**9.
Computational graph algorithms benefit from the faster memory bandwidth in the SPARC T7-4 server. The memory bandwidth as measured by STREAM is 258,868 MB/Sec on the Oracle Server X5-4 and 555,374 MB/sec on the SPARC T7-4 server.
Each algorithm was highly tuned for the memory hierarchy of each processor's architecture and its server.
Copyright 2016, Oracle and/or its affiliates. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of June 29, 2016.