X

Everything you want and need to know about Oracle SPARC systems performance

Computational Graph Algorithms: SPARC T7-4 Performance on PageRank and Single-Source Shortest Path

Brian Whitney
Principal Software Engineer

Computational graph algorithms are used in many big data and analytics workloads. The algorithm kernels inherently have a large degree of parallelism and a very large number of random accesses to memory. Oracle's SPARC M7 processor based system provides better performance than an x86 server with Intel Xeon Processor E7-8895 v3.

  • Oracle's SPARC T7-4 server with four SPARC M7 processors evaluating PageRank on large graphs was able to deliver 1.2x to 1.5x better per core performance than a four chip x86 server with Intel Xeon Processors E7-8895 v3.

  • The SPARC T7-4 server with four SPARC M7 processors computing Single-Source Shortest Path (SSSP) using the Bellman-Ford algorithm on large graphs was able to deliver 1.4x to 1.5x better per core performance than a four-chip x86 server with Intel Xeon Processor E7-8895 v3.

Fifty PageRank iterations were run. SSSP repeated the computation from 3 different source vertices.  The Graph 500 Scale 32 graph has 1,650 million vertices, 68.7 billion edges and requires 4 TB of memory to analyze.  The Graph 500 Scale 30 graph has 448 million vertices, 17.2 billion edges and requires a computer with 1 TB of memory to analyze.  The Graph 500 Scale 29 graph has 233 million vertices, 8.6 billion edges and requires 500 GB of memory to analyze.

Performance Landscape

The graphs are identified by "Scale".

Graph Algorithm Graph 500
Scale
x86
4-chip
(sec)
SPARC
T7-4
4-chip
(sec)
SPARC
Per Chip
Advantage
SPARC
Per Core
Advantage
PageRank 30 136.7 62.6 2.1x 1.2x
29 72.1 27.6 2.6x 1.5x
SSSP Bellman-Ford 30 39.2 14.7 2.7x 1.5x
29 21.3 8.5 2.5x 1.4x

Both graph algorithms were also run on a 12 processor SPARC M7-16 server with Graph 500 Scale 32.

Graph Algorithm Graph 500
Scale
SPARC M7-16
using 1 chip
(sec)
SPARC M7-16
using 12 chips
(sec)
PageRank 32 1,600.9 98.4
SSSP Bellman-Ford 32 405.9 31.5

 

Configuration Summary

Systems Under Test:

SPARC T7-4 server with
4 x SPARC M7 processors (4.13 GHz)
2 TB memory
Oracle Solaris 11.3
Oracle Developer Studio 12.5

Oracle Server X5-4 system with
4 x Intel Xeon E7-8895 v3 processors (2.6 GHz), hyperthreading enabled
1 TB memory
Oracle Linux Server release 7.1
gcc version 6.1.0 (CDS 27-Apr-2016)

SPARC M7-16 server with
12 x SPARC M7 processors (4.13 GHz), configurable from 4 to 16 processors
6 TB memory
Oracle Solaris 11.3
Oracle Developer Studio 12.5

Benchmark Description

Computational graphs are a core part of many analytics workloads and are very data intensive and stress computers. Each algorithm typically traverses the entire graph multiple times, while doing certain arithmetic operations during the traversal.  These benchmarks generate a combination of sequential memory access patterns (streaming through the structure of the graph) and random accesses to the targets of graph edges

Two computational graph algorithms were run, PageRank and the Bellman-Ford algorithm for Single-Source Shortest Path (SSSP), using the Graph 500 graphs at scale 29, 30 and 32.

The mathematics of PageRank are entirely general and apply to any graph or network in any domain. Thus, PageRank is now regularly used in bibliometrics, social and information network analysis, and for link prediction and recommendation. The PageRank algorithm counts the number and quality of links to a page to determine a rough estimate of the importance. Graph vertices and edges are represented as 64 bit integers.  Double precision floating point arithmetic is used for PageRank values.  The benchmark runs 50 PageRank iterations.

The Bellman-Ford algorithm for SSSP, Single-Source Shortest Path, finds the shortest paths from a source vertex to all other vertices in the graph. Often used to find the shortest distance between 2 points, e.g. Google Maps. It's also used in operations research and "six degrees of separation." Graph vertices and edges are represented as 64 bit integers.  Double precision floating point distances are held on each graph edge. The result provides the shortest distance from a root vertex to every other vertex, and each vertex's predecessor along this shortest path. The benchmark repeats this computation from 3 different source vertices.

Characteristics of the Graphs
Graph 500
Scale
Connected
Vertices
Edges Runtime Memory
Required
32 1,650 M 6.87 B 4 TB
30 448 M 17.2 B 1 TB
29 233 M 8.6 B 500 GB


Note: M is 10**6, B is 10**9.

Key Points and Best Practices

Computational graph algorithms benefit from the faster memory bandwidth in the SPARC T7-4 server. The memory bandwidth as measured by STREAM is 258,868 MB/Sec on the Oracle Server X5-4 and 555,374 MB/sec on the SPARC T7-4 server.

Each algorithm was highly tuned for the memory hierarchy of each processor's architecture and its server.

See Also

Disclosure Statement

Copyright 2016, Oracle and/or its affiliates. All rights reserved.  Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Results as of June 29, 2016.

Be the first to comment

Comments ( 0 )
Please enter your name.Please provide a valid email address.Please enter a comment.CAPTCHA challenge response provided was incorrect. Please try again.