Monday Jun 28, 2010

Oracle Solaris Studio Secret Sauce: Ferociously tuned and Parallel Scientific Libraries



One of the lesser-known "secret sauces" of Oracle Solaris Studio is also one of its easiest-to-use and highest-performance components: the Sun Performance Library (what we commonly call Perflib). It is a set of optimized, high-speed mathematical subroutines for solving linear algebra and other numerically intensive problems, based on a collection of public domain applications available from Netlib. Sun has enhanced these public domain applications and bundled them as the Sun Performance Library, tuning each routine for the underlying hardware and parallelizing the routines to take advantage of multiple cores.

If words like BLAS, LAPACK, FFTPACK, SuperLU, ScaLAPACK, SparseBLAS and SPSOLVE get you excited or at least curious, read on. For the rest of you, there are only a few headliners I'd like you to remember:
  • Sun Performance Library comes optimized for every Sun HW platform. This means there are optimized versions for V8, sparcvis, sparcvis2, and sparcfmaf architectures on the SPARC side, and there are also optimized versions for x86/x64 architectures: AMD/Opteron, AMD/Barcelona and Intel/Xeon.
  • Sun Performance Library works on Solaris SPARC, Solaris x86/x64, OEL, RedHat and SuSE.
  • These highly optimized versions are hand-tuned for the best performance. That means linking into these routines will automagically give you scalability across multiple cores and the best possible performance on each HW brand you could be running.
  • Scalability across multiple cores is automatically guaranteed by the parallelized routines, which means code can automatically scale up on newer machines without having to parallelize code by hand (a very tedious task, in most cases).
Of course, these advantages apply only to numeric codes that can take advantage of these popular routines.
For the die-hards who want to know more, here is a classification of the kind of Linear Algebra and Numerical solvers that are part of Perflib:
  • Elementary vector and matrix operations - Vector and matrix products; plane rotations; 1-, 2-, and infinity-norms; rank-1, -2, -k, and -2k updates
  • Linear systems - Solve full-rank systems, compute error bounds, solve Sylvester equations, refine a computed solution, equilibrate a coefficient matrix
  • Least squares - Full-rank, generalized linear regression, rank-deficient, linear equality constrained
  • Eigenproblems - Eigenvalues, generalized eigenvalues, eigenvectors, generalized eigenvectors, Schur vectors, generalized Schur vectors
  • Matrix factorizations or decompositions - SVD, generalized SVD, QL and LQ, QR and RQ, Cholesky, LU, Schur
  • Support operations - Condition number, in-place or out-of-place transpose, inverse, determinant, inertia
  • Sparse matrices - Solve symmetric, structurally symmetric, and unsymmetric coefficient matrices using direct methods and a choice of fill-reducing ordering algorithms, and user-specified orderings
  • Convolution and correlation in one and two dimensions
  • Fast Fourier transforms, Fourier synthesis, cosine and quarter-wave cosine transforms, sine and quarter-wave sine transforms
  • Complex vector FFTs and FFTs in two and three dimensions
  • Interval BLAS routines
  • Sorting operations
See a complete list of routines here.

Taking full advantage of the increased accuracy, performance and parallelism of these routines often requires code changes. However, in many cases, such changes can result in more readable code as well (here is a good example of that). The performance improvements are often dramatic, and well worth the time taken to change code to take advantage of these routines.

Want to know more? There are several places you can look:


Thursday Jun 10, 2010

(HPC) Challenges to Exascale Super-computing



After the recently concluded HPC/International Supercomputing conference, there is quite a bit of talk about Exa-scale computing. The idea here is to push Supercomputing into the realm of sustainable Exaflop computation (by 2018). [Let's ignore for a moment that it's pretty hard to sustain even the current Petaflop levels, a problem that will no doubt be solved in the next few years ...]

Jack Dongarra, a leader in the area of Supercomputing (and co-creator of leading mathematical packages such as Linpack, EISpack, etc), recently gave an interview on this topic which I think makes for an interesting read (you can read the full interview here). The salient points are interesting, and I'm listing here a few that I found most worth pondering over:
  • Going from Petascale to Exascale will mean going from hundreds of thousands of threads to billions of threads
    • This shift is similar to the shift from vector programs to parallel programming
    • The strategy used to achieve petascale will no longer scale to the exascale level, so programs will need to be redesigned
    • Programs will need to have asynchronous handling built in.
  • Exascale programs/machines will essentially be hybrids and purely MPI or loop-based programs will no longer be viable for this scale. Thus a fork-join model will no longer work
  • Memory is going to be at least as big a factor as CPU: for cost, for heat considerations and for latency/computational issues.
  • Programs will have to build in fault tolerance. At that scale, something is bound to fail, so programs must checkpoint their state and restart from the last checkpoint.
  • Machines will be either lightweight parallel (Blue Gene style, with lots of simple threads) or commodity processors with GPU accelerators.
  • International cooperation is a must. Government (and international) involvement of bodies like G-8 will be critical drivers.
  • Community will drive development into vendors.
This is an ambitious and complex goal and the journey will be interesting to follow as much for the human pursuit as it is for the technical pursuit. As a major HPC vendor, Sun systems group (inside Oracle) is watching and following these developments very closely. Compilers and tools are an integral part of such a pursuit; they have always been and will continue to be critical.
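The fault-tolerance point in the list above is easier to picture with a toy example. What follows is only a minimal sketch of checkpoint/restart, not anything taken from the interview: the function names, the JSON checkpoint file and the `fail_at` knob that simulates a crash are all made up for illustration.

```python
import json
import os
import tempfile


def run_with_checkpoints(total_steps, checkpoint_path, fail_at=None):
    """Sum the integers 0..total_steps-1, saving progress after every step.

    `fail_at` is a hypothetical knob that simulates a crash at a given step.
    """
    # Resume from the last checkpoint if one exists, otherwise start fresh.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"next_step": 0, "partial_sum": 0}

    for step in range(state["next_step"], total_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure at step {step}")
        state["partial_sum"] += step
        state["next_step"] = step + 1
        # Write to a temporary file and rename it, so a crash in the middle
        # of a write never leaves a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)

    return state["partial_sum"]


def demo():
    """Crash part-way through, then restart and finish from the checkpoint."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "checkpoint.json")
        try:
            run_with_checkpoints(10, path, fail_at=6)
        except RuntimeError:
            pass  # steps 0..5 are already saved in the checkpoint
        return run_with_checkpoints(10, path)
```

Calling `demo()` loses no completed work: the second run picks up at the step that failed. A real exascale code would checkpoint far less often than every step and would have to coordinate the checkpoint across many nodes, which is where the hard part lies.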

Wednesday Nov 11, 2009

Sun Studio OpenMP gets 12x improvement on Seismic benchmark on SLES10


This story is hard to pass up: Sun's BestPerf blog (read the details here) recently reported how they got a 12x performance improvement over a single-threaded version on an important Seismic (Reverse Time Migration) benchmark using Sun Studio's OpenMP feature on SLES10. It's a great story of how Sun can deliver performance through a combination of Sun Studio and new Hardware (via Sun Storage F5100 Flash Array). Yes, this is the same Flash Array that has been the talk of the town and has notched up several World Record wins.
Several points come to mind:
  • Sun Studio and OpenMP are key to exploiting parallel performance. Not just with Flash, but also with multiple cores now becoming the mainstay in chip offerings. Multi-threading, parallel performance (and parallel programming, for those who are willing to take the effort) is going to be even more critical to fully utilize system resources now and into the future.
  • Sun Studio performance here is highlighted on SuSE 10. Note this, because I've had to counter the impression that Sun Studio doesn't do as well on Linux; it does. Sun Studio does not leave any performance, features, tools, options or optimizations out of its offering on Linux.
  • The Flash Array Storage alone gets a 2.2x performance win over 15K disks. But the combination with Sun Studio in achieving parallelism that the Flash Array Storage can exploit is even more attractive.

Monday Sep 08, 2008

Interesting Read: Parallel Programming Made Easy?


Michael Wolfe, senior compiler engineer/architect at The Portland Group (PGI Compilers), recently wrote an article featured in HPCWire questioning how "easy" one can claim to make parallel programming.
Some interesting thoughts and views from one of the leading compiler architects, working on a compiler that has focused on high performance computing for decades(?) now.
Whether you agree or not is your own personal view of how this important style of programming should be shaped, but he does represent a very important part of the discussion underway today.
Definitely worth a read!

Thursday May 22, 2008

Sun Studio wins Infoworld award in Application Performance category


Sun Studio came out on top in the application performance category in a new IDE survey by Infoworld (here).
NetBeans (on which the Sun Studio IDE is based) is rated lower than Sun Studio, which is a surprise. Also surprising is the absence of Eclipse in the list, which the article explains in a very unsatisfactory way, IMO.
I'm just glad that it is being noticed that we -Sun Studio compilers- produce good overall application performance. Someone saying ITS BEST really helps.
FYI, IBM's Rational IDE came up on top overall. I'm sure the survey is not without bias, as the comments it has attracted seem to indicate.

Monday Jan 21, 2008

Technical articles on performance tuning


I often get asked about which compiler options work best for x86 or SPARC or even between Intel and AMD or various SPARC architectures.
Here is a handy reference for compiler optimization and performance tuning options. Getting good performance out of applications is both important and often a little tricky, so this should help.
Of course the generic advice is that -xtarget=generic and baseline options are constantly being tuned to be generally the best average case options. It should suffice in most cases to get the most juice across the broadest set of machines. But there will always be those who need to go the extra mile and need to know how they can get the most.


Selecting the Best Compiler Options
How do you get the best performance from an UltraSPARC or x86/AMD64 (x64) processor running on the latest Solaris systems? By compiling with the best set of compiler options and the latest compilers. Here are suggestions of things you should try, but before you release the final version of your program, you should understand exactly what you have asked the compiler to do.
Advanced Compiler Options for Performance

Users wanting the best performance from CPU-intensive codes may wish to explore the use of additional libraries and advanced compiler options that control individual compiler components.
Getting the Best AMD64 Performance With Sun Studio Compilers
Performance is a factor of both hardware and software. To extract the maximum performance from the new AMD-64 based systems on your critical C/C++ and Fortran applications, choose the best compilers. Then use compiler options to take advantage of the Opteron system features to maximize performance.
How I Got 15x Improvement Without Really Trying
A case study in program optimization.
Using Inline Templates to Improve Application Performance
Inline templates are a mechanism for directly inserting assembly code into an executable. Typically, this approach is used to obtain the best performance for a given function, or to implement an algorithm in a specific way.
Performance Tuning With Sun Studio Compilers and Inline Assembly Language
Here are examples of using a compiler flag or inline assembly language with Sun Studio compilers to increase the performance of C, C++, and Fortran programs.
Prefetching Pragmas and Intrinsics

Explicit data prefetching pragmas and intrinsics for the x86 platform and additional pragmas and intrinsics for the SPARC platform are now available in Sun Studio 12 compilers. Prefetch instructions can increase the speed of an application substantially by bringing data into cache so that it is available when the processor needs it. This benefits performance because today's processors are so fast that it is difficult to bring data into them quickly enough to keep them busy, even with hardware prefetching and multiple levels of data cache.
Using F95 Interfaces to Customize Access to the Sun Performance Library
When porting Fortran source, the Fortran 95 generic interface can be used to allow the source code to remain virtually unchanged and yet facilitate the use of the ILP-32, LP-64, and ILP-64 programming models.
Using VIS Instructions to Speed Up Key Routines
The VIS instruction set includes a number of instructions that can be used to handle several items of data at the same time. These are called SIMD (Single Instruction Multiple Data) instructions. The VIS instructions work on data held in floating point registers. The advantage of using VIS instructions is that an operation can be applied to different items of data in parallel, meaning that it takes the same time to compute eight 1-byte results as it does to calculate one 8-byte result. In theory this means that code that uses VIS instructions can be many times faster than code without them.
The Sun Studio Binary Optimizer

The Binary Optimizer is a static SPARC optimizer that accepts as input a binary and creates an optimized binary as the output. We define a binary as either an executable or a shared object. The availability of the original source code is not a pre-requisite for using this tool. It can optimize binaries irrespective of the source language used (C, C++ or FORTRAN). It can also optimize mixed source language binaries.

Sunday Sep 30, 2007

New OpenMP book published


It's nice to see a new book on OpenMP published by some of the experts in this area.
See here for details.
One of the authors, Ruud van der Pas, works very closely with my compiler teams and with customers, so this book is sure to have practical and down-to-earth suggestions. I haven't read it yet, myself.
PS. For those not quite in the know, here's a simple description of OpenMP:
OpenMP is a set of APIs for SMP programming in C, C++ and Fortran. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
Jointly defined by a group of major computer vendors, OpenMP is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the desktop to the supercomputer.
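To make that description a little more concrete, here is a rough sketch of the pattern OpenMP automates. In C, a loop like this would typically be annotated with a directive such as `#pragma omp parallel for reduction(+:total)`. Python has no OpenMP, so the sketch below swaps in the standard `concurrent.futures` thread pool to mimic the same steps by hand: split the loop, run the pieces concurrently, then combine the partial results. The function name and the chunking scheme are my own illustration, and because of CPython's global interpreter lock this shows the structure only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum_of_squares(n, num_threads=4):
    """Sum i*i for i in range(n), splitting the loop across worker threads."""
    # Static schedule: give each worker one contiguous chunk of the range.
    chunk = max(1, (n + num_threads - 1) // num_threads)
    bounds = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    def partial(lo_hi):
        lo, hi = lo_hi
        return sum(i * i for i in range(lo, hi))

    # Fork: run the chunks concurrently. Join: leaving the `with` block
    # waits until every worker has finished.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(partial, bounds))

    # Reduction: combine the per-thread partial results.
    return sum(partials)
```

With OpenMP the compiler and runtime generate all of this from the one directive, and `num_threads` here plays the role that the `OMP_NUM_THREADS` environment variable plays there.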

Tuesday Aug 21, 2007

Sun Studio 12 patch performance improvements quantified for Core2Duo


My previous blog pointed at the first available Sun Studio patch. It mentioned that there were performance improvements for the Core2Duo architecture beyond what was in the released product.
This one quantifies the improvements we're seeing on SPECfp2000 (52%) and SPECfp2006 (34%). There are small improvements on SPECint2006 (0-3%) and SPECint2000 (1-4%) as well, but they are not as noteworthy. The system used here is a whitebox 2-CPU, 4-core 2.66GHz X5355-based system with 4GB memory. This was purely a comparative run and not done for a SPEC submission, so these aren't official SPEC numbers but SPEC estimates (and they are percentage changes rather than scores, anyway).

Benchmark: SPECfp 2000    %Change (over Studio 12 FCS)
wupwise                   30.3
swim                      83.23
mgrid                     39.10
applu                     80.92
mesa                      3.52
galgel                    172.86
art                       130.83
equake                    14.05
facerec                   2.44
ammp                      65.74
lucas                     13.86
fma3d                     36.98
sixtrack                  96.94
apsi                      50.50
Overall Geo Mean          51.99

Benchmark: SPECfp 2006    %Change (over Studio 12 FCS)
410.bwaves                10.05
416.gamess                50.26
433.milc                  12.99
434.zeusmp                50.34
435.gromacs               21.24
436.cactusADM             57.82
437.leslie3d              32.10
444.namd                  62.20
447.dealII                9.00
450.soplex                4.58
453.povray                13.38
454.calculix              166.38
459.GemsFDTD              24.79
465.tonto                 41.75
470.lbm                   12.27
481.wrf                   3.00
482.sphinx3               18.75
Overall Geo Mean          34.55
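A note on how an "Overall Geo Mean" of percentage changes can be worked out. The sketch below assumes the usual approach: turn each percentage gain into a ratio (30.3% becomes 1.303), take the geometric mean of the ratios, and turn the result back into a percentage. The function name is mine, and the input is simply the SPECfp 2000 column from the table above.

```python
import math

# Percent changes copied from the SPECfp 2000 table above.
SPECFP2000_PCT = {
    "wupwise": 30.3, "swim": 83.23, "mgrid": 39.10, "applu": 80.92,
    "mesa": 3.52, "galgel": 172.86, "art": 130.83, "equake": 14.05,
    "facerec": 2.44, "ammp": 65.74, "lucas": 13.86, "fma3d": 36.98,
    "sixtrack": 96.94, "apsi": 50.50,
}


def geo_mean_pct_change(pct_changes):
    """Geometric mean of percentage gains, returned as a percentage."""
    # A gain of p percent is a speed ratio of 1 + p/100.
    ratios = [1.0 + p / 100.0 for p in pct_changes]
    # Geometric mean = exp(mean of logs); this avoids a huge running product.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0


specfp2000_overall = geo_mean_pct_change(SPECFP2000_PCT.values())
```

For the SPECfp 2000 column this lands very close to the 51.99 in the table, which suggests the overall figure was computed this way. Averaging the percentages arithmetically would give a noticeably higher number, because a couple of very large gains (galgel, art) would dominate.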

Now you know why I recommended that if you're using Sun Studio 12 on Woodcrest or Clovertown (Core2Duo) systems, you MUST get the new patch.

Monday Jun 11, 2007

SunStudio 12 Compiler establishes World Record on Woodcrest chip!


Imagine That!
However, here it is. The latest submitted results for our Constellation Blade Server, now called Sun Blade Server 6000 system, make it official.
The Dual-Core Intel Xeon 5160 Intel Blade Module of this Server  delivers World Record Performance on SPECint2006  of 21.0, which is higher than any announced benchmark for either the Core2Duo or Opteron chips.  These results beat even the Intel Compiler results for an Intel Chip!
New Sun Systems with SunStudio compilers and Solaris 10 on Intel chips are leading the way with performance. If ever vindication was needed of what Intel chips are capable of bringing to Sun systems, and of what Solaris is bringing to Intel to help expand the x86 marketplace, this is it!

Sun also announced World Record performance on the SPEC OMPM2001 benchmark for the dual-socket, dual-core AMD Opteron Model 2222SE based Opteron Blade Module of this server, with a score of 13847 for 4 threads.

Required Disclosure:
SPEC, SPECint and SPEComp are registered trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of 05/25/07. Sun's results were submitted for review.
Sun Blade 6250 (2xDual-core , 4 cores, 2 chips, 2 cores/chip, Solaris 10):  SunStudio12 SPECint2006 - 21.0
Sun Blade 6200 (2xDual-Core AMD Opteron Model 2222SE processors 4 cores, 2 chips, 2 cores/chip, Solaris 10):  SunStudio11. SPECompM2001 - 13847

Tuesday Jan 09, 2007

Sun announces new Blade Servers with World Record Performance


Sun today announced the fastest and the industry's only 4 socket dual-core 2.8GHz AMD Opteron 8000 series processor based blade server (announcement is here).
And it packs an enormous performance punch! Here are the latest World Record Benchmarks for SPEC CPU2006 rate and SPEC OMP2001, using Sun Studio 11 and Solaris 10 with these blades:
  • The blade server module posted an 8-thread SPECompM2001 score of 23224 on the medium size problem set. That score bested the latest competing 8 thread SPECompM2001 result of 19983, achieved by 4 cores/2 chips IBM System p5 550, by 16%.
  • Beating the competition to the punch, the Sun Blade X8420 server module, equipped with the fastest available Dual-Core AMD Opteron(TM) processors, posted record x86 throughput scores on both the SPEC CPU2000 and SPEC CPU2006 suites, including a SPECint_rate2006 score of 93.1 and a SPECfp_rate2006 result of 87.3. (Note: the SPEC CPU2006 benchmark, which supersedes the SPEC CPU2000 suite, provides a broader variety of workloads and better real-world applicability of the results. It consists of two benchmark suites: one measures and compares compute-intensive integer performance, the other floating point performance.)
These numbers clearly set this blade server module apart from the competition at this stage, delivering compelling performance differentiation. Look at the disclosure below for competitive numbers. What makes these boxes even more interesting is the Sun Refresh Service (read here), which will keep your blade servers up to date in this fast-changing marketplace (especially with the exciting upcoming speedbumps and quad-core processors from AMD and Intel).

Required Disclosure Statements:

SPEC, SPEComp, SPECfp and SPECfp Rate are Registered Trademarks of Standard Performance Evaluation Corporation. For SPEC comparisons, socket equates to chip.
Competitive results from www.spec.org as of Jan 05, 2007. Sun's results were submitted for review.

Sun Blade X8420 (4xAMD Opteron model 8220, 8 cores, 4 chips, 2 cores/chip, 8 threads, Solaris10): SPECompM2001 –23224;
IBM System p5 550 (POWER5+, 4 cores, 2 chips, 2 cores/chip, 8 threads, AIX5L V5.3): SPECompM2001 – 19,983;
HP ProLiant DL585 (4xAMD Opteron model 880, 8 cores, 4 chips, 2 cores/chip, 8 threads, RedHat EL AS): SPECompM2001 – 17948

 

Sun Blade X8420 (4xAMD Opteron model 8220, 8 cores, 4 chips, 2 cores/chip, 8 threads, SLES 9): SPECint_rate2006 –93.1.
Sun Blade X8420 (4xAMD Opteron model 8220, 8 cores, 4 chips, 2 cores/chip, 8 threads, Solaris10): SPECfp_rate2006 – 87.3.


Friday Nov 17, 2006

Sun Studio powers new Opteron Workstation to record SPEC INTrate and SPEC FP

Sun recently introduced the so-called AM2 variant (the next generation of AMD processors) of the Sun Ultra 40 Workstation. For details, see here. With this introduction, Sun also announced new World Records with this machine. I am particularly happy with this one (words directly from the product page).
The Sun Ultra 40 M2 workstation, with two Dual-Core AMD Opteron model 2220SE processors, has reached a new milestone on the SPECint_rate2006 suite of the SPEC CPU2006 benchmark, by utilizing the most advanced features of Sun Studio 11 software and Solaris 10 OS.
Leading the x86 segment and surpassing competing workstations, the next-generation Sun Ultra 40 M2 workstation produced a SPECint_rate2006 result of 48.4.
For a while now, Woodcrest had held the SPEC INT lead that previously belonged to AMD's Opteron chips, and its performance on SPEC CPU2000 has been spectacular. So it is especially pleasing to see that with CPU2006 INTrate, Sun has been able to reclaim the World Record with the dual-core AMD Opteron model 2220SE processors. The rate measure is particularly important as we move into the dual- and quad-core world for x86/x64 architecture machines.
In addition, the Sun Ultra 40 Workstation continues to claim World Record performance for SPEC CPU 2000 FP, with best numbers (Peak) of 3545 and a 4-core FPrate of 121. This beats the Woodcrest-based numbers handily, by about 40+%.
The following table shows these comparisons:
SPEC CPU2006 INTrate (ratios, higher is better)
System                Description                                        #Threads   INTrate
Sun Ultra 40 M2       AMD 2220SE (dual-core 2.8GHz), 2 CPU               4          48.4
SuperMicro            Woodcrest, Intel 5160 (dual-core 3.0GHz), 2 CPU    4          45.2
Dell Precision 380    Intel 3.73 GHz, Pentium Extreme Edition 965        4          23.1

SPEC CPU2000 FP (ratios, higher is better)
System                Description                                 #Threads   FPrate
Sun Ultra 40 M2       AMD 2220SE (dual-core 2.8GHz), 2 CPU        2          121
Dell Precision 690    Xeon 5160, 4 cores, 2 chips, RHEL 4AS U3    2          81.3

Required Disclosure Statements:

SPEC, SPECfp and SPECfp Rate are Registered Trademarks of Standard Performance Evaluation Corporation. Sun's results were submitted for review. For SPEC comparisons, socket equates to chip.
Competitive results from www.spec.org as of Nov 17, 2006.

Sun Ultra 40 M2 (AMD Opteron model 2220SE, 4 cores, 2 chips): SPECINT_rate2006: 48.4
Dell Precision 690 (Xeon 5160, 4 cores, 2 chips, RHEL 4AS U3): SPECint_rate2006 - 45.2

Sun Ultra 40 M2 (AMD Opteron model 2220SE, 4 cores, 2 chips): SPECfp_rate2000: 121
Dell Precision 690 (Xeon 5160, 4 cores, 2 chips, RHEL 4AS U3): SPECfp_rate2000 - 81.3


Sun Ultra 40 M2 (AMD Opteron model 2220SE, 4 cores, 2 chips): SPECfp2000 – 3545
Dell Precision 690 (Xeon 5160, 4 cores, 2 chips, RHEL 4AS U3): SPECfp2000 - 2872



Wednesday Oct 18, 2006

Performance Comparison: Sun Studio vs GCC on STREAM Benchmark

I have previously described the STREAM Benchmark, the results we were seeing with its OpenMP version, and what we got by turning on Automatic Parallelization in the compiler.
Here I'd like to put out comparative results with the GCC compiler.

Function    Sun Studio 11 (MB/s)    GCC 4.1 (MB/s)
Copy        4658                    2766
Scale       4614                    2745
Add         4628                    2970
Triad       4627                    2969
This is roughly a 1.6x advantage with the Sun Studio compiler.
The comparisons were done on exactly the same box. The box was a SunFire V40z with 4 x 2.6GHz processors and PC3200 CL3 DDR SDRAM ECC Regd. memory.
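For readers who haven't met STREAM before, here is a rough sketch of where an MB/s figure like the ones above comes from, using the Triad kernel as the example. STREAM counts the bytes each kernel reads and writes and divides by the elapsed time, with 1 MB taken as 10^6 bytes. The Python below is purely illustrative and the function names are mine. Python lists are nothing like compiled arrays, so only the byte accounting carries over, not the speeds.

```python
import time

BYTES_PER_DOUBLE = 8


def triad(b, c, scalar):
    """STREAM Triad kernel: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def triad_bytes_moved(n):
    """Triad reads two arrays and writes one, so it counts 3 * 8 * n bytes."""
    return 3 * BYTES_PER_DOUBLE * n


def bandwidth_mb_per_s(bytes_moved, seconds):
    """Convert bytes and seconds to MB/s, with 1 MB = 10**6 bytes."""
    return bytes_moved / seconds / 1e6


def time_triad(n=100_000, scalar=3.0):
    """Time one Triad pass. The result reflects the interpreter, not the memory system."""
    b = [1.0] * n
    c = [2.0] * n
    start = time.perf_counter()
    a = triad(b, c, scalar)
    elapsed = time.perf_counter() - start
    assert len(a) == n
    return bandwidth_mb_per_s(triad_bytes_moved(n), elapsed)
```

Copy and Scale touch only two arrays per iteration, so they count 16 bytes per element where Add and Triad count 24.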

The Optimization options used in these cases were:
Sun Studio: -fast -xarch=amd64a -xvector=simd -xprefetch -xprefetch_level=3
GCC 4.1: -O3 -funroll-all-loops -ffast-math -fpeephole -m64 -mtune=k8 -fprefetch-loop-arrays

Function    Sun Studio 11 (MB/s)    Sun Studio 11 (MB/s), 4proc Autopar
Copy        4658                    18120
Scale       4614                    18108
Add         4628                    17758
Triad       4627                    17626

For a 4-CPU machine, this is roughly 3.9x scaling, which is incredible!
Of course, the GCC compiler isn't able to exploit such scalability because it supports neither Automatic Parallelization nor OpenMP at this time. (It's working on at least OpenMP support, so that discrepancy will be addressed in a future release.)
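The "roughly 1.6x" and "roughly 3.9x" figures can be checked kernel by kernel from the two tables. The short sketch below does just that, and also divides the autopar speedup by the 4 processors to get a parallel efficiency. The helper names are mine, and the numbers are copied straight from the tables above.

```python
# Bandwidth figures (MB/s) copied from the two STREAM tables above.
GCC_MB_S = {"Copy": 2766, "Scale": 2745, "Add": 2970, "Triad": 2969}
STUDIO_MB_S = {"Copy": 4658, "Scale": 4614, "Add": 4628, "Triad": 4627}
AUTOPAR_MB_S = {"Copy": 18120, "Scale": 18108, "Add": 17758, "Triad": 17626}
PROCESSORS = 4


def ratio_by_kernel(numerator, denominator):
    """Divide two result sets kernel by kernel."""
    return {kernel: numerator[kernel] / denominator[kernel] for kernel in numerator}


def parallel_efficiency(speedups, processors=PROCESSORS):
    """Speedup divided by processor count; 1.0 would be perfect scaling."""
    return {kernel: s / processors for kernel, s in speedups.items()}


# Single-threaded Sun Studio 11 versus GCC 4.1.
compiler_advantage = ratio_by_kernel(STUDIO_MB_S, GCC_MB_S)

# 4-processor autopar run versus the single-threaded Sun Studio 11 run.
autopar_speedup = ratio_by_kernel(AUTOPAR_MB_S, STUDIO_MB_S)
autopar_efficiency = parallel_efficiency(autopar_speedup)
```

Every kernel comes out between about 1.55x and 1.69x against GCC, and between about 3.8x and 3.93x going from one processor to four, which is a parallel efficiency of 95% or better across the board.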

Tuesday Sep 26, 2006

Performance Comparison: SunStudio vs. GCC on BYTE benchmark (Nbench)


I am often asked about performance differences between Sun Studio compilers and GCC. With performance, a single answer never works across the board, so I am putting out as much comparative information as I can to show some of the differences (and hopefully advantages over GCC).
I have announced previous Sun Studio based SPEC World Record numbers in postings here (like this World Record SPECfp number and this mention of SPEC CPU2006 World Records ), and about STREAM (as in here), but these were not comparative numbers (vs GCC), so this is an attempt to fill that gap.
The first attempt is to take BYTE magazine's BYTEmark benchmark programs that are freely available at this location.
The benchmarks are designed to expose the capabilities of a system's CPU, FPU and memory system and were derived directly, without algorithmic change, from the BYTE web site.
The tests used here were ported to Linux but were actually run on a Solaris 10 system. The HW used here was a SunFire X4100 box with a dual-core 2.4GHz Opteron chip in a standard configuration.
In the following table, the numbers are all ratios indexed against a baseline AMD K6/233 system with 512KB L2 cache, gcc 2.7.2.3 and libc-5.4.38.
Being ratios, higher is better. Likewise, in the Ratio column, a value > 1 means Sun Studio is better than GCC.

Test                GCC4.1    SunStudio11    Ratio(SS11/GCC)
Numeric Sort        12.68     10.63          0.84
String Sort         16.07     20.93          1.30
Bitfield            15.15     16.15          1.06
FP Emulator         14.46     31.77          2.19
Fourier             11.69     25.55          2.19
Assignment          37.59     25.11          0.67
Idea                18.52     32.87          1.77
Huffman             15.10     17.02          1.13
Neural Net          24.87     33.98          1.37
Lu Decomposition    45.55     50.20          1.10
Memory Index        20.78     20.397         0.98
Integer Index       15.09     20.844         1.39
FP Index            23.772    35.190         1.48

Numeric Sort, FP Emulator, Idea and Huffman are part of Integer Index
String Sort, Bitfield and Assignment make up the Memory Index
The other tests are part of FP Index
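As a sanity check on how the three index rows relate to the individual tests, the sketch below recomputes them for the Sun Studio 11 column. It assumes each index is the geometric mean of its member tests, which is how I understand nbench reports them. The grouping comes from the three lines above, and the helper name is mine.

```python
from statistics import geometric_mean

# Sun Studio 11 scores copied from the table above.
SS11 = {
    "Numeric Sort": 10.63, "String Sort": 20.93, "Bitfield": 16.15,
    "FP Emulator": 31.77, "Fourier": 25.55, "Assignment": 25.11,
    "Idea": 32.87, "Huffman": 17.02, "Neural Net": 33.98,
    "Lu Decomposition": 50.20,
}

# Which tests feed which index, as listed in the text.
INDEX_MEMBERS = {
    "Integer Index": ["Numeric Sort", "FP Emulator", "Idea", "Huffman"],
    "Memory Index": ["String Sort", "Bitfield", "Assignment"],
    "FP Index": ["Fourier", "Neural Net", "Lu Decomposition"],
}


def composite_index(scores, members):
    """Geometric mean of the member tests' scores (assumed index formula)."""
    return geometric_mean([scores[name] for name in members])


ss11_indices = {
    index: composite_index(SS11, members)
    for index, members in INDEX_MEMBERS.items()
}
```

Under that assumption the recomputed Sun Studio 11 indices agree with the table's 20.844, 20.397 and 35.190 to within a few thousandths. It also shows why one weak test (Numeric Sort, say) pulls an index down without sinking it.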

Flags used for each were:
Sun Studio11: -fast -xarch=amd64
GCC 4.1: -O3 -s -Wall -fomit-frame-pointer -funroll-loops


Wednesday Aug 30, 2006

SunStudio Portal Article: Performance Analyzer


Two new articles describe how to use the Sun Studio Performance Tools to profile Java applications, and WebLogic servers.
Profiling Java Applications with Sun Studio Performance Tools describes the challenge of profiling Java applications, either pure Java or mixed Java/C/C++, which need to run as a process instantiating the Java Virtual Machine (JVM), which is itself a C++ program.
Profiling WebLogic Servers with Sun Studio Performance Tools describes how to profile servers being run under BEA's WebLogic® system. A server run under BEA's WebLogic is a Java application that you launch by running a script to invoke the JVM.
This is the only profiling tool I know of that handles C, C++, Fortran, Java, WebLogic servers, OpenMP, MPI, Pthreads and auto-parallelized code well, on both Solaris and Linux, in both GUI and command-line form. A true SunStudio Gem!

Friday Aug 25, 2006

Thumper (SunFire X4500) sets World Record in SPECfp


Not just another pretty face, Thumper, aka Sun Fire X4500 has scored its first World Record SPEC performance win!
Thumper, or as it's known by its marketing name, Sun Fire X4500, is a terrific combination of storage (24TB) and a high-performance 2-socket dual-core (and quad-core ready, BTW) Opteron server. The kind that prompts you to say "I want one of those" the moment you set your eyes upon it.
World Record SPECfp_rate2000 for all 2-socket x86 systems.
The message is clear: (in the words of our Performance Lead) The result puts Thumper clearly at the top of floating point CPU horsepower for all 2-socket x86 systems. Thumper is not just a lot of storage, but the fastest server in its class all at the same time.
Clearly it's the rackmountable version of All This (beautiful) Storage and Brains Too!
You can find much more information about Thumper, aka Sun Fire X4500 here.

Required Disclosure Statements:

Sun Fire X4500 103 SPECfp 2000 Rate (4 cores, 2 chip, Solaris 10)

SPEC, SPECfp Rate are Registered Trademarks of Standard Performance Evaluation Corporation. Results from www.spec.org as of August 24, 2006.

About

I have worked with Sun and Oracle for 25 years now; in compilers and tools organization for most of these years followed by a couple of years in Cloud Computing. I am now in ISV Engineering, where our primary task is to improve synergy between Oracle Sun Systems and our rich ISV ecosystem
