• October 9, 2011

The SPARC T4 servers are here!

Ladies and gentlemen, it is an honour for us to announce: The SPARC T4 processors and the servers built around them are released.

Before we dive into the features, here is a short summary of our SPARC platform:

The M-Series servers are designed with mainframe-class RAS features (Reliability, Availability, Serviceability). They are based on the SPARC64 VII+ CPUs, which excel at single-threaded performance.
The T-Series are the CoolThreads servers with the CMT (chip multithreading) design. They are designed to run heavily parallel workloads, concentrating on throughput, and can run up to 512 threads at the same time if desired.

The latter category just got a brand-new update. Let's see what makes the T4 special:

Just like Solaris 11 was greatly influenced by the community's requests, the T4 has been designed based on listening to customer feedback about previous (T2+, T3) processors: 

  • "The throughput of the T3 is awesome, but for some workloads a higher single-thread performance is desired"
  • "Chip multithreading with many cores times many threads is nice, but how about a higher clock frequency?"
  • "Make sure to keep/enhance the System-on-a-Chip advantages, like crypto-HW, OnChip 10G NIC and PCIe"

Engineering has added: 

  • "L2 cache is great, but you know what made the UltraSPARC IV+ that powerful? L3 cache."
  • "Stick to SPARC V9 and the well-received CMT model"
  • "Let's move the crypto accelerators from coprocessors into the chip itself and drive cryptography within the pipeline for crypto performance"
  • "Using out-of-order execution again would bring performance enhancements too" 

Management requested:

  • "Stick to high parallelism, high throughput and power management features."
  • "Reuse successful concepts, release time is critical for this product"
  • "If using the 28nm technology means delays, then stick to 40nm for now, and use 28nm for the next CPU."

Engineering has been working for a while on the new S3 cores, which have replaced the previous generation of cores. The S3 core fulfills all the requirements, and more. The chip has a shared 4MB L3$, and each core has a private L2$. The core has out-of-order execution, which allows it to run non-dependent instructions in parallel, out of program order, proactively. With 8 next-gen S3 cores the T4 still matches the throughput of a T3 (which had 16 cores), and still brings a 5x higher single-thread performance compared to the T3. (By the way, a 3x higher single-thread performance was planned, but engineering outdid themselves once again.) These features make the T4 a much more general-purpose processor.
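To make the out-of-order idea a bit more tangible, here is a toy Python model. It is only a sketch and not a description of the real S3 pipeline: the instruction names, the latencies and the "one instruction issued per cycle" rule are all assumptions made up for illustration. It compares a core that must issue instructions strictly in program order with one that may issue any instruction whose inputs are ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str        # hypothetical result name, e.g. "a"
    deps: tuple      # names of results this instruction needs
    latency: int     # assumed cycles until the result is available


def run_in_order(program):
    """Issue one instruction per cycle, strictly in program order."""
    done = {}        # result name -> cycle at which it becomes available
    cycle = 0
    for ins in program:
        ready = max([done[d] for d in ins.deps], default=0)
        issue = max(cycle, ready)      # a stall blocks everything behind it
        done[ins.name] = issue + ins.latency
        cycle = issue + 1
    return max(done.values())


def run_out_of_order(program):
    """Each cycle, issue the oldest instruction whose inputs are ready.

    Assumes every dependency is produced by some instruction in the program.
    """
    done = {}
    pending = list(program)
    cycle = 0
    while pending:
        for ins in pending:
            if all(d in done and done[d] <= cycle for d in ins.deps):
                done[ins.name] = cycle + ins.latency
                pending.remove(ins)
                break                  # at most one issue per cycle
        cycle += 1
    return max(done.values())


# Two independent load/add pairs; latencies are illustrative only.
PROGRAM = [
    Instr("a", (), 3),       # load a
    Instr("b", ("a",), 1),   # b = a + 1
    Instr("c", (), 3),       # load c
    Instr("d", ("c",), 1),   # d = c + 1
]

print(run_in_order(PROGRAM))      # 8 cycles in this toy model
print(run_out_of_order(PROGRAM))  # 5 cycles in this toy model
```

In this toy program the second load does not depend on the first addition, so the out-of-order model starts it while the first load is still in flight. The in-order model has to wait, which is where the extra cycles come from.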

Now, where can one use single-threaded performance at this level? On databases, for example. LDOM live migration times have also improved significantly with the T4. The same goes for any other application that isn't designed purely with parallel workloads in mind. Let's see a graphical representation:

SPARC T4 single thread performance

A very important feature is what we call critical thread:

  • It is the capability of the CPU to provide access to a complete core for a running thread. Your application's throughput may be limited by the serial portions of your workload. Setting the priorities of those sequentially working threads high raises the attention of the scheduler, and it can decide to dedicate a complete core to a thread instead of running 8 threads in parallel on that core. This is a noteworthy feature, especially because all of this happens dynamically, on the fly. If your application needs throughput, it gets 8 threads. If it needs single-threaded performance, it gets a complete core. This is a seamless process, without the administrator having to switch CPU modes.
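Why the serial portions matter so much can be estimated with Amdahl's law. The Python sketch below is only a back-of-the-envelope model: the function names are ours, and the serial fraction and the `serial_speedup` factor are illustrative assumptions, not measured T4 figures.

```python
def relative_runtime(serial_fraction, hw_threads, serial_speedup=1.0):
    """Runtime relative to a fully serial run (Amdahl's law).

    serial_fraction: share of the work that cannot be parallelised (0..1)
    hw_threads:      number of hardware threads working on the parallel part
    serial_speedup:  assumed factor by which the serial part runs faster,
                     e.g. because its thread has a whole core to itself
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if hw_threads < 1 or serial_speedup <= 0:
        raise ValueError("hw_threads must be >= 1 and serial_speedup > 0")
    serial_time = serial_fraction / serial_speedup
    parallel_time = (1.0 - serial_fraction) / hw_threads
    return serial_time + parallel_time


def speedup(serial_fraction, hw_threads, serial_speedup=1.0):
    """Overall speedup compared to a fully serial run."""
    return 1.0 / relative_runtime(serial_fraction, hw_threads, serial_speedup)


# Illustrative numbers only: 20% serial work, 8 hardware threads.
print(speedup(0.2, 8))                      # serial part at normal speed
print(speedup(0.2, 8, serial_speedup=2.0))  # serial part assumed 2x faster
```

With these made-up numbers, adding more threads hardly helps once the serial 20% dominates the runtime, while making just the serial part faster raises the overall speedup noticeably. That is the situation the critical thread capability is aimed at.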

Allow us to mention some - in our opinion - very much undermarketed features of the T series: 

  • On-chip 10G NICs. This actually means that you have two 10Gbps network interfaces sitting directly on the CPU. Data does not need to fight its way through the PCI bus labyrinth, and you do not need additional NICs with network logic built into them. All you need is an XAUI converter to lead the on-chip NICs to the outside world. This way you have very low latency.
  • Oracle VM for SPARC (formerly LDoms): LDoms are supported as a matter of course. Customers can continue to partition these servers and run several different Solaris instances (even Solaris 10 and 11 mixed) next to each other, separated by the hypervisor running in the firmware. Without a price tag!
  • Power management: The T4 can do cycle skipping, just like the T3, to lower power consumption.
  • Encryption: Solaris supports an extensive set of cryptographic algorithms, and the T4 can provide encryption services implemented in hardware instead of having to compute them in software. On-the-fly, no-cost, NSA-approved encryption.

 An encryption performance comparison between T3 and T4: 

T4 crypto performance over T3

The T4 is supported in Solaris 10 u10 (u8 and u9 with patches) and in Solaris 11. Oracle applications on the SuperCluster will be able to benefit from the critical thread capability too. We think this is a prime example of hardware and software engineered to work together, or rather of hardware, OS and applications engineered to work together.

Just like Solaris 11, the T4 CPU was designed very much with supporting Oracle software in mind, providing features that this software can benefit from. Oracle is in the unique situation of being able to tune not only the software elements to the requirements, but also the operating system and the hardware they run on. This is the key differentiator of the Oracle products: integration throughout the whole stack.

Oracle has released 1-, 2- and 4-socket servers: the T4-1, the T4-2 and the T4-4. These have been delivered as per the SPARC Roadmap, whose dates Oracle has committed to meeting. This is another Oracle-specific feat: no other CPU vendor declares its commitment publicly 5 years in advance.

And now, the time for consolidation has arrived. One T4-2 can replace several old V-servers with UltraSPARC III or IV, not to mention the T1-T2 servers. Of course it also makes a lot of sense to bring your Oracle SW back from other HW and OS to the OS and the systems it runs best on: Oracle on Oracle. Ask your local HW presales representative for upgrade paths.


Comments ( 2 )
  • Eduardo Tuesday, November 12, 2013

    We are running a gateway proxy application on Solaris 11.1 for mobile internet on a telco. We are using 8 t4 servers and are processing 2000 TPS during peak times. During peak times the servers lag sometimes. We are using processes and not threads, and we're still not sure if it's a network parameter issue or some other type of constraint. Is there a difference in performance using threads vs processes? prstat and mpstat don't show significant cpu usage yet they can't process more.

  • charlie Tuesday, November 12, 2013

    It is kind of hard to give you meaningful advice remotely.

    But here are a couple of questions to trigger some thoughts:

    - What exactly do you mean that the servers lag sometimes?

    - If your workload isn't CPU-bound, have you measured throughput over the network? Can your network cope with the load?

    - Is this application disk-I/O bound? Can your storage cope?

    Solaris doesn't schedule processes, but threads. If your application is multithreaded (see the output of "ps -ef" compared to "ps -efL") it will benefit heavily from the multithreaded T4 CPUs. If you need single thread performance, Solaris should just do the right thing and run those threads alone on a core (instead of running 8 per core sharing resources), but you might need to help the scheduling decisions of Solaris with the Critical Thread capability.

    - How is your memory utilisation?

    - What HW is this exactly?

    - Do you use LDoms?

    - What is the I/O configuration?

    All in all, there are so many factors to consider that it is really best to contact Oracle Support or Oracle Consulting and ask for a performance analysis. Provide them with an Explorer and a GUDS output (the latter created during the high loads) and request performance improvement recommendations.
