Jumbo Frames with Oracle RAC really do Rock!!

I have been involved in a customer situation on and off for at least six months now. The customer had been seeing performance issues with their application running on Oracle 10g RAC. We initially looked through the mounds of data and noticed that they were indeed waiting quite a bit on Global cache buffer waits. This was during times of fairly heavy load, and we could see the CPU was fairly busy with interrupts as well. After looking at the MTU size for the cluster interconnect, we noticed that it was incorrectly set to the default (1500). Thus started the odyssey to implement Jumbo Frames.

The default MTU on Solaris is 1500, which is not ideal when Oracle is using an 8K block size. Simple math tells us that it takes six IP fragments to transmit just one block of data across the cluster interconnect. This just creates additional overhead on the server and additional latency while waiting for global blocks to be transferred. Changing the MTU to a "jumbo frame" of 8K or greater is fairly simple from a technical point of view, but it can quickly turn into a political issue.
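To make the arithmetic concrete, here is a minimal sketch in Python (purely illustrative, assuming UDP over IPv4 with the standard 20-byte IP and 8-byte UDP headers, and ignoring any driver-level details) that counts how many IP fragments one 8K block needs at the default MTU of 1500 versus a 9000-byte jumbo MTU:

    import math

    IP_HEADER = 20   # bytes: IPv4 header without options
    UDP_HEADER = 8   # bytes: UDP header, carried once at the front of the datagram

    def fragments_per_block(block_size, mtu):
        """Rough count of IP fragments needed to send one Oracle block over UDP/IPv4.

        Every fragment carries its own IP header, and all fragments except the
        last must carry a payload that is a multiple of 8 bytes, so round the
        usable capacity per fragment down to a multiple of 8.
        """
        payload_per_fragment = (mtu - IP_HEADER) // 8 * 8
        return math.ceil((block_size + UDP_HEADER) / payload_per_fragment)

    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {fragments_per_block(8192, mtu)} fragment(s) per 8K block")

At 1500 the block is sliced into six fragments that the receiving node has to reassemble; at a jumbo MTU the same block fits in a single frame, which lines up with the drop in IP reassemblies reported below.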

The cluster interconnect is often relegated to being the responsibility of the networking group. No problem, right? While it is a network component, it is really part of the server - no different, really, from a PCI bus or a processor backplane. The networking group will often apply the tried-and-true methods they use for LANs around the company, but those don't translate to a RAC interconnect. Modern network switches can easily handle this configuration change as well, but policy often wins: the networking group assures everyone that their switch can handle the traffic with the default MTU, and everyone goes on their merry way.

So, what happened?

After months of looking at "other things", they were finally convinced to try this "Best Practice" with Jumbo Frames. Immediately, they saw:

  • 50% reduction in CPU overhead
  • 75% reduction in Global cache buffer waits
  • IP Reassemblies dropped by 10x

Moral of the story: Implement Jumbo Frames for Oracle RAC interconnects... It is a best practice after all :)

Comments:

Thank You for your idea.

That's a great practice with Jumbo Frames.
I was so excited when I saw Sun's blog on Oracle RAC ;)

Posted by Surachart Opun on September 25, 2009 at 06:58 PM PDT #

Great. Now, can you be so kind as to let the Oracle consulting morons know that tuning the frame size is a common and desirable practice and that there is nothing wrong with it?

Because the ones here always come up with the "that's not needed anymore, these days we have 1Gbps network connections!" total, utter nonsense.

Thanks!

Posted by Noons on September 27, 2009 at 12:12 PM PDT #

Very fair note. While I have no doubt that jumbo frames are the way to go (I just did a configuration review for one RAC system last week and flagged that), minimizing interconnect traffic is even more productive: ensure that fewer blocks need to be sent across the interconnect in the first place through application design, SQL tuning, and so on. Yeah, call me an idealist. :)

A note for readers, if I may - don't read this blog post as saying "enable jumbo frames on your interconnect and you won't have any problems with your RAC performance".

Glenn, what are the "other things" people were looking at?

Posted by Alex Gorbachev on September 27, 2009 at 03:02 PM PDT #

Was that using UDP or RDS?

Posted by Greg Rahn on September 28, 2009 at 12:54 AM PDT #

Alex,

I agree that you must minimize interconnect traffic. The application is always the place to tune, but you should have best practices like Jumbo Frames in place as well. Let's face it: at the end of the day, after all the tuning, there are still some exceptions that will bite you, as they did this customer. As for the "other things", they were diversions that avoided following the data. Not worth mentioning.

Greg,

This configuration used simple 1Gb interconnects over UDP. Most customers we see use this configuration. I would think they would have better luck with IB/RDS. I certainly hope they would not have to deal with the networking group for the IB switch.

Posted by Glenn Fawcett on September 28, 2009 at 01:48 AM PDT #

What kind of hardware are you using, and how much network traffic is involved?

Doug

Posted by guest on October 12, 2009 at 03:36 AM PDT #

Doug,

They were Sun T5220s, part of the CMT line, but the advice is the same on all SPARC and x64 systems running Linux or Solaris. Really lightweight apps with good partitioning won't suffer much, but as soon as you turn up the heat and start making remote cache references, things get ugly.

Posted by Glenn Fawcett on October 25, 2009 at 04:01 AM PDT #
