SUPerG conference - Paper and Presentation

I am going to be presenting at the spring SUPerG next week, 4/21 - 4/23. IMHO, this is the best conference Sun offers: a great forum for techies to meet techies. Since I typically update my presentation a few times before the conference, I thought it best to post LAG copies.

Hope to see you there,
Glenn

Demystifying High-End Sun Fire Behavior
when Scaling Database Applications

by Glenn Fawcett and Marcus Heckel
Paper rev1.02 updated 4/19/05 after blog comments!!
Presentation rev1.03 updated 4/21/05 prior to pres :)

This paper attempts to explain the differences across the Sun Fire server line. After examining the inherent differences, it discusses scaling topics: the effects of large user, memory, and CPU counts, along with tips on how to best manage and scale applications.

Comments:

Thank you much for the preview. In reading the paper I have run across a few questions and a couple typos.

The questions...

  • When a process exits, does it do a cross-call to all processors or just the processors in its processor set?
  • What part (SB/Expander/etc) does the AXQ 6.2 ASIC reside on? How do I tell which version I am running?
  • Solaris 9 kernel patch 117171-16 adds what appears to be several performance improvements around the idle loop (bugs 5059920, 4802594) and the dispatcher (bug 5054052). What amount of performance improvement has been seen on 40+ core systems?

And the typos...

  • Page 10 'Insert "interrupts=1;" > ce.conf under' should probably have syntax more like the other examples.
  • Page 12, paragraph just before section 14 ends with "A".

Again, thanks for the preview of your paper. I look forward to seeing you present it.

Posted by Mike Gerdts on April 11, 2005 at 10:39 AM PDT #

Your slides show a Sun Fire 3900 server (slides 3 and 5). There was not a model 3900 server, as the Sun Fire 3800 was EOLed, and a US-IV model of the 3800 was not produced.

Posted by Mark on April 11, 2005 at 11:39 AM PDT #

In the paper, section 13 "Database Design/Layout": is there a sentence missing at the end? The last paragraph ends "so as to avoid block contention. A". Thanks, Alex

Posted by Alex Madden on April 11, 2005 at 06:48 PM PDT #

Thank you for your edits and questions. I will answer questions here and update the paper/presentation to include any edits or clarifications.

(Q1) When a process exits, does it do a cross-call to all processors or just the processors in its processor set?

Simply put, it only results in cross-calls on processors where the process has run. So this would most likely result in all processors in a processor set receiving a cross-call on process exit.

Now of course it is not *really* that simple. The xcalls are really caused by the breakdown or unmapping of an address space. This is most commonly done at process exit, which is why it is referenced. After discussing this with kernel engineering, I came away with an even deeper understanding.

The unmapping of pages on exit, commonly referred to as "TLB shootdown", doesn't occur on US-III and later CPUs. Instead, the context is put on the "dirty" list, and when the "free" list of process contexts is empty, a batch TLB flush occurs and all the "dirty" contexts are moved to the "free" list for reuse by new processes.
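To make the dirty/free list behavior described above concrete, here is a minimal toy model in Python. It is only a sketch of the idea, not Solaris kernel code: the names `ContextPool`, `allocate`, `release`, and `simulate` are hypothetical, and the pool sizes are made up for illustration. It counts how many batch flushes happen when short-lived processes exit one after another.

```python
from collections import deque


class ContextPool:
    """Toy model of lazy context recycling (hypothetical, not kernel code)."""

    def __init__(self, num_contexts):
        self.free = deque(range(num_contexts))  # contexts ready for new processes
        self.dirty = []                          # contexts left behind by exited processes
        self.batch_flushes = 0                   # batch TLB flushes performed so far

    def allocate(self):
        """Hand out a context; flush in batch only when the free list is empty."""
        if not self.free:
            if not self.dirty:
                raise RuntimeError("no contexts available")
            # A single batch flush makes every dirty context reusable at once.
            self.batch_flushes += 1
            self.free.extend(self.dirty)
            self.dirty.clear()
        return self.free.popleft()

    def release(self, ctx):
        """Process exit: park the context on the dirty list, with no flush yet."""
        self.dirty.append(ctx)


def simulate(num_contexts, num_processes):
    """Run short-lived processes back to back and return the flush count."""
    pool = ContextPool(num_contexts)
    for _ in range(num_processes):
        ctx = pool.allocate()
        pool.release(ctx)
    return pool.batch_flushes
```

In this model, `simulate(4, 10)` performs two batch flushes for ten process exits rather than one flush per exit, which is the point of deferring the work until the free list runs dry.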

To drill down further, you can use DTrace on Solaris 10 to find out exactly where the xcalls are occurring on your system.

 dtrace -n 'xcalls{@a[stack(20)]=count()}' &

(Q2) What part (SB/Expander/etc) does the AXQ 6.2 ASIC reside on? How do I tell which version I am running?

Officially, you should contact your service representative to find out this information since it requires an unsupported command "redx". WARNING, this command can bring down your system if not used properly.

sumocatsc1:sms-svc:20> redx -x shaxq 9 0 | grep Rev
AXQ  EX9 (9)   Component ID = E4312049   Rev 6.2

For further information on configuration best practices, look at the following blueprint.

(Q3) Solaris 9 kernel patch 117171-16 adds what appears to be several performance improvements around the idle loop (bugs 5059920, 4802594) and the dispatcher (bug 5054052). What amount of performance improvement has been seen on 40+ core systems?

The actual amount of improvement varies quite a bit based on the application. I have seen anywhere from 0-20% improvement. I/O-intensive workloads *tend* to benefit more from this kernel patch.

=====
EDITS
=====
Page 10 'Insert "interrupts=1;" > ce.conf under' should probably have syntax more like the other examples. -- Thanks.

Page 12, paragraph just before section 14 ends with "A". -- Thanks... This was just a typo, no more sentences.

Yes... There is no such thing as an E3900.... typo.

Posted by Glenn Fawcett on April 13, 2005 at 10:23 AM PDT #
