Memory page coalescing update and Solaris 10


Well, some of you may remember my December technical brief
about Solaris memory page coalescing on high-end servers.
If you don't know what I'm talking about, feel free to send me an email at
benoit@sun.com and I will give you the link.

Since that technical brief, I have received a lot of requests on this topic
that I would like to answer today :

--> Request 1 : What is the list of issues linked to this one ?

Here is the list with bugIds so you have a complete picture :

4802594 - Idle loop degrades IO performance on large psets
5059920 - Idle loop is not scalable on large systems
5054052 - disp_getwork() is greedy and negatively impacts dispatch latency
5050686 - Solaris mutexes should be made more efficient under contention
5095432 - Oracle startup takes too long due to memory fragmentation
5046939 - kcage_freemem grows too large when large ISM segments assigned on SF15k
4904187 - page_freelist_coalesce() holds the page freelist locks for too long



--> Request 2 : You mentioned that the fixes for these issues are available in
IDRs for Solaris 8 and Solaris 9. What is the status of the patches ?

Good news here. Solaris 8 patches have just been released. The fixes for these issues
are available in the following kernel update patches (KUP) :

            Solaris 8        Solaris 9
SPARC       117350-23        117171-17
x86         117351           117172

Now you are ready to ask me : what about Solaris 10 ?
The answer for Solaris 10 (and Nevada) is : in progress...
Last month I tested some of these issues on Solaris 10, and while the
problems are still there (the page_freelist_coalesce() routine is in the common Solaris
code), the impact is much, much lower. As an example, the 10G Oracle startup testcase we
built took 50s on a normal system. With no 4M pages available, it took up to 2 hours on
Solaris 8, up to 15 minutes on Solaris 9 and up to 3 minutes on Solaris 10.
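
To put those numbers in perspective, the worst-case startup times can be expressed as a multiple of the 50s baseline. This is only a sketch of the arithmetic; the function name slowdown is hypothetical and not part of any Solaris tool. It should run in ksh or any POSIX shell.

```shell
# slowdown BASE_SECONDS WORST_SECONDS
# Prints how many times slower the worst case is than the baseline.
# (Hypothetical helper, used only to illustrate the figures quoted above.)
slowdown() {
    awk -v base="$1" -v worst="$2" 'BEGIN { printf "%.1f\n", worst / base }'
}

echo "Solaris 8 : $(slowdown 50 7200)x"   # 2 hours    = 7200s
echo "Solaris 9 : $(slowdown 50 900)x"    # 15 minutes = 900s
echo "Solaris 10: $(slowdown 50 180)x"    # 3 minutes  = 180s
```

That works out to roughly 144x, 18x and 3.6x the normal startup time.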

--> Request 3 : It is very complicated to get a memory picture on our system.
    vmstat or sar data are not detailed enough. Can you help ?

No need here for complex packaged tools. The best kept secret of Solaris is
the numerous options of mdb. So if you write a little script like :

#!/bin/ksh
#
# Displaying the memory map...
#
echo Browsing memory...
echo
mdb -k 2>/dev/null <<!
::memstat
!
echo
date

You will get this output :

    Browsing memory...

    Page Summary                Pages                MB  %Tot
    ------------     ----------------  ----------------  ----
    Kernel                      36480               285    1%
    Anon                        12891               100    0%
    Exec and libs                5106                39    0%
    Page cache                 208799              1631    5%
    Free (cachelist)           139913              1093    3%
    Free (freelist)           3780231             29533   90%

    Total                     4183420             32682
    Physical                  4116397             32159

    Fri Apr  1 10:18:16 PST 2005


Cool !
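
If you only care about the free memory rows, the ::memstat output can be filtered with awk. The following is a minimal sketch, assuming the column layout shown above (MB is the next-to-last field); the name memstat_free_mb is hypothetical, and on Solaris you may need nawk instead of awk.

```shell
# memstat_free_mb: reads ::memstat output on stdin and reports the MB column
# of the two "Free" rows. (Hypothetical helper name, not an mdb feature.)
memstat_free_mb() {
    awk '
        /Free \(cachelist\)/ { cache = $(NF - 1) }   # MB is the next-to-last field
        /Free \(freelist\)/  { free  = $(NF - 1) }
        END {
            printf "freelist=%dMB cachelist=%dMB total_free=%dMB\n",
                   free, cache, free + cache
        }
    '
}

# Demo on the two "Free" rows from the sample output above.
memstat_free_mb <<EOF
Free (cachelist)           139913              1093    3%
Free (freelist)           3780231             29533   90%
EOF
```

On the sample above this prints freelist=29533MB cachelist=1093MB total_free=30626MB. On a live system the input would come from mdb itself, for example: echo ::memstat | mdb -k | memstat_free_mb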

--> Request 4 : Solaris 10 provides updated memory structures and the page freelist is now available. Can we use it to get the amount of free 4M pages ?

This request came last week from the VOS escalation team. And the answer is : yes, but it requires a close look at how page_freelists is implemented to get the right number.
We worked on this question with my good friend Mike C. in December and here is
the updated script for Solaris 10 (yes, mdb again) :

#!/bin/ksh
#
# Walking the page_freelist in Solaris 10 to get the amount of 4M pages...
#
mdb -k 1>/tmp/1 2>&1 <<!
page_freelists+30::array uintptr_t 1 | \\
::print uintptr_t | ::array uintptr_t 0t18 | \\
::print uintptr_t | ::array uintptr_t 0t2 | \\
::print uintptr_t | ::grep ".!=0" | ::list page_t p_vpnext
!
cat /tmp/1 |grep -v failed|wc -l

And on my v490, I have :

v490 # ./4m_s10.sh
    5168
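
To turn that count into an amount of memory, multiply by the page size. This is a small sketch, assuming every counted entry is one free 4M page as described above; the helper name pages_4m_to_mb is hypothetical.

```shell
# pages_4m_to_mb COUNT
# Converts a number of 4M pages into megabytes (COUNT * 4).
# (Hypothetical helper; assumes every counted entry is one free 4M page.)
pages_4m_to_mb() {
    echo $(( $1 * 4 ))
}

pages_4m_to_mb 5168    # the count reported on the v490 above
```

For the v490 this gives 20672 MB, a bit over 20 GB available as free 4M pages.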

That's it for now...
Comments:

Why no line-break with your blog? Is it just your style or some kind of posting issue? I am interested in the memory structure and freelist script using mdb but couldn't understand "echo mdb -k 2>/dev/null < Request 4" in your post. Thanks.

Posted by Tao on April 01, 2005 at 03:11 AM PST #

Thanks Tao. It is fixed. Let me know how it goes. benoit

Posted by MrBenchmark on April 01, 2005 at 03:24 AM PST #

Looks good now, thank you!

Posted by Tao on April 01, 2005 at 04:29 AM PST #

Nice post. Now how does one determine what is chewing up kernel memory? I have a Solaris 9 SB1000 with 1GB of ram and the kernel size is 220MB. Thanks

Posted by Lyle on April 03, 2005 at 11:48 PM PDT #

Regarding the "::memstat" collection with a script suggested in the Technocrat (Issue 32): the script was equivalent to running "echo ::memstat | mdb -k". Unfortunately, on a system with a large memory, this may run for 25+ minutes (I ran such a test on a 15k with 288G of memory). In other words, it is "Cool", but not always. :( -zenon

Posted by zenon on April 07, 2005 at 08:20 AM PDT #

Thanks for fixed

Posted by Egitim on December 09, 2010 at 10:55 PM PST #
