Friday Jan 04, 2008

bufhwm on large systems

I was asked yesterday to look at a busy system with high system time. It's Solaris 9 on a big-config 25K. The output below is the top of the lockstat -C -s 50 output.

-------------------------------------------------------------------------------
Count indv cuml rcnt     spin Lock                   Hottest Caller          
132614  59%  59% 1.00      199 blist_lock[8]          bio_recycle+0x224       
     spin ------ Time Distribution ------ count     Stack  
            1 |                               186       bio_recycle+0x224              
            2 |                               2335      bio_getfreeblk+0x4
            4 |                               4247      getblk_common+0x2bc
            8 |@                              7190      bread_common+0x80
           16 |@@                             11570     bmap_read+0x20c
           32 |@@@@                           18285     ufs_directio_read+0x2e
           64 |@@@@@                          25634     rdip+0x198
          128 |@@@@@@                         28613     ufs_read+0x17c 
          256 |@@@@@                          22918     pread+0x28c   
          512 |@@                             9707
         1024 |                               1761 
         2048 |                               157   
         4096 |                               11        
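
For reference, a trace like this can be gathered with an invocation along the lines of the following (the 10-second window is arbitrary; any representative busy period will do):

# watch contention events, recording 50-deep stacks, for 10 seconds
lockstat -C -s 50 sleep 10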

A bit of Solaris code reading led me from the stack above to question the value of bufhwm. I checked it out again on docs.sun.com to really understand what this value does. It's the high-water mark, in KB, on the memory allocated to buffers used for UFS indirect blocks, directories and other bits of metadata.
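
It is worth confirming what the running kernel actually has before reasoning about the value. A quick sketch (v_bufhwm is the field in struct var where, as I remember it, the kernel records the value it settled on at boot):

# the bufhwm kernel variable, printed in decimal
echo "bufhwm/D" | mdb -k
# the value recorded in struct var
echo "v::print -t struct var v_bufhwm" | mdb -k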

I went back to check some basic assumptions (always a good plan) and did an Explorer review. The following line is set in /etc/system:

set bufhwm=8000

I have no idea why it was set to 8000 on this system. I have seen it set many times on many systems and never paid it much attention, on this system or others. 8000 is proposed in many places as a reasonable value. I must admit I have never needed to suggest tuning this value; my unconscious mind just assumed it was a good idea because common wisdom said so, and I never made a comment when other people tuned it.

By default this value would be 2% of memory. This system had more than 200 GB, which would default to around 4 GB. I expect 4 GB would waste some memory, but then it's a high-water mark. 8 MB is far too small on a server of this size, given that the buffer cache is used to store indirect blocks, directories, etc. from a set of filesystems totalling nearly 2 TB!
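
As a back-of-the-envelope check of the default (a sketch; the exact prtconf output format varies a little between releases, and the arithmetic assumes the 2% figure from the tuning guide):

# physical memory, as reported by the system
prtconf | grep Memory
# 200 GB = 209715200 KB; 2% of that is 4194304 KB, i.e. about 4 GB
# against which a bufhwm of 8000 (8 MB) is a factor of roughly 500 smaller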

We can observe whether buffer recycling is causing an issue using the following:

echo "bfreelist$ buf" | mdb -k
echo "v::print -t struct var" | mdb -k
kstat -p -n biostats

and sar -b might also give some insight.
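
To make the biostats counters easier to reason about, something like this computes a crude hit ratio (a sketch: the statistic names buffer_cache_lookups, buffer_cache_hits and waits_for_buffer_allocs are as I remember them; check the raw kstat -p -n biostats output on your release before trusting it):

kstat -p -n biostats | awk '
    /buffer_cache_lookups/    { lookups = $2 }
    /buffer_cache_hits/       { hits = $2 }
    /waits_for_buffer_allocs/ { waits = $2 }
    END {
        if (lookups > 0)
            printf("hit ratio %.1f%%, alloc waits %d\n",
                100 * hits / lookups, waits)
    }'

A poor hit ratio, or a steadily climbing alloc-wait count, points the same way as the lockstat data above.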

So the morals to repeat to myself include:

  • Turn off your unconscious mind when examining /etc/system. Don't assume any /etc/system setting is valid.
  • Never carry /etc/system tunables forward.
  • If you set a value based on an attribute that has the potential to change, like memory size, put a comment in /etc/system citing the assumption.

I wish the various customers I have visited over the years had put comments of this form in /etc/system:

# clive.king@sun.com 4/1/2008 
# bufhwm value of 8000 assumes a memory size of 4GB and 600GB of UFS filesystem. Revisit if size changes
# Check with kstat -p -n biostats before changing
set bufhwm=8000

At least if something goes wrong, I can be emailed in capital letters.
