ZFS write throttle observations

The new ZFS write throttle feature, which was integrated in Nevada build 87, specifically addresses write-intensive workloads. Today we take a closer look at the write throttle in action. Our test system is a Sun Fire X4500 running Nevada build 94 with a single ZFS pool of 42 striped disks.

blog@x4500> zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
h     19.0T   620K  19.0T     0%  ONLINE  -

The zfs_write_throttle.d DTrace script is used to observe the write throttle. For a first test, we generate write I/O load using a couple of “dd if=/dev/zero of=/h/<file> bs=1024k” commands.
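For readers who do not have the script at hand, its core idea can be approximated with a couple of fbt probes. The one-liner below is only a minimal sketch, not the actual zfs_write_throttle.d: it reproduces just the per-pool sync-time histogram, and it assumes that spa_sync() is visible to the fbt provider and that spa_t exposes a spa_name member.

# dtrace -qn '
    fbt::spa_sync:entry
    {
        /* remember which pool is syncing and when the sync started */
        self->spa = args[0];
        self->ts  = timestamp;
    }
    fbt::spa_sync:return
    /self->ts/
    {
        /* per-pool sync duration, in milliseconds, in 20 ms buckets */
        @sync[stringof(self->spa->spa_name)] =
            lquantize((timestamp - self->ts) / 1000000, 0, 2000, 20);
        self->spa = 0;
        self->ts  = 0;
    }
    tick-10s
    {
        /* dump and reset the distributions every 10 seconds */
        printa(@sync);
        trunc(@sync);
    }'

Here's an extract of the script output: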

--- 2008 Jul 28 14:04:17

                                                      Sync rate (/s)
  h                                                                1

                                                                MB/s
  h                                                             1540

                                                            Delays/s
  h                                                               47

  h                                                   Sync time (ms)
           value  ------------- Distribution ------------- count
              80 |                                         0
             100 |@@@@@@@@@@@                              3
             120 |@@@@                                     1
...snip...
             260 |@@@@                                     1
...snip...
             580 |@@@@                                     1
...snip...
             780 |@@@@                                     1
...snip...
            1320 |@@@@@@@                                  2
            1340 |                                         0
            1360 |@@@@                                     1
...snip...
            1520 |@@@@                                     1
            1540 |                                         0

  h                                                   Written (MB)
           value  ------------- Distribution ------------- count
           < 200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@        9
...snip...
            3000 |@@@@                                     1
...snip...
         >= 4000 |@@@@                                     1

  h                                                   Write limit (MB)
           value  ------------- Distribution ------------- count
            7750 |                                         0
         >= 8000 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 11

The output has been shortened for clarity. With the default settings in place, one can observe that synchronizing data to disk can take well over a second, with sync times ranging from roughly 100 ms to 1540 ms (see the Sync time distribution).

In a second test, we reduce the target time for synchronizing data to disk from the default of five seconds to one second, using the zfs_txg_synctime variable.
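On a live system the variable can be changed with mdb. The two one-liners below are only a sketch: they assume that in this build zfs_txg_synctime is a 32-bit integer expressed in seconds, and they write to the running kernel, so handle them with care.

# echo zfs_txg_synctime/W 0t1 | mdb -kw
# echo zfs_txg_synctime/D | mdb -k

The first command sets the sync target to one second, the second one reads the variable back in decimal. Should the setting need to survive a reboot, the equivalent /etc/system entry would be “set zfs:zfs_txg_synctime = 1”. Here's again an extract of the script output: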

--- 2008 Jul 28 14:08:27

                                                      Sync rate (/s)
  h                                                                1

                                                                MB/s
  h                                                             1681

                                                            Delays/s
  h                                                               56

  h                                                   Sync time (ms)
           value  ------------- Distribution ------------- count
             340 |                                         0
             360 |@@@                                      1
...snip...
             460 |@@@                                      1
             480 |                                         0
             500 |@@@                                      1
...snip...
             600 |@@@                                      1
...snip...
             660 |@@@                                      1
...snip...
             740 |@@@                                      1
             760 |@@@                                      1
             780 |@@@                                      1
             800 |                                         0
             820 |@@@                                      1
             840 |@@@@@@                                   2
             860 |@@@@@@                                   2
...snip...
            1040 |@@@                                      1
            1060 |                                         0

  h                                                   Written (MB)
           value  ------------- Distribution ------------- count
           < 200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            10
...snip...
            2400 |@@@                                      1
            2600 |                                         0
            2800 |@@@                                      1
...snip...
         >= 4000 |@@@@@@                                   2

  h                                                   Write limit (MB)
           value  ------------- Distribution ------------- count
            2500 |                                         0
            2750 |@@@@@@                                   2
...snip...
            4750 |@@@                                      1
            5000 |@@@@@@                                   2
            5250 |                                         0
            5500 |@@@@@@@@@@@                              4
...snip...
            6500 |@@@                                      1
            6750 |                                         0
            7000 |@@@@@@@@@                                3
...snip...
         >= 8000 |@@@                                      1


Two things can be seen when comparing with the first test:

a) the time to synchronize data to disk has gone down, now ranging from roughly 360 ms to 1060 ms;

b) the pool's “write limit” moved around over time (see the Write limit distribution), dynamically throttling the incoming application write rate to match the available I/O bandwidth.

More parameters are available for tuning (please see the source code), but as usual, use them with caution. To wrap up, the zfs_write_limit_override parameter was set to 800 MB for one last run. Setting this parameter enforces the write limit at the specified value, which can be beneficial for applications that generate a continuous, well-paced write stream but are sensitive to write delays.
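Again just a sketch: assuming zfs_write_limit_override is a 64-bit variable expressed in bytes, the override can be applied on a live system with mdb (800 MB corresponds to 0x32000000 bytes), and writing 0 afterwards restores the dynamic limit.

# echo zfs_write_limit_override/Z 0x32000000 | mdb -kw
# echo zfs_write_limit_override/Z 0 | mdb -kw

Here's one last output extract with the 800 MB override in place: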

--- 2008 Jul 28 14:54:49

                                                      Sync rate (/s)
  h                                                                4

                                                                MB/s
  h                                                              677

                                                            Delays/s
  h                                                                1

  h                                                   Sync time (ms)
           value  ------------- Distribution ------------- count
             120 |                                         0
             140 |@@@@@@                                   6
             160 |@@@@@@@@@@@@@@@                          15
             180 |@@@@@@@@@@@@@@@                          15
             200 |@@@@@                                    5
             220 |                                         0

  h                                                   Written (MB)
           value  ------------- Distribution ------------- count
           < 200 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           31
             200 |                                         0
             400 |@@@@@                                    5
             600 |                                         0
             800 |@@@@@                                    5
            1000 |                                         0

  h                                                   Write limit (MB)
           value  ------------- Distribution ------------- count
            1250 |                                         0
            1500 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 41
            1750 |                                         0

Hopefully, you have enjoyed these little observations!