slog blog (or blogging on slogging)

I've been slogging for a while on support for separate intent logs (slogs) for ZFS. Without slogs, an intent log is allocated dynamically from the main pool. It consists of a chain of varying block sizes which are anchored in fixed objects. Specifying a separate log device enables the use of limited-capacity but fast block devices such as NVRAM and Solid State Drives (SSDs).

Using chained logs (clogs?) can also lead to pool fragmentation. This is because log blocks are allocated and then freed as soon as the pool transaction group has committed, so we get a swiss-cheesing effect.


Interface

        zpool create <pool> <pool devices> log <log devices>

Creates a pool with separate intent log device(s). If more than one log device is specified, writes are load-balanced between them. It's also possible to mirror log devices. For example, a log consisting of two two-way mirrors could be created thus:

                zpool create whirl <pool devices> \
                    log mirror c1t8d0 c1t9d0 \
                    mirror c1t10d0 c1t11d0

        zpool add <pool> log <log devices>

Creates a log device if it doesn't exist, or adds extra log devices if it does.
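For example (the pool and device names here are hypothetical), adding a single log device and later a mirrored pair might look like:

```shell
# Add a single log device to an existing pool (names are examples only).
zpool add tank log c2t0d0

# Add a mirrored pair of log devices to the same pool.
zpool add tank log mirror c2t1d0 c2t2d0
```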

        zpool replace <pool> <old device> <new device>

Replaces the old log device with the new log device.

        zpool attach <pool> <log device> <new log device>

Attaches a new log device to an existing log device. If the existing device is not a mirror then a two-way mirror is created. If the existing device is part of a two-way log mirror, attaching a new device creates a three-way log mirror, and so on.
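A minimal sketch, assuming a pool called tank with an existing unmirrored log device c2t0d0 (both names hypothetical):

```shell
# Attach a second device to the existing log device,
# converting the single log into a two-way mirror.
zpool attach tank c2t0d0 c2t1d0
```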

        zpool detach <pool> <log device>

Detaches a log device from a mirror.

        zpool status

Additionally displays the log devices.

        zpool iostat

Additionally shows I/O statistics for the log devices.
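For example, the verbose form breaks statistics out per virtual device, so the log devices appear under their own heading (the pool name is hypothetical):

```shell
# Show per-vdev I/O statistics, refreshed every 5 seconds;
# log devices are listed in a separate "logs" section.
zpool iostat -v tank 5
```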

When a slog is full, or if a non-mirrored log device fails, ZFS will start using chained logs within the main pool.

Performance

The performance of databases and NFS is dictated by the latency of making data stable. They need assurance that their transactions are not lost on power or system failure, so they depend heavily on the speed of the intent log devices.
Here are some database performance test results:

  • The test program creates 32 threads, each doing 8K O_DSYNC writes randomly to a 400MB file.
  • The test hardware was a Sun X4500 (aka thumper) with 48 x 500GB disks.
  • The NVRAM is a battery-backed PCI Micro Memory (pci1332,5425) card.
  • Table values are throughput in MB/s.

Main pool disks:        1     2     4     8    16    32

        0 slogs        11    14    17    15    16    13
        1 slog         12    12    12    12    12    11
        2 slogs        17    17    17    19    19    16
        4 slogs        17    16    15    15    16    16
        8 slogs        18    19    20    18    16    18
        NVRAM         221   221   218   217   215   217


I also ran the same tests with disk write-cache flushing disabled:

        echo zfs_nocacheflush/W 1 | mdb -kw

Note: this should not be done on a real system unless the device cache is non-volatile.
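The mdb invocation above patches the running kernel only. To make the setting persist across reboots, the equivalent /etc/system tunable can be set instead (the same non-volatile-cache caveat applies):

```shell
# Persistent equivalent of the mdb patch: add to /etc/system and reboot.
echo 'set zfs:zfs_nocacheflush = 1' >> /etc/system
```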


Main pool disks:        1     2     4     8    16    32

        0 slogs        33    83   123   136   142   143
        1 slog         45    46    44    45    45    46
        2 slogs        97    99    90    94    94    95
        4 slogs       124   125   127   124   127   127
        8 slogs       135   137   134   138   138   138
        NVRAM         225   220   226   226   226   227

Note, these tables can be a bit misleading. If you had 2 disks you'd have a choice of 2 main pool devices, or 1 slog and 1 main pool device. So looking at the table you should compare the following entries:
  • 2 main pool: 83MB/s
  • 1 slog, 1 main pool: 45MB/s
The first table highlights some scaling issues which will be investigated further.

Perf summary

For this micro-benchmark, and from limited other perf testing, it makes sense to use only fast devices for the slog. However, there may be some cases where using regular disks as slog disks is faster than putting the same disks in the main pool.

Status/Bugs

This support was recently putback into Solaris Nevada build snv_68. Here's a list of slog bugs - fixed and to be fixed.

  • 6574298 "slog still uses main pool for dmu_sync()" - fixed in snv_69
  • 6574286 "removing a slog doesn't work" - fixed
  • 6575965 "panic/thread=2a1016b5ca0: BAD TRAP: type=9 ...:" (panic when no main pool devices are present) - fixed in snv_83


Comments:

Great! Now will Sun be selling that NVRAM card? Do you have any details on the card? Also, what happens if the NVRAM card breaks while ZFS is mounted? Will any data be lost? Will the system panic?

Posted by Robert Milkowski on July 18, 2007 at 05:52 AM PDT #

Extremely exciting benchmarks. I have all the same questions Robert does.

Posted by benr on July 18, 2007 at 06:30 AM PDT #

I don't expect Sun to sell that NVRAM card. I only gave details of it to be complete in my perf results. I've just heard that the card is only available to OEMs. I hope the results are representative of what may be achieved with other NVRAM devices.

If there's a slog device problem (failure or full) then ZFS will revert to using main pool devices.

Posted by Neil Perrin on July 18, 2007 at 06:34 AM PDT #

How about using the Gigabyte i-Ram device for the intent log? http://www.gigabyte.com.tw/Products/Storage/Products_Overview.aspx?ProductID=2180&ProductName=GC-RAMDISK

Posted by Florin on July 19, 2007 at 05:33 AM PDT #

[Trackback] Neil gives an introduction into the usage of the new "separate zil" feature (dubbed "slog" for "separate log"), that was introduced with build 68 of OpenSolaris. Additionally, he made some benchmarks. Interesting read. The improvement by using a NVRAM b...

Posted by c0t0d0s0.org on July 23, 2007 at 01:59 AM PDT #
