Fun with zvols - Swap on a zvol

I mentioned recently that I just spent a week in a ZFS internals TOI (transfer of information) session. I came away with a few ideas to play with, which I will share here. Hopefully folks will have suggestions on how to improve, test, and validate some of these things.

ZVOLs as Swap

The first thing that I thought about was using ZFS as a swap device. Of course, this is right there in the zfs(1M) man page as an example, but it still deserves a mention here.  There has been some discussion of this on the zfs-discuss list at opensolaris.org (I just retyped that dot four times thinking it was a comma. Turns out there was crud on my laptop screen).  The dump device cannot be on a zvol (at least if you want to catch a crash dump), but this still gives a lot of flexibility.  With root on ZFS (coming before too long), ZFS swap makes a lot of sense and is the natural choice. We were talking in class about whether it would be nice to have a way to turn off ZFS' caching for the swap surface to improve performance, but that remains to be seen.
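
You can check where crash dumps are configured to go with dumpadm(1M); with swap on a zvol you will want a dedicated dump device. A sketch from my test box (the device path and output here are illustrative and will differ on your system):

bash-3.00# dumpadm
      Dump content: kernel pages
       Dump device: /dev/dsk/c1t0d0s1 (dedicated)
Savecore directory: /var/crash/hostname
  Savecore enabled: yes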

At any rate, setting up mirrored swap with ZFS is way simple! Much simpler even than with SVM, which in turn is simpler than VxVM. Here's all it takes:


bash-3.00# zpool create -f p mirror c2t10d0 c2t11d0
bash-3.00# zfs create -V 2g p/swap
bash-3.00# swap -a /dev/zvol/dsk/p/swap
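
You can confirm the device is active with swap -l (the dev numbers and sizes below are from my test box and will differ on yours):

bash-3.00# swap -l
swapfile                 dev  swaplo   blocks     free
/dev/zvol/dsk/p/swap   256,1      16  4194288  4194288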

Pretty darn simple, if you ask me. You can make it permanent by changing the lines for swap in your /etc/vfstab (below).  Notice that you use the path to the zvol in the /dev tree rather than the ZFS dataset name.


bash-3.00# cat /etc/vfstab
#device                 device          mount   FS      fsck    mount   mount
#to mount               to fsck         point   type    pass    at boot options
#
#/dev/dsk/c1t0d0s1      -               -       swap    -       no      -
/dev/zvol/dsk/p/swap    -               -       swap    -       no      -

I would like to do some performance testing to see what kind of performance you can get with swap on a zvol.  I am curious how this will affect kernel memory usage, and what effect things like compression have on the swap volume, although on reflection compression there doesn't make a lot of sense.  I am also curious about the ability to dynamically change the size of the swap space.  At first glance, changing the size of the volume does not automatically change the amount of available swap space.  That makes sense for expanding swap space.  But if you reduce the size of the volume and the kernel doesn't notice, that sounds like it could be a problem.  Maybe I should file a bug.
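
For anyone who wants to reproduce the size-change observation, here is roughly what I tried (a sketch; the sizes are arbitrary):

bash-3.00# swap -l                    # note the current size
bash-3.00# zfs set volsize=4g p/swap  # grow the volume
bash-3.00# swap -l                    # reported swap size is unchanged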

Suggestions for things to try and ways to measure overhead and performance for this are welcomed.

Comments:

I think worrying about swap device performance is the wrong place to go. If you're worried about performance, then you should be trying to stay away from swapping anyway: zvol, slice, or metadevices won't be orders of magnitude different.

I agree compression for the swap device doesn't seem to make a lot of sense. The only benefit I can think of is that transferring compressed data to disk takes less I/O than transferring the full data. But allocation can no longer be predicted well. Imagine a scenario where the pool is full, but the zvol (without a reservation) is taking up little space due to the highly compressible zeros. Now you try to write real data in there and you run out of space. I don't think the page daemon is going to cope well with that. And if you do have a reservation, then compressing the data doesn't save you any space.

What some might find nice is that we've got a ton of ECC RAM running around, but nothing checking on the swapfile in the same way. With ZFS data validation, you've got it.

Yes, if you reduce the size of a swap device without telling the kernel, that would be a problem. What if that last block were the only location for a swapped-out page? The kernel could no longer page it back into RAM. Most ZFS ops will notice if the device is in use (including by swap). I haven't tried zvol swap to see whether it warns about or fails an attempted shrink. -- Darren

Posted by Darren Dunham on December 01, 2006 at 11:07 AM EST #

Absolutely agree about avoiding pageouts if you want performance. I always size systems that way when I can. I was mostly wondering how it would perform by comparison.

As to the scenario you describe, zvols always have a reservation; they have a size associated with them. Of course, you can change it, and the reservation is different from the space used when compressed. Having compression might allow you to "oversubscribe" your swap space in some sense, but again, that's not a desirable scenario.
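
For instance, on the example volume from the post, the properties should look something like this (a sketch; exact output may vary by release):

bash-3.00# zfs get volsize,reservation p/swap
NAME    PROPERTY     VALUE  SOURCE
p/swap  volsize      2G     -
p/swap  reservation  2G     local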

The scenario of what happens when the zvol is full, memory is full, etc. is a bug (I can't recall the number right now) that has already been filed. Basically, the scanner starts looking for pages to push to the swap surface. But the zvol ends up caching the page in the ARC, which needs to get more memory in order to cache the modifications to the zvol, so it goes to the virtual memory system to get some pages, which starts the scanner looking for pages to push to the swap surface. See the spiral?

You can change the size of the swap zvol (zfs set volsize=) and the swap space stays the same - no notification trickles back up the stack. That means you are absolutely right about losing data that way.
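
Until something like that exists, the safe order of operations is to take the device out of swap before shrinking it. A sketch, using the volume from the post:

bash-3.00# swap -d /dev/zvol/dsk/p/swap  # stop swapping to it first
bash-3.00# zfs set volsize=1g p/swap     # now shrinking is safe
bash-3.00# swap -a /dev/zvol/dsk/p/swap  # re-add at the new size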

So, this looks like something that we will have to work on to develop some best practices.

Posted by Scott Dickson on December 01, 2006 at 12:25 PM EST #
