ZFS Deduplication

You can do the exercises described below on any Oracle Solaris 11 machine, but we recommend the setup described in "The Easiest Way to Start Learning About Oracle Solaris".

We also recommend that you familiarize yourself with ZFS basics using our previous lab first.

Exercise Z.4: ZFS Deduplication

Task: Users tend to keep a lot of similar files in their archives. Is it possible to save space by using deduplication?

Lab: We will create a ZFS file system with deduplication turned on and see if it helps.

Let's model the following situation: we have a file system that is used as an archive. We'll create a separate file system for each user and imagine that they store similar files there.

We will use the ZFS pool called labpool that we created in the first exercise.

Create a file system with deduplication and compression:

root@solaris:~# zfs create -o dedup=on -o compression=gzip labpool/archive

Create users' file systems (we'll call them a, b, c, d for simplicity):

root@solaris:~# zfs create labpool/archive/a
root@solaris:~# zfs create labpool/archive/b
root@solaris:~# zfs create labpool/archive/c
root@solaris:~# zfs create labpool/archive/d

Check their "dedup" property:

root@solaris:~# zfs get dedup labpool/archive/a
NAME               PROPERTY  VALUE          SOURCE
labpool/archive/a  dedup     on             inherited from labpool/archive

Child file systems inherit properties from their parents, as the SOURCE column shows.

Create an archive of /usr/share/man, for example:

root@solaris:~# tar czf /tmp/man.tar.gz /usr/share/man

And copy it four times into the file systems we've just created. Don't forget to check the deduplication ratio after each copy.

root@solaris:~# cd /labpool/archive
root@solaris:/labpool/archive# ls -lh /tmp/man.tar.gz 
-rw-r--r--   1 root     root        3.1M Dec 13 17:03 /tmp/man.tar.gz
root@solaris:/labpool/archive# zpool list labpool
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
labpool  1.52G  1.05M  1.52G   0%  1.00x  ONLINE  -
root@solaris:/labpool/archive# cp /tmp/man.tar.gz a/
root@solaris:/labpool/archive# zpool list labpool
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
labpool  1.52G  40.5M  1.48G   2%  1.00x  ONLINE  -
root@solaris:/labpool/archive# cp /tmp/man.tar.gz b/
root@solaris:/labpool/archive# zpool list labpool
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
labpool  1.52G  40.7M  1.48G   2%  2.00x  ONLINE  -
root@solaris:/labpool/archive# cp /tmp/man.tar.gz c/
root@solaris:/labpool/archive# zpool list labpool
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
labpool  1.52G  41.5M  1.48G   2%  3.00x  ONLINE  -
root@solaris:/labpool/archive# cp /tmp/man.tar.gz d/
root@solaris:/labpool/archive# zpool list labpool
NAME      SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
labpool  1.52G  41.2M  1.48G   2%  4.00x  ONLINE  -

It might take a couple of seconds for ZFS to commit those changes and report the correct dedup ratio. Just repeat the command if you don't see the results listed above.

Remember that we also enabled compression (compression=gzip) when we created the file system? Check the compression ratio:

root@solaris:/labpool/archive# zfs get compressratio labpool/archive
NAME             PROPERTY       VALUE  SOURCE
labpool/archive  compressratio  1.01x  -

The reason is simple: the files we placed in the file system are already compressed, and compressing them again gains almost nothing. Sometimes compression saves space, sometimes deduplication does; it depends on your data.

It's interesting to note that ZFS deduplicates at the block level, not at the file level: even a single file that contains many identical blocks will be deduplicated. Let's check this. Create a new ZFS pool:

root@solaris:~# cd /dev/dsk/
root@solaris:/dev/dsk# mkfile 100m disk{10..13}
root@solaris:/dev/dsk# cd
root@solaris:~# zpool create ddpool raidz disk10 disk11 disk12 disk13

As you remember, when we create a ZFS pool, a new ZFS file system with the same name is created and mounted by default. We just have to turn deduplication on:

root@solaris:~# zfs set dedup=on ddpool

Now let's create a big file that consists of many copies of the same block. In the following commands we figure out the ZFS record size, create a single file of exactly that size, and then append it to the big file 1024 times.

root@solaris:~# zfs get recordsize ddpool
NAME    PROPERTY    VALUE  SOURCE
ddpool  recordsize  128K   default
root@solaris:~# mkfile 128k 128k-file
root@solaris:~# for i in {1..1024} ; do cat 128k-file >> 1000copies-file ; done
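The same construction can be sketched and sanity-checked on any machine, with or without ZFS (using dd and seq in place of mkfile and brace expansion):

```shell
# Build a 128 MiB file out of 1024 copies of one 128 KiB block, then
# count how many unique 128 KiB blocks it actually contains -- one
# physical copy per unique block is what block-level dedup stores.
dd if=/dev/zero of=128k-file bs=128k count=1 2>/dev/null
: > 1000copies-file
for i in $(seq 1024); do cat 128k-file >> 1000copies-file; done

wc -c 1000copies-file                       # 134217728 bytes = 128 MiB
split -a 3 -b 131072 1000copies-file blk.   # one file per 128 KiB block
total=$(ls blk.* | wc -l | tr -d ' ')
unique=$(cksum blk.* | awk '{print $1, $2}' | sort -u | wc -l | tr -d ' ')
echo "$total blocks, $unique unique"        # 1024 blocks, 1 unique
rm blk.*
```

1024 logical blocks backed by a single unique block is exactly why zpool list will report a dedup ratio of roughly 1000x for this file.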

Now we can copy this file to /ddpool and see the result:

root@solaris:~# cp 1000copies-file /ddpool
root@solaris:~# zpool list
NAME     SIZE  ALLOC   FREE  CAP     DEDUP  HEALTH  ALTROOT
ddpool   382M   635K   381M   0%  1000.00x  ONLINE  -
rpool     62G  11.7G  50.3G  18%     1.00x  ONLINE  -

How can this help in real life? Imagine you have a policy that requires creating and storing an archive every day. The archive's content doesn't change much from day to day, but you still have to create it daily. Most of the blocks in the archives will be identical, so they can be deduplicated very efficiently. Let's demonstrate this using the system's manual page directories.

root@solaris:~# tar cvf /tmp/archive1.tar /usr/share/man/man1
root@solaris:~# tar cvf /tmp/archive2.tar /usr/share/man/man1 /usr/share/man/man2
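Why should two different archives share so many blocks? tar writes its members in order, so archive2.tar begins with the very same byte sequence as archive1.tar (the man1 members); those aligned, identical runs are what ZFS collapses. A minimal sketch with throwaway directories (the names are hypothetical):

```shell
# Two tars of overlapping content share a common prefix byte-for-byte.
mkdir -p demo/man1 demo/man2
printf 'alpha\n' > demo/man1/a.1
printf 'beta\n'  > demo/man2/b.2
tar cf a1.tar demo/man1
tar cf a2.tar demo/man1 demo/man2
# The archives differ only after the demo/man1 members end; everything
# before that point is identical, and therefore dedupable.
cmp a1.tar a2.tar || true
```

cmp reports the first differing byte well past the start of the archives: the entire demo/man1 portion is byte-identical in both files.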

Clean up our /ddpool file system and copy both files there:

root@solaris:~# rm /ddpool/*
root@solaris:~# cp /tmp/archive* /ddpool
root@solaris:~# zpool list ddpool
NAME    SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
ddpool  238M  51.1M  187M  21%  1.89x  ONLINE  -

Think about real-life situations where deduplication could help you. Homework exercise: compress both archive files with gzip, clean up /ddpool, and copy the compressed files there again. Check how that affects the dedup ratio.
