FS perf 201 : Postmark

Now let's run a simple but popular benchmark - NetApp's PostMark. Let's see how long it takes to do 1,000,000 transactions.
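
By the way, you don't have to type at the pm> prompt - postmark will read the same commands from a file passed as its first argument, so runs are easy to script. The 'pmrc' filename here is just an example:

mcp# cat pmrc
set location=/scsi_zfs
set transactions=1000000
run
quit
mcp# ./postmark pmrc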

First let's try ZFS:

mcp# ./postmark                  
PostMark v1.5 : 3/27/01
pm>set location=/scsi_zfs
pm>set transactions=1000000
pm>run
Creating files...Done
Performing transactions..........Done
Deleting files...Done
Time:
        220 seconds total
        214 seconds of transactions (4672 per second)

Files:
        519830 created (2362 per second)
                Creation alone: 20000 files (4000 per second)
                Mixed with transactions: 499830 files (2335 per second)
        500124 read (2337 per second)
        494776 appended (2312 per second)
        519830 deleted (2362 per second)
                Deletion alone: 19660 files (19660 per second)
                Mixed with transactions: 500170 files (2337 per second)

Data:
        3240.97 megabytes read (14.73 megabytes per second)
        3365.07 megabytes written (15.30 megabytes per second)
pm>

During the run, I used our good buddy zpool(1M) to see how much I/O we were doing:

mcp# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
scsi_zfs    32.5K  68.0G      0    207      0  6.25M
scsi_zfs    32.5K  68.0G      0    821      0  24.1M
scsi_zfs    32.5K  68.0G      0    978      0  28.6M
scsi_zfs    32.5K  68.0G      0  1.04K      0  30.3M
scsi_zfs    32.5K  68.0G      0  1.01K      0  27.6M
scsi_zfs     129M  67.9G      0    797      0  16.2M
scsi_zfs     129M  67.9G      0    832      0  27.4M
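
In case you're wondering where the pool came from: creating a single-disk pool is a one-liner, something like the following (the device name is my assumption - it's the same disk that shows up as c4t1d0 in the iostat output from the UFS run below):

mcp# zpool create scsi_zfs c4t1d0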

Ok, on to UFS:

mcp# ./postmark 
PostMark v1.5 : 3/27/01
pm>set location=/export/scsi_ufs
pm>set transactions=1000000
pm>run
Creating files...Done
Performing transactions..........Done
Deleting files...Done
Time:
        3450 seconds total
        3419 seconds of transactions (292 per second)

Files:
        519830 created (150 per second)
                Creation alone: 20000 files (909 per second)
                Mixed with transactions: 499830 files (146 per second)
        500124 read (146 per second)
        494776 appended (144 per second)
        519830 deleted (150 per second)
                Deletion alone: 19660 files (2184 per second)
                Mixed with transactions: 500170 files (146 per second)

Data:
        3240.97 megabytes read (961.96 kilobytes per second)
        3365.07 megabytes written (998.79 kilobytes per second)
pm>

Also, during the run I grabbed a little iostat to see how UFS's I/O was doing:

mcp# iostat -Mxnz 1
                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  820.9    0.0    3.4 142.5 256.0  173.5  311.9 100 100 c4t1d0
                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  797.0    0.0    3.1 129.2 256.0  162.1  321.2 100 100 c4t1d0
                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  777.0    0.0    3.1 128.0 256.0  164.7  329.5 100 100 c4t1d0
                    extended device statistics              
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0  827.1    0.0    4.0 128.8 256.0  155.7  309.5 100 100 c4t1d0
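
A quick back-of-the-envelope comparison of the two runs: zpool iostat shows ZFS doing roughly 1K writes/sec at up to ~30 megabytes/sec, or around 28K per write, while iostat shows UFS doing ~800 writes/sec at only ~3.4 megabytes/sec, or around 4K per write - and with actv pinned at 256, the disk queue is completely saturated. That suggests ZFS is batching lots of small updates into larger writes, while UFS is paying for each one individually.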

Yikes! So looking at throughput (transactions per second), ZFS is ~16x better than UFS on this benchmark (4672 vs. 292 transactions per second). Ok, so ZFS is not this good on every benchmark when compared to UFS, but we rather like this one.

This was run on a two-way Opteron box, using the same SCSI disk for both ZFS and UFS.
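
For anyone wanting to reproduce the UFS side, the setup would look something like this sketch (the slice and mountpoint are assumptions since I didn't capture them; note that UFS logging is on by default in current Solaris bits):

mcp# newfs /dev/rdsk/c4t1d0s0
mcp# mkdir -p /export/scsi_ufs
mcp# mount /dev/dsk/c4t1d0s0 /export/scsi_ufs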

Comments:

Uhm, 20,000 file creations per second? Are directory operations not atomic and synchronous in ZFS? If so then this isn't an apples to apples comparison. Standard UFS with no special options guarantees that any metadata operations that are completed are written to disk when the operation returns. I'm assuming postmark is executing these file creations in a single thread using regular creat(2) calls. If it's creating a separate thread for each then that's another ballgame. But I doubt that's the case. Also, doesn't this just prove that *your* UFS implementation sucks compared to ZFS? It doesn't say anything about the filesystem itself unless you had already compared your implementation to others and found it competitive.

Posted by Greg on November 16, 2005 at 09:06 AM PST #

First thing: I should have mentioned that my intention for this blog was to show how to get standard perf numbers for filesystems - I don't want to get into the debate of whether it's a legit benchmark - not yet at least.

So the 20,000 file creations per second... it turns out there were two 'fs_sync' calls #ifdef'd out in the version I was using, as in:

time(&start_time); /* store start time */

   /* create files in specified directory until simultaneous number */
   printf("Creating files...");
   fflush(stdout);
   for (i=0; i<simultaneous; i++)
      create_file(buffered_io);
   printf("Done\n");

   printf("Performing transactions");
   fflush(stdout);
#if 0
   fs_sync();
#endif
   time(&t_start_time);

That fs_sync() isn't really important for the throughput measurement; the second one didn't affect ZFS's throughput, but it did affect UFS's. I've now updated the blog with the correct numbers using a legit postmark. Thanks for noticing!

So ZFS is down to 4,000 creates per second (from 20,000). And running the correct benchmark, ZFS goes from 9.5x to 16x better than UFS.

So UFS isn't exactly *mine* - (insert smiley here). But the comparison shows that out of the box, running postmark, Solaris's ZFS is 16x better than Solaris's UFS on this particular config with those particular parameters. There's no arguing that.

Would I like to compare against every other FS (reiser, jfs, wafl, vxfs, etc.)? Heck yeah! It's just a little harder since you're immediately dealing with a different OS and/or different hardware (save vxfs), which starts adding too many variables and makes it a real apples vs. squashes comparison. I also wanted to leave the whole comparison politics out of this - it's more of a tutorial than numbers to run to the press with.

Oh yeah, I'd recommend checking out:
http://blogs.sun.com/roller/page/perrin?entry=the_lumberjack
http://blogs.sun.com/roller/page/realneel#the_zfs_intent_log

Posted by eric kustarz on November 16, 2005 at 11:52 AM PST #

Eric, would it be possible for you to compare ZFS/UFS with synctest, which models Postfix queue behaviour? This URL describes one run I did a while earlier:
http://oss.sgi.com/archives/linux-xfs/2004-04/msg00028.html
A post from Andrew Morton describing synctest:
http://archives.neohapsis.com/archives/postfix/2001-07/1704.html
To get a lot of ISPs excited, get some numbers with Bruce Guenter's fstress, which models mail being written to a Maildir and being POP'ed at the same time:
http://untroubled.org/benchmarking/2004-04/

Posted by Yusuf Goolamabbas on November 16, 2005 at 05:43 PM PST #

I'm still not sure I believe the numbers. Even to reach 4,000 synchronous creates per second you would need a 120,000 RPM drive (assuming each create could be done with a single block write and never required seeking). It sounds like ZFS is batching them up and writing them out in blocks. The only way to do that is to make them not synchronous. But if that's the case then ZFS is unsafe for programs like mail servers and such that expect file metadata operations to be synchronous.

Posted by Greg on November 17, 2005 at 07:59 AM PST #

Also, am I going nuts and losing my basic arithmetic, or is something wrong with the "simple math question" thing below? It always says I get it wrong the first time, and then the second time it works fine.

Posted by Greg on November 17, 2005 at 10:57 AM PST #

I mean -- I'm pretty sure 2 + 83 is 85 but apparently that's wrong.

Posted by Greg on November 17, 2005 at 10:58 AM PST #

Hey Yusuf,

I'll take a look at 'synctest' - is it similar to filebench's 'varmail'?
http://opensolaris.org/jive/thread.jspa?messageID=7054
http://opensolaris.org/os/community/performance/filebench/

We like running filebench in-house, and really want to get it more stable on Linux.

I'll take a look at fstress too, thanks!

And remember, what I posted are not "official" numbers... just demonstrating how to run various things to test file system performance - which is cool, since I'm finding other tests to run.

Did mongo ever get ported to Solaris?

Posted by eric kustarz on November 18, 2005 at 04:51 AM PST #

Hmmm, I get 85 from 83 + 2 as well - bizarre! I haven't hit any problems with the comment-posting dealey.

So take a look at the ZIL links I provided above. If your app doesn't specify O_DSYNC/O_RSYNC/O_SYNC on creat(2)/open(2) or doesn't do an fsync(), then the operations don't get flushed to disk immediately.

This is how UFS w/ logging works - except it has a traditional log instead of the ZIL. I imagine this is how any modern file system works. If every metadata-changing operation had to be synchronously flushed to disk, your performance would be severely degraded. It's up to the app to specify what guarantees it wants.
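
To make that concrete, here's a tiny illustrative sketch (not from postmark - the filenames are made up) of how an app chooses its guarantees:

   #include <fcntl.h>
   #include <unistd.h>

   int main(void)
   {
      /* Synchronous: the create and the write are on stable
         storage before each call returns. */
      int fd = open("mail.sync", O_WRONLY | O_CREAT | O_DSYNC, 0644);
      write(fd, "msg\n", 4);
      close(fd);

      /* Buffered: the data and metadata can sit in memory until
         the app explicitly asks for durability... */
      fd = open("mail.buffered", O_WRONLY | O_CREAT, 0644);
      write(fd, "msg\n", 4);
      fsync(fd);   /* ...like this */
      close(fd);
      return 0;
   }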

So it's asking for 6+77... I'm going with 83... let's see what happens...

Posted by eric kustarz on November 18, 2005 at 05:00 AM PST #

I assume that "same SCSI disk" means a single drive, not that you're using the hardware RAID on those Opteron boxes, right? I'm very interested in the comparison between other popular mail server filesystems - so far it looks like ZFS is much faster than ReiserFS for pure creation/deletion but the advantage tips the other way when mixed with transactions.

Posted by Chris Adams on November 21, 2005 at 11:49 AM PST #

Yep, single disk - no RAID.

I'm also very interested in comparing ZFS to other filesystems - though I personally don't feel it can be a totally legit comparison until both filesystems exist on the same hardware and same OS. Otherwise, there are too many points to debate.

Though doing some comparisons on at least the same hardware would be interesting... now that the code is out there, I hope people start doing this (hint hint).

Posted by eric kustarz on November 22, 2005 at 02:51 AM PST #
