It seems that some old school filesystem still need to statically allocate inodes to hold pointers to individual files. Normally this should not cause too much problems as default settings account for an average filesize of 32K
. Or will it ?
If the avg filesize you need to store on the filesystem is much smaller than this, they you are likely to eventually run out of inodes even if the space consumed on the storage is far from exhausted.
In ZFS inodes are allocated on demand and so the question came up, how many files can I store onto a piece of storage. I managed to scrape up an old disk of 33GB, created a pool and wanted to see how many 1K files I could store on that storage.
ZFS stores files with the smallest number of sectors possible and so 2 sectors was enough to store the data. Then of course one needs to also store some amount of metadata, indirect pointer, directory entries etc to complete the story. There I didn't know what to expect. My program would create 1000 files per directory. Max depth level is 2, nothing sophisticated attempted here.
So I let my program run for a while and eventually interrupted it at 86% of disk capacity :
Filesystem size used avail capacity Mounted on
space 33G 27G 6.5G 81% /space
Then I counted the files.
#ptime find /space/create | wc
23823805 23823805 1405247330
So 23.8M files consuming 27GB of data. Basically less than 1.2K of used disk space per KB of files. A legacy type filesystem that would allocate one inode per 32K would have run out of space after a meager 1M files but ZFS managed to store 23X more on the disk without any tuning.
The find command here is mostly gated on fstat performance and we see here that we did the 23.8M fstat in 3060 seconds or 7777 fstat per second.
But here is the best part : And how long did it take to create all those files ?
This is hard to believe but it took about 1 hour for 23.8 million files.This is on a single direct attach drive
3. c1t3d0 <FUJITSU-MAP3367N SUN36G-0401-33.92GB>
ZFS created on average 5721 files per second. Now obviously such a drive cannot do 5721 IOPS but with ZFS it didn't need to. File create is actually more of a cpu benchmark because the application is interacting with host cache. It's the task of the filesystem to then create the files on disk in the background. With ZFS, the combination of the Allocated on Write policy and the sophisticated I/O aggregation in the I/O scheduler (dynamics
) means that the I/O for multiple independant file create can be coalesced. Using dtrace I counted the number of IO required and filecreates per minutes, typical samples show more than 200K files created per minutes using about 3000 IO per minutes or 3300 files per second using a mere 50 IOPS !!! Per Minute
Sample Create IOs
#1 214643 2856
#2 215409 3342
#3 212797 2917
#4 211545 2999
Finally with all these files, is scrubbing a problem ? It took 1h34m to actually scrub that many files at a pace of 4200 scrubbed files per second. No sweat. pool: space
scrub: scrub completed after 1h34m with 0 errors on Wed Feb 11 12:17:20 2009
If you need to create, store and otherwise manipulate lots of small files efficiently, ZFS has got to be you filesystem of choice for you.