By paulvandenbogaard on Jan 21, 2010
Recently I had the good fortune to do some testing on an Amber Road, more officially known as the Sun Storage 7410 Unified Storage System.
The machine had 64 GB of internal memory and two quad-core CPUs. A single JBOD held 24 disks, one of which was an SSD optimized for write access (a "writezilla"). The other 23 disks were "normal" 7200 RPM disks of 1 TB each. For connectivity, some of the four gigabit NICs were used.
Besides the SSD mentioned above, the system also had two other SSDs optimized for reading ("readzillas").
The appliance used ZFS as its file system. The writezilla held the ZFS intent log (ZIL), and the readzillas were used as the level 2 ZFS file system cache (L2ARC).
For more technical information about this product please check the website at:
Introduction
The original trigger for these tests was a remark from an ISV: they had problems with applications using their Amber Road while rsync was creating a backup of their email archive(s) from a Linux box to a volume on the appliance.
And indeed, the throughput numbers we received were not what we expected.
To get a better understanding, a 7410 was set up together with some load-generating equipment in one of our labs. The ISV was able to send us a copy of part of their email archive to use for testing, and the Amber Road was configured in a way comparable to the one at the ISV's site. The initial tests were indeed below expectations, and the quest to find the bottleneck began.
Quite a few elements were "unknown" to me, one of which was the internals of the Amber Road: the hardware and the software (operating system, ZFS pool and file system, network interfaces, etc.).
Were the NICs somehow dependent on each other? How much traffic can a single NIC handle, both in throughput and in number of packets?
The volume(s) on the Amber Road were accessed with NFS over gigabit Ethernet. A Linux box was used to simulate the mail machine. Could it handle the required load? It used the Linux volume manager (LVM) with ext3 as its file system. Could that combination deliver the required load? And was rsync, the tool used to make the backup, able to drive all the resources?
I started with the front end, the Linux box, and especially its volume manager, using
find <mail archive on lvm volume> -print | cpio -oc >/dev/null
as a way to simulate the reading part. I had a second machine with exactly the same hardware as the Linux box, running Solaris 10, and of course I couldn't resist doing the same reading tests on it. I was not impressed with the load a couple of internal disks could generate. Fortunately, I had a couple of disk arrays "lying around".
I used a Sun StorageTek 6140 array holding 12 spindles spinning at 15K RPM, with 1 GB of cache and two controllers.
The results of the above command (elapsed time per test):
1: Linux, LVM over 2 6140 volumes: 33 min
2: Solaris, UFS on a Disk Suite volume over 2 6140 volumes: 16 min
3: Linux, no LVM (1 6140 volume): 34 min
4: Linux, no LVM (4 6140 volumes in parallel): 24 min
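A quick way to read these numbers is relative to test 1. The small Python sketch below is only an illustration that uses the four times listed above; it computes how many times faster each run finished than the Linux LVM baseline.

```python
# Elapsed times (minutes) of the find | cpio read test, taken from the list above.
results = {
    "1: Linux LVM, 2 volumes": 33,
    "2: Solaris UFS on Disk Suite, 2 volumes": 16,
    "3: Linux no LVM, 1 volume": 34,
    "4: Linux no LVM, 4 volumes in parallel": 24,
}


def speedup(baseline_min: float, other_min: float) -> float:
    """How many times faster `other` finished than `baseline`."""
    return baseline_min / other_min


baseline = results["1: Linux LVM, 2 volumes"]
for name, minutes in results.items():
    print(f"{name:42s} {minutes:3d} min  {speedup(baseline, minutes):.2f}x vs test 1")
```

On the same hardware, Solaris with UFS finished about twice as fast as Linux with LVM, and even four parallel Linux readers without LVM only reached roughly 1.4 times the baseline.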
Tests 1 and 2 were performed with the same 6140 configuration: same hardware, different operating systems and volume managers. To check the Linux result, I used half the disk capacity of the array without LVM (test 3), and four volumes without LVM (test 4). In test 4 the first volume was mounted under /data1, the second under /data2, and so on. To parallelize, find /dataX -print | cpio -o >/dev/null sessions were run on each of the file systems in parallel.
After this result we decided to look into the other elements of the black box using Solaris only, just to ensure we could generate enough load once we started testing the Amber Road.
The next part to check was the Amber Road network, or rather, how much load the Amber Road NICs could sustain. A single volume on the Amber Road was NFS mounted with the following options:
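The parallel read of test 4 can be sketched in Python as below. This is only an illustration assuming the file systems are mounted at /data1 to /data4 as described; it reads every regular file and discards the data, which approximates but is not identical to find ... | cpio -o >/dev/null (cpio also writes archive headers and handles links and special files itself).

```python
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1 << 20  # read in 1 MiB pieces; the data itself is thrown away


def read_tree(root: str) -> int:
    """Read every regular file under `root`, roughly like find root -print | cpio -o >/dev/null.

    Returns the number of bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, devices, sockets, etc.
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(CHUNK)
                    if not chunk:
                        break
                    total += len(chunk)
    return total


def read_trees_in_parallel(roots):
    """One reader per file system, all running at the same time."""
    with ThreadPoolExecutor(max_workers=len(roots)) as pool:
        return dict(zip(roots, pool.map(read_tree, roots)))


if __name__ == "__main__":
    # Mount points as in test 4; pass other directories on the command line if needed.
    roots = sys.argv[1:] or ["/data1", "/data2", "/data3", "/data4"]
    start = time.monotonic()
    per_root = read_trees_in_parallel(roots)
    elapsed = time.monotonic() - start
    for root, nbytes in per_root.items():
        print(f"{root}: {nbytes / 2**20:.1f} MiB")
    print(f"elapsed: {elapsed / 60:.1f} min")
```

Threads are sufficient here because the work is I/O bound and file reads release the Python interpreter lock; one worker per mount point mirrors the one session per file system used in the test.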
Since an rsync backup involves many file system operations (read, write, stat, etc.) and is optimized not to copy a file that already resides on the backup volume, I replaced rsync with a set of 'find ... cpio' sequences run in parallel, each find command (see above) using its own private mail archive. After some analysis it was clear NFS was not a bottleneck. The following screen dump from the Amber Road analytics shows that the three NICs in use were all running close to the theoretical speed of gigabit Ethernet. In total, "pumping" 300 MB/sec over three interfaces looked perfect to me, with each interface handling close to 15,000 NFS ops/sec.
With these results I concluded that both client and network were running fine, and that I needed to focus on the 7410 itself, with its 24 disks. The GUI was not my favorite tool for looking at disk utilization and other statistics. I must admit that I have been in this business for quite some years and still prefer the command line, although it might be good to know that the punch card era was even before my time...
The ISV's remark included a statement about other applications suffering while an rsync was running. The rsync runs themselves also took a remarkably long time, which would translate to an average transfer rate of 50 MB/sec.
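To put "close to the theoretical speed" into numbers, here is a back-of-envelope check in Python. It assumes decimal megabytes and ignores Ethernet, IP and NFS overhead, so the real achievable ceiling is somewhat below the raw 125 MB/sec per port; the bytes-per-op figure is simply throughput divided by the reported op rate, not something the analytics showed directly.

```python
GIGABIT_BITS_PER_SEC = 1_000_000_000
BYTES_PER_MB = 1_000_000  # decimal megabytes, as network rates are usually quoted


def line_rate_mb_per_sec(links: int) -> float:
    """Raw signalling rate of `links` gigabit Ethernet ports in MB/sec (no protocol overhead)."""
    return links * GIGABIT_BITS_PER_SEC / 8 / BYTES_PER_MB


def avg_bytes_per_op(mb_per_sec: float, ops_per_sec: float) -> float:
    """Average payload per operation implied by a throughput and an op rate."""
    return mb_per_sec * BYTES_PER_MB / ops_per_sec


observed_total = 300.0     # MB/sec over all three NICs, from the analytics
nfs_ops_per_nic = 15_000   # approximate NFS ops/sec per interface

raw = line_rate_mb_per_sec(3)
print(f"raw line rate: {raw:.0f} MB/sec, observed: {observed_total / raw:.0%} of it")
print(f"average payload per NFS op: {avg_bytes_per_op(observed_total / 3, nfs_ops_per_nic):.0f} bytes")
```

With these assumptions, 300 MB/sec is 80% of the raw line rate of three ports, and each NFS operation moved roughly 6.7 KB on average.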
I changed the approach a little. I first checked what could be considered reasonable for the Amber Road. Fortunately, there is a load-generating tool called vdbench. This open source tool can be downloaded from http://sourceforge.net/projects/vdbench. It has many capabilities, one of which is simulating file system workloads: it creates a directory structure with as many files as specified, all driven by a parameter file. Here is one example, for a pure read-only test:
The generator runs on a client machine, with the Amber Road volume NFS mounted under /data_remote/test. The directory structure is two levels deep, and each directory at the bottom of this tree holds 10,000 files of different sizes: 60% are 8 KB, 25% are 32 KB, 10% are 51 KB and 5% are 4 MB. A total of 96 threads generate the load, and the test runs for 72,000 seconds.
The example above tries to generate a continuous load of 10,000 IOPS (the fwdrate parameter).
The tests had to run for a long time. The file system cache (ZFS ARC) was approximately 60 GB, and it had already been shown that the Amber Road behaved perfectly as long as capacity was available in the ARC. Only once the ARC was fully utilized did the problematic behavior start to show. My tests needed this steady-state behavior in order to be comparable to the ISV's situation.
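The size mix matters more than it looks: the 5% of 4 MB files hold most of the bytes. The Python calculation below makes that explicit. It uses only the percentages and the 10,000 files per directory stated above (treating KB and MB as binary units, an assumption), and it cannot give the total data set size because the width of the directory tree is not given here.

```python
KB = 1024
MB = 1024 * KB

# File size mix from the vdbench example: (size in bytes, share of files)
SIZE_MIX = [(8 * KB, 0.60), (32 * KB, 0.25), (51 * KB, 0.10), (4 * MB, 0.05)]
FILES_PER_DIR = 10_000


def mean_file_size(mix) -> float:
    """Average file size in bytes for a list of (size, share) pairs."""
    if abs(sum(share for _, share in mix) - 1.0) > 1e-9:
        raise ValueError("shares must add up to 100%")
    return sum(size * share for size, share in mix)


def byte_share(mix):
    """Fraction of all bytes that each file size accounts for."""
    mean = mean_file_size(mix)
    return {size: size * share / mean for size, share in mix}


mean = mean_file_size(SIZE_MIX)
print(f"mean file size: {mean / KB:.1f} KB")
print(f"data per leaf directory: {FILES_PER_DIR * mean / 2**30:.2f} GiB")
for size, frac in byte_share(SIZE_MIX).items():
    print(f"{size // KB:>5d} KB files hold {frac:.0%} of the bytes")
```

The average file is about 223 KB, each leaf directory holds a little over 2 GiB, and roughly 92% of the bytes live in the 4 MB files.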
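A fixed fwdrate means the generator works against a schedule rather than simply issuing I/O as fast as it completes. The Python sketch below is our own simplified per-second model of such a rate-driven generator, not vdbench's actual implementation, and the capacity curve is hypothetical: whenever the storage completes fewer I/Os than requested, the shortfall is carried over and requested on top of the next second's target.

```python
def simulate(target_rate, capacity, catch_up=True):
    """Per-second model of a rate-driven load generator.

    target_rate: I/Os per second the generator aims for (like fwdrate).
    capacity:    I/Os the storage can complete in each second (hypothetical values).
    catch_up:    carry unserved I/Os over into the next second's request.
    Returns a list of (requested, completed) tuples, one per second.
    """
    backlog = 0
    history = []
    for cap in capacity:
        requested = target_rate + (backlog if catch_up else 0)
        completed = min(requested, cap)
        backlog = requested - completed if catch_up else 0
        history.append((requested, completed))
    return history


# Hypothetical capacity: fine for 3 s, a 2 s stall at 4,000 I/O per second, then 6,000 steady.
capacity = [12_000] * 3 + [4_000] * 2 + [6_000] * 3
for second, (req, done) in enumerate(simulate(10_000, capacity)):
    print(f"t={second}s requested={req:6d} completed={done:6d}")
```

In this naive model the backlog keeps growing as long as capacity stays below the target, so a dip is followed by ever larger requests; a real tool has to bound that somehow, which this sketch does not attempt.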
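A rough estimate of when the cache fills up can be computed from numbers elsewhere in this post: the 60 GB ARC and the roughly 80 MB/sec the system pushed early in the default-setup run. The sketch assumes every byte transferred is new data landing in the ARC, ignoring re-reads of cached files, eviction, metadata and compression, so treat the result as an order of magnitude only.

```python
def seconds_to_fill(cache_gb: float, new_data_mb_per_sec: float) -> float:
    """Time until a cache of `cache_gb` (decimal GB) is full if `new_data_mb_per_sec`
    of not-yet-cached data enters it every second and nothing is evicted."""
    return cache_gb * 1000 / new_data_mb_per_sec


# 60 GB ARC from the text; 80 MB/sec measured during the first 600 seconds.
# Assuming all of it is new to the cache deliberately overestimates the fill rate.
print(f"ARC full after roughly {seconds_to_fill(60, 80):.0f} s")
```

Under these assumptions the ARC would be full after roughly 750 seconds, which is in the same range as the point where the default setup starts to drop. This is a plausibility check, not proof of the cause.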
The load test was done with three setups: the default RAID-Z2, a narrow RAID-Z2, and a mirrored setup. Here's a picture of the output, showing the number of IOPS over time. After each volume was created, the directory structure and the files were created; then a load run was started with 50% reads.
Apart from some spikes there is not much difference between the three; they all cope with a 10,000 IOPS load very well. However, this was during the first 10 minutes after all the files were created. The next graph shows the behavior over an interval ten times longer.
These data are from the same three tests. Here, however, we see that at 700 seconds the default setup drops to approximately 6,000 IOPS. The vdbench log output (not shown here) also shows that the average is no longer 50% reads.
The narrow RAID-Z2 can handle the 10,000 IOPS load longer. At 2700 seconds its behavior starts to change: it still averages around 10,000 IOPS, but far less consistently.
The mirrored setup sustains the 10,000 IOPS nicely over the whole test period.
The tests described were executed to create a background load while running rsync. The bursts in the blue graph are the points where rsync was started, scheduled at 1800, 3600 and 5400 seconds. The mirrored setup showed no ill effects during the test.
Here is a close-up around the 900-second mark.
One of the things vdbench tries to do is make up for missed I/Os, so the first big dip after T=700 is followed by a much bigger load request. The end result is that the system seems to throttle down to about 6,000 IOPS. During the first 600 seconds the average response time was around 1 millisecond; 200 seconds later it had become 20 milliseconds. During the first 600 seconds the system was able to push 80 MB/sec; after the 800-second mark only 40-50 MB/sec was left. Finally, read operations settled at around 85% of the total.
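Taking the rounded figures above at face value, a little arithmetic shows where the lost IOPS went. The Python sketch below uses decimal MB and the midpoint (45 MB/sec) of the 40-50 MB/sec range; both are assumptions for illustration.

```python
def bytes_per_op(mb_per_sec: float, iops: float) -> float:
    """Average transfer size implied by a throughput and an I/O rate (decimal MB)."""
    return mb_per_sec * 1_000_000 / iops


def read_write_split(iops: float, read_fraction: float):
    """Split a total I/O rate into reads/s and writes/s."""
    reads = iops * read_fraction
    return reads, iops - reads


# (IOPS, MB/sec, read fraction) per phase, from the numbers in the text.
phases = {
    "first 600 s": (10_000, 80, 0.50),
    "after 800 s": (6_000, 45, 0.85),
}
for label, (iops, mb_s, read_frac) in phases.items():
    reads, writes = read_write_split(iops, read_frac)
    print(f"{label}: {bytes_per_op(mb_s, iops):,.0f} bytes/op, "
          f"{reads:,.0f} reads/s, {writes:,.0f} writes/s")
```

The average transfer stays around 7.5 to 8 KB, and reads stay at roughly 5,000 per second, while writes fall from about 5,000 to about 900 per second; on these numbers it is mainly the write side that collapsed.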
Finally, to make this comparison complete, here is a close-up of the graph around T=2900.
The test above was repeated for the three setups with a read/write ratio of 75:25, and again as a 100% read test.
To see the effect on the run time of an rsync command that creates a backup of a 2 GB "mail archive", the command rsync --delete -a <mail archive> <destination directory> was executed at 1800, 3600 and 5400 seconds. The completion times are in the following tables. The rsync times during the default setup test were so long that this test had to be redone over a much longer period (14,400 seconds).
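Timing rsync runs at fixed offsets can be automated. The Python sketch below is one possible way to do it, assuming it is started at the same moment as the vdbench run and that rsync is on the PATH; the paths in the usage comment are hypothetical placeholders, and the run_at_offsets helper is our own, not something from the original test setup.

```python
import subprocess
import time


def timed_rsync(source: str, destination: str) -> float:
    """Run rsync --delete -a source destination and return wall-clock seconds."""
    start = time.monotonic()
    subprocess.run(["rsync", "--delete", "-a", source, destination], check=True)
    return time.monotonic() - start


def run_at_offsets(offsets, job):
    """Start `job` at each offset (seconds after this call), one after another.

    If a job is still running when the next offset passes, the next job starts
    as soon as it finishes instead of overlapping with it."""
    t0 = time.monotonic()
    results = []
    for offset in offsets:
        delay = t0 + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        results.append((offset, job()))
    return results


# Example (hypothetical paths), started together with the vdbench run:
# for offset, secs in run_at_offsets([1800, 3600, 5400],
#                                    lambda: timed_rsync("/export/mail-archive/", "/data_remote/backup/")):
#     print(f"rsync started at T+{offset}s took {secs:.0f} s")
```

Running the copies sequentially keeps the measurement of each rsync clean; with the long default-setup times, a run could spill past the next offset, which this helper handles by starting the next copy late rather than concurrently.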
The first thing to mention is that, although the default setup seems to underperform, you should revisit what is actually happening. The appliance is being stressed by a 10,000 IOPS load, and internally there are two times eleven disks handling it. These standard SATA disks can handle a load of, say, 70 IOPS each (a Google search turns up many articles putting this number between 50 and 100). This translates to approximately 1540 IOPS for the available disks.
In a mirrored setup this appliance was able to handle 10,000 IOPS, over six times the capacity of those disks alone. If there are barely any writes, this is even possible with the default setup and with the narrow RAID configuration. After realizing this, I must admit, I was impressed!
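The disk arithmetic above is easy to reproduce, and it is worth checking how sensitive the "over six times" factor is to the per-disk figure. The Python sketch below uses the 2 x 11 disks and the 50 to 100 IOPS range quoted in the text.

```python
DISKS = 2 * 11        # "two times eleven disks"
IOPS_PER_DISK = 70    # rule of thumb for 7200 RPM SATA; sources quote 50-100
TARGET_IOPS = 10_000  # the vdbench fwdrate


def raw_disk_iops(disks: int, per_disk: int) -> int:
    """Aggregate random IOPS if every disk contributes independently."""
    return disks * per_disk


raw = raw_disk_iops(DISKS, IOPS_PER_DISK)
print(f"raw disk capacity: {raw} IOPS")
print(f"{TARGET_IOPS:,} IOPS is {TARGET_IOPS / raw:.1f}x that")
for per_disk in (50, 100):
    factor = TARGET_IOPS / raw_disk_iops(DISKS, per_disk)
    print(f"  at {per_disk} IOPS/disk: {factor:.1f}x")
```

Across the quoted range the factor runs from about 4.5 to about 9. Either way the disks alone cannot explain 10,000 IOPS, which is consistent with the caches (ARC plus the readzilla and writezilla SSDs) absorbing much of the load, although this post does not measure that directly.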