By user12610824 on Jun 18, 2009
The Sun Storage 7000 Unified Storage product line currently ranges from the entry-level Sun Storage 7110, which provides up to 4TB of storage in a 2U form factor, to the high-end Sun Storage 7410, which scales up to half a petabyte and can be configured with Flash-accelerated Hybrid Storage Pools and/or a High-availability Cluster option. The entire line, which also includes the mid-range Sun Storage 7210 and 7310, features a common, easy-to-use management interface and, of course, killer-app DTrace Analytics. My investigation has focused primarily on the Sun Storage 7110, with some additional work on the Sun Storage 7210, which has more CPU, memory, and spindle capacity, and supports optional Write Flash Accelerator components.
Key Benefits for MySQL
There are a couple of aspects of the Sun Storage 7000 line that are really interesting for MySQL. As Don pointed out in his keynote, the underlying use of ZFS in the appliance means that you can get many of the benefits of ZFS even without migrating your OS to Solaris. The snapshot and clone features, for example, make automated backups and replication slave creation a snap, and the way that cloning is implemented means that a slave created by this method may require less physical storage than a slave created by traditional methods. That is because clones share unmodified disk blocks, so anything that remains unchanged after slave creation is never actually duplicated. You can also enable compression in the appliance, if you choose, which could mean even more space savings.
Analytics, Analytics, Analytics!
As a performance engineer, I can tell you that I have been frustrated many, many times by the lack of observability in storage devices. Even the best interfaces I have seen are like looking through frosted glass compared to the DTrace Analytics interface. As Don MacAskill says, "Drill down on everything. Correlate anything. God-like power." But don't take my word for it (or Don's): go to the Sun Storage 7000 Software page and download a simulator that you can run on your laptop or desktop, so you can see firsthand.
Performance Highlights
OK, nobody would care about DTrace Analytics if all it did was tell you why your system was running so darn slow. The good news is that MySQL performance on the Sun Storage 7000 line is excellent, even on the entry-level Sun Storage 7110. There are a couple of issues that I will point out later, but for MySQL on Linux over NFS, performance is great right out of the box.
Huge read cache vs. traditional HW RAID
The current Sun Storage 7000 lineup is based on standard Sun x64 systems and 64-bit OpenSolaris. That means lots of addressable memory, most of which is used as cache. On a standard Sun Storage 7110 with 8GB of memory, that can translate to over 6GB of usable cache, and this can grow all the way to 128GB of memory on the 7410. Compare that to typical entry-level hardware RAID devices that often come with as little as 256MB of cache, or even relatively high-end hardware RAID devices that often come with 2GB of cache or less.
Blazingly fast reads from cache
With a database working set (the subset of data that clients are actively using) that fits in the cache of the Unified Storage appliance, I found that MySQL read-only performance on Linux and Solaris NFS was primarily bound by network bandwidth. For example, I observed over 6400 reads per second on the 7110, with a single MySQL/InnoDB/sysbench server accessing a 6GB working set (20GB total table space) via NFS over a 1Gb NIC. The sustained data throughput at this rate was a little over 80MB/sec, and with protocol overhead we had a sustained rate of a little over 90MB/sec on the wire, with peaks near 100MB/sec. That means we were running very near network line rate for the 1Gb NIC.
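As a quick sanity check, those figures hang together arithmetically (a rough sketch; all inputs come from the paragraph above, and decimal MB/KB are assumed throughout):

```python
# Sanity-check of the 7110 read-only numbers quoted above (decimal units).
reads_per_sec = 6400         # observed NFS reads/sec
data_mb_per_sec = 80         # sustained data throughput, MB/s
wire_mb_per_sec = 90         # sustained wire rate incl. protocol overhead, MB/s

avg_read_kb = data_mb_per_sec * 1000 / reads_per_sec  # 12.5 KB average per read
line_rate_mb = 1_000_000_000 / 8 / 1_000_000          # 1Gb NIC: 125 MB/s theoretical

print(f"average payload per read: {avg_read_kb} KB")
print(f"sustained wire utilization: {wire_mb_per_sec / line_rate_mb:.0%}")
```

The ~12.5KB average payload is roughly consistent with a mix of 16KB InnoDB page reads and smaller requests, and the 100MB/sec peaks approach the 125MB/sec theoretical maximum of a 1Gb link.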
This resulted in 2x to 3x the sysbench throughput for the MySQL/InnoDB/sysbench server using the Sun Storage 7110, compared to the same server when it was attached via Fibre Channel to a traditional HW RAID array with 2GB of memory (providing enough cache for less than 1/3 of our working set). The read-only results were strong across the board for Linux and Solaris over both NFS and iSCSI.
On the 7210, we saw ~90% linear scaling for 1 to 6 MySQL/sysbench servers over NFS, when configured with a 10Gb NIC on the Unified Storage appliance and 1Gb NICs in each DB server. Analytics showed that CPU and network in the 7210 were both at approximately 50% utilization during this test, and our working set used 36GB out of the available ~60GB of cache, so it is likely that we could have seen similar scaling to 8 or 10 DB servers.
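A rough headroom estimate supports that expectation. Assuming each DB server pushes about the same ~90MB/sec wire rate seen in the single-server test (an assumption, not a per-server measurement from this run):

```python
# Rough network-headroom estimate for the 6-server 7210 scaling test.
servers = 6
per_server_mb = 90                              # assumed per-server wire rate, MB/s
ten_gbe_mb = 10_000_000_000 / 8 / 1_000_000     # 10Gb NIC: 1250 MB/s theoretical

aggregate_mb = servers * per_server_mb          # 540 MB/s offered load
headroom = 1 - aggregate_mb / ten_gbe_mb        # ~57% of the 10Gb link still free
```

That ballpark agrees with the ~50% network utilization Analytics reported, and suggests room for more DB servers before the 10Gb link saturates.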
The only tuning we applied at the Unified Storage appliance end was to set the record size of the NFS share to 16KB for the InnoDB tablespace storage. We left the record size at its default of 128KB for the NFS share that contained the InnoDB logs. There was no special network or NFS tuning applied to either the DB server or the appliance.
Strong IOPS for large working sets
If you have an active working set that exceeds the cache capacity in the Unified Storage appliance, your random read performance will eventually be bound by the IOPS rate of the underlying storage. For the Sun Storage 7110 and 7210, which do not have Read Flash Accelerator options like the 7310 and 7410, that means the IOPS rate of the disks. In a MySQL/InnoDB/sysbench test on the 7110, with an aggregate working set of 36GB (180GB aggregate table space), resulting in a 40% cache hit rate in the appliance, I observed roughly 4000 NFS reads per second and roughly 2700 resulting disk IOPS. That translates to 230 IOPS for each of the 12 data drives. In this test we were using only about 15% of the drive capacity, so we were "short stroking" the disks and getting better IOPS rates than we would if we had longer average seeks. For example, at 80% capacity we might see something on the order of 150 to 180 IOPS per drive.
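The per-drive figure follows directly from the miss rate (a sketch using the numbers above; the gap between expected misses and observed disk IOPS is plausibly prefetch and metadata I/O):

```python
# Relate the appliance cache hit rate to per-drive disk IOPS on the 7110.
nfs_reads_per_sec = 4000
cache_hit_rate = 0.40
data_drives = 12

expected_misses = nfs_reads_per_sec * (1 - cache_hit_rate)  # ~2400 reads/s go to disk
observed_disk_iops = 2700    # measured; I/O beyond the misses is likely
                             # prefetch and metadata reads
per_drive_iops = observed_disk_iops / data_drives           # 225 IOPS per drive
```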
For additional background on IOPS and other metrics for Sun Storage 7000 storage devices, check out Roch Bourbonnais' blogs on Sun Storage 7000 Performance invariants and Compared Performance of Sun 7000 Unified Storage Array Line and Amitabha Banerjee's blog on Analyzing the Sun Storage 7000.
Excellent MySQL read-write performance on Linux over NFS
I was very pleasantly surprised by MySQL read-write performance on Linux NFS to the entry level Sun Storage 7110. I went into the study expecting good read-write performance to the 7210, but I thought that the lack of Write Flash Accelerators in the 7110 would be an issue for MySQL. Not so! In my testing, I observed over 2300 reads plus 400 writes per second on the 7110, with a single MySQL/InnoDB/sysbench server accessing a 6GB working set (20GB total table space) via NFS over a 1Gb NIC.
This resulted in 1.5x the sysbench throughput for the MySQL/InnoDB/sysbench server using the Sun Storage 7110, compared to the same server when it was attached via Fibre Channel to a traditional HW RAID array with 2GB of memory (providing enough cache for less than 1/3 of our working set).
As with the read-only test, the large read cache in the 7110 compared to the HW RAID array probably played a big role here. The 85% cache hit rate in the appliance provided an advantage on read performance, which offset the potential advantage of the battery-backed write cache in the HW RAID array. In addition, the multi-threaded MySQL/InnoDB/sysbench workload appears to have benefited from group commit on writes, since the 7110 started with lower throughput at 1 or 2 active threads, but began to outperform the HW RAID array at 4 threads and higher.
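A simple weighted-latency model illustrates why the hit rate dominates (illustrative only; the service times below are assumed round numbers, not measurements from this study):

```python
# Why an 85% read cache hit rate can offset a battery-backed write cache:
# average read latency collapses toward the cached-read service time.
hit_rate = 0.85
t_cache_ms = 0.3    # assumed service time for a cached read over NFS
t_disk_ms = 8.0     # assumed service time for a random disk read

avg_read_ms = hit_rate * t_cache_ms + (1 - hit_rate) * t_disk_ms
# ~1.5 ms average, vs. 8 ms if every read went to disk
```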
I was surprised enough by the MySQL read-write performance over Linux NFS that I felt compelled to confirm that fsync() calls in Linux were actually resulting in correct NFS data commits. Based on a combination of DTrace Analytics and Wireshark analysis of NFS traffic, I can confirm that the data was correctly going to disk.
Good MySQL read-write performance over iSCSI without write flash accelerators
The testing I have done so far with MySQL over iSCSI has been exclusively on the entry level Sun Storage 7110. Unlike the 7210, 7310, and 7410, this model does not currently include a write flash accelerator option, which would probably help MySQL read-write performance over iSCSI. Even so, we observed good performance over iSCSI, especially at higher thread counts. For example, using the same database and network configurations described earlier, MySQL/InnoDB/sysbench throughput over iSCSI, at 32 to 64 active threads, was comparable for both Solaris and Linux to what was seen on the same server when it was attached via Fibre Channel to a traditional HW RAID array with 2GB of memory.
Current Issues
MySQL read-write over Solaris NFS
MySQL read-only performance on Solaris NFS is currently excellent, but read-write performance is impacted by the lack of VMODSORT support in the Solaris NFS client (CR 6213799). This affects fsync() performance for large files using a default, buffered NFS mount. The normal workaround for this is to eliminate client-side buffering of file data by mounting the file system with the forcedirectio option, or enabling directio on a per-file basis. For example, the MySQL/InnoDB option "innodb_flush_method = O_DIRECT" enables directio on the InnoDB tablespace files. That is likely to work fine on an appliance that includes write flash accelerators, like the Sun Storage 7210 and higher, but write flash accelerators are not currently available on the Sun Storage 7110. For the 7110 without write flash accelerators, we did not see a MySQL read-write performance gain by using directio instead of default, buffered NFS file access.
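For reference, that option lives in the [mysqld] section of my.cnf; the path shown below is a placeholder, not one used in this study:

```ini
[mysqld]
# Bypass client-side buffering of InnoDB tablespace files
# (directio on Solaris, O_DIRECT on Linux). Path is illustrative.
innodb_flush_method = O_DIRECT
datadir = /mnt/innodb-data
```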
For Solaris, MySQL read-write performance on ZFS over iSCSI currently exceeds its performance over buffered or directio-enabled NFS on the Sun Storage 7110, provided that fast-write-ack is enabled on the iSCSI LUNs.
iSCSI fast-write-ack persistence
A Solaris system running ZFS over iSCSI can realize performance gains by enabling fast-write-ack on the iSCSI LUNs in the storage appliance, because ZFS is known to correctly issue SCSI cache flush commands when fsync() is called (that is not currently known to be true for any Linux file system). The fast-write-ack option can be activated by enabling the "write cache enabled" option in the appliance for a given iSCSI LUN. However, due to CR 6843533, the write cache enabled setting on an iSCSI LUN will be silently disabled any time an iSCSI login occurs for a target, although the Unified Storage BUI and CLI will still indicate that it is enabled. Examples of iSCSI target login triggers include a reboot of the Unified Storage appliance, a reboot of the client, a client "iscsiadm modify target-param -p ..." command to modify negotiated parameters, or an "iscsiadm remove ..." followed by an "iscsiadm add ..." for the affected target. The workaround for CR 6843533 is to manually disable and then re-enable write cache following an iSCSI target login.
Recommendations
General
For an NFS share that will be used to store MySQL tablespace files, match the record size of the share to the block size of the storage engine; for example, this should be 16k for InnoDB tablespace files. This can be configured on a per-share basis by setting the "Database record size" (in the BUI) or "recordsize" (in the CLI) for the share. This must be done before creating the tablespace files in the share.
For an NFS share that will store transaction logs or other non-tablespace files, the record size of the share can be left at its default of 128k.
For more information on Sun Storage 7000 Unified Storage network attached storage devices, including lots more performance data, check out these other blogs:
- Introducing the Sun Storage 7000 Series
- Sun Storage 7000 Performance invariants
- Analyzing the Sun Storage 7000
- Compared Performance of Sun 7000 Unified Storage Array Line
- Sun Storage 7310 perf limits
- The Fishworks Team
Linux
We saw our best MySQL read-only and read-write performance on Linux using NFS. Read-only performance over iSCSI approached that of NFS, but read-write over NFS substantially outperformed iSCSI. Based on the read-write result, we recommend using NFS for MySQL on Linux. All of our testing on Linux used NFSv3, which was the default in the version of Linux we tested.
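For completeness, a plain NFSv3 mount of the kind used in these tests might look like the following /etc/fstab entry (illustrative; the hostname, export path, and mount point are placeholders, and no special tuning options are added):

```
# NFSv3, hard mount, no special tuning (hostname and paths are placeholders)
nas7110:/export/mysql  /mnt/mysql-data  nfs  rw,hard,nfsvers=3  0  0
```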
Solaris
For the Sun Storage 7110 without write flash accelerators, we saw our best MySQL read-write performance on Solaris by running ZFS over iSCSI. This avoided the performance impact we would have seen with MySQL over NFS, due to the lack of VMODSORT support in the Solaris NFS client (CR 6213799). The data integrity guarantees of ZFS also allowed us to disable the InnoDB double write buffer, and ZFS cache management semantics allowed us to enable fast-write-ack on the iSCSI LUNs in the storage appliance. Until CR 6213799 has been addressed, we recommend using ZFS over iSCSI for MySQL on Solaris if you do not have write flash accelerators in your appliance. However, be sure to review the iSCSI fast-write-ack persistence discussion in the "Current Issues" section above if you use this feature.
For the Sun Storage 7210 and higher, write flash accelerators should benefit MySQL read-write performance over NFS using either the forcedirectio mount option or the "innodb_flush_method = O_DIRECT" MySQL/InnoDB option, and will also benefit MySQL read-write performance over iSCSI. We did not have an appliance with write flash accelerators available for this test, so at this time we cannot recommend one configuration over the other.