Troubleshooting I/O Load in a Solaris OpenStack Cloud

Dave Miner
Sr. Principal Software Engineer
The other day we noticed that a number of guests running on our Solaris OpenStack cloud were very lethargic, with even simple operations like listing boot environments with beadm taking a long time, let alone more complex things like installing packages.  It was easy to rule out CPU load, as prstat quickly showed that load averages on the compute nodes hosting the guests were not at all high.
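For the curious, the quick check on each compute node was nothing fancier than prstat's per-zone view; something along these lines, where the 5-second interval and 3-sample count are arbitrary choices:

# Per-zone CPU summary: the bottom section has one line per zone plus the
# overall load averages, so a CPU-hogging guest stands out right away.
# The 5-second interval and 3-sample count are arbitrary choices.
prstat -Z 5 3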

The next suspect was disk I/O.  When you're running on bare metal with local disks, this is pretty easy to check; things like zpool status, zpool iostat, and iostat provide a good high-level view of what's going on, and tools like prstat or truss might help identify the culprit.  In an OpenStack cloud it's not nearly that simple.  It's all virtual machines, the "disks" are ZFS volumes served up as LUNs by an iSCSI target, and with a couple hundred VMs running on a bunch of compute nodes, you're now searching for the needle(s) in a haystack.  Where to start?

First I confirmed that we didn't have any funny business going on with the network.  We've occasionally seen the ixgbe NICs fall back to 100 Mb with our switch, but dladm show-aggr -x aggr0 confirmed that wasn't happening on either the Cinder node or the compute nodes.  My next step was to look at I/O on the Cinder volume server.  We're not yet running this on a ZFS Storage Appliance, just a Solaris box with a pile of disks, so there's no analytics BUI yet.  A quick look with zpool iostat showed we're pretty busy:
$ zpool iostat 30
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool        224G  54.2G      0     46      0   204K
tank        7.17T  25.3T  1.10K  5.53K  8.78M  92.4M
But are we really that loaded as far as the disks are concerned?
$ iostat -xnzc 30
     cpu
 us sy st id
  0 12  0 88
                    extended device statistics
   r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t  asvc_t   %w   %b device
  57.9  272.6   769.1  10579.3   0.0   4.7     0.0    14.1    0   88 c1t0d0
   0.0    8.9     0.0     78.1   0.0   0.0     0.0     2.1    0    1 c1t1d0
  54.6  256.9   759.7  10475.1   0.0   4.6     0.0    14.8    0   90 c1t2d0
  59.1  283.3   749.9  10070.2   0.0   4.3     0.0    12.5    0   87 c1t3d0
  76.9  224.9  1172.9  10275.5   0.0   5.4     0.0    18.0    0   96 c1t4d0
  51.7  259.8   784.0  10579.2   0.0   4.2     0.0    13.6    0   93 c1t5d0
  47.3  275.2   781.3  10109.4   0.0   4.3     0.0    13.4    0   93 c1t6d0
  49.6  262.0   766.2  10416.9   0.0   4.7     0.0    15.1    0   92 c1t7d0
  66.3  243.2  1163.7  10172.9   0.0   5.3     0.0    17.1    0   97 c1t8d0
  57.1  269.1   814.6  10584.6   0.0   4.2     0.0    13.0    0   90 c1t9d0
  62.3  272.2   820.1  10298.4   0.0   4.0     0.0    12.0    0   87 c1t10d0
  55.1  247.3   791.9  10558.6   0.0   4.6     0.0    15.3    0   91 c1t11d0
  63.5  221.6  1166.7  10559.2   0.0   5.7     0.0    20.0    0  100 c1t12d0
   0.0    9.1     0.0     78.1   0.0   0.0     0.0     4.9    0    1 c1t13d0
Yeah, that looks heavy; all of the disks in the tank pool were above 87% busy, and that wasn't changing much over several minutes of monitoring, so it's not just a brief burst of activity.  Note that I don't have continuous monitoring set up for this data, but past quick looks had shown we were typically at about half those zpool iostat bandwidth numbers, with disks in the 50-60% busy range.  So we have an unusually heavy load; but where's it coming from, and is it expected activity from something like a source build, or something that's gone runaway?  This is where the problem gets challenging; we've got a bunch of compute nodes hammering this server over iSCSI, so we need to figure out which ones to look at.
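As an aside, the lack of a baseline is easy enough to fix with a throwaway logger built from the same two commands; a rough sketch (the log path, sample length, and cadence here are arbitrary choices, not something we actually run):

#!/bin/sh
# Crude I/O baseline logger for the Cinder volume server: append a
# timestamped 30-second sample of pool and disk activity every ten
# minutes or so.  Log path, sample length, and cadence are arbitrary.
LOG=/var/tmp/io-baseline.log
while true; do
    date >> $LOG
    # with a count of 2, the second report covers the 30-second interval
    # (the first report is the since-boot average)
    zpool iostat 30 2 >> $LOG
    iostat -xnzc 30 2 >> $LOG
    sleep 600
done

Baseline or not, though, I still needed to find out who was generating the load right now.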

Fortunately, my friends over at solarisinternals.com have a nice little DTrace script called iscsiio.d for breaking down iSCSI traffic at the target end.  Running it for less than a minute got this output:
# dtrace -s /var/tmp/iscsiio.d
Tracing... Hit Ctrl-C to end.
^C
REMOTE IP         EVENT    COUNT     Kbytes     KB/sec
10.134.13.59      read        56        238         11
10.134.13.37      read       631       2835        132
10.134.13.58      read       636       3719        173
10.134.13.54      read       671       3561        166
10.134.13.59      write      869       7330        342
10.134.13.54      write     2200      23153       1080
10.134.13.70      read      2273      13422        626
10.134.13.33      write     3730      69042       3221
10.134.13.58      write     4236      47745       2227
10.134.13.37      write     5997     108624       5068
10.134.13.70      write     6803     105302       4913
10.134.13.56      read      8152      45458       2121
10.134.13.56      write     9557     130043       6068
10.134.13.33      read     46666     372022      17359
Wow, that's a lot of read traffic from that last one, 10.134.13.33.  So I logged into that compute node and ran:
$ iostat -xnzc 30
     cpu
 us sy st id
  0  4  0 96
                    extended device statistics
   r/s    w/s    kr/s     kw/s  wait  actv  wsvc_t  asvc_t   %w   %b device
   0.0    4.3     0.0     27.6   0.0   0.0     0.0     2.3    0    0 c0t5000CCA03C31E530d0
   0.0    5.0     0.0     27.6   0.0   0.0     0.0     1.8    0    0 c0t5000CCA03C2DB384d0
   0.2    2.3     1.1      7.9   0.0   0.6     0.0   256.4    0   19 c0t600144F07DA70300000055DE259000B9d0
   0.2    2.4     1.1      5.4   0.0   1.1     0.0   421.1    0   24 c0t600144F07DA70300000056548EDA0080d0
   5.5    4.0  4543.6     18.8   0.0   3.5     0.0   363.5    0  100 c0t600144F07DA70300000055706B06000Dd0
   0.8   12.1     4.5     68.1   0.0   1.7     0.0   133.7    0   43 c0t600144F07DA7030000005655824D0085d0
   8.1    7.1  6127.3    443.2   0.0   6.4     0.0   417.8    0  100 c0t600144F07DA703000000555C7D48003Bd0
   0.9   11.6     4.7    110.0   0.0   2.0     0.0   157.8    0   58 c0t600144F07DA703000000564EB7950065d0
   0.0    0.7     0.0      2.1   0.0   0.4     0.0   550.8    0    7 c0t600144F07DA70300000056A9F12A0121d0
   0.2    2.2     1.1      6.2   0.0   0.5     0.0   215.4    0   12 c0t600144F07DA70300000056A8CB070112d0
   0.0    0.0     0.0      0.0   0.0   0.1     0.0     0.0    0    2 c0t600144F07DA703000000569558FE00BBd0
   0.2    2.4     1.1      6.5   0.0   0.2     0.0    75.6    0    7 c0t600144F07DA7030000005695379300B6d0
   0.4    4.9     2.3     12.3   0.0   0.9     0.0   163.0    0   34 c0t600144F07DA703000000569522F800B4d0
   0.4    2.8     2.3     14.6   0.0   0.5     0.0   158.9    0    7 c0t600144F07DA70300000056A1722900F4d0
   0.6    7.6     3.4     40.0   0.0   1.2     0.0   140.0    0   27 c0t600144F07DA70300000056A652E700FBd0
   0.2    2.6     1.1      6.6   0.0   0.0     0.0     0.6    0    0 c0t600144F07DA70300000056B02C5F0150d0
There are two LUNs carrying a *lot* of read traffic.  Which kernel zones are responsible?  Answering that requires help from OpenStack to map each LUN back to its guest.  Back on the target, we ask COMSTAR what a LUN maps to:
$ stmfadm list-lu -v 600144F07DA703000000555C7D48003B
LU Name: 600144F07DA703000000555C7D48003B
    Operational Status     : Online
    Provider Name          : sbd
    Alias                  : /dev/zvol/rdsk/tank/cinder/volume-ed69313a-8954-405b-b89d-a11e5723a88f
    View Entry Count       : 1
    Data File              : /dev/zvol/rdsk/tank/cinder/volume-ed69313a-8954-405b-b89d-a11e5723a88f
    Meta File              : not set
    Size                   : 107374182400
    Block Size             : 512
    Management URL         : not set
    Software ID            : not set
    Vendor ID              : SUN
    Product ID             : COMSTAR
    Serial Num             : not set
    Write Protect          : Disabled
    Write Cache Mode Select: Enabled
    Writeback Cache        : Enabled
    Access State           : Active
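If you find yourself doing this chase more than once, the interesting part of that output is easy to pull out directly; a one-liner along these lines works, relying on the volume-<uuid> naming convention visible in the Alias above:

# Pull the Cinder volume UUID out of the LU's zvol alias
# (assumes the "volume-<uuid>" naming shown in the Alias line above)
stmfadm list-lu -v 600144F07DA703000000555C7D48003B | \
    awk -F'volume-' '/Alias/ { print $2 }'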
Now that I have the volume name, I can feed the UUID portion of that name to Cinder:
$ cinder show ed69313a-8954-405b-b89d-a11e5723a88f
...
| attachments | [{u'device': u'c1d0',
u'server_id': u'5751bd18-78aa-4387-be55-8e395aa081eb',
u'id': u'ed69313a-8954-405b-b89d-a11e5723a88f',
u'host_name': None, u'volume_id': u'ed69313a-8954-405b-b89d-a11e5723a88f'}] |

(I've wrapped the above output to aid readability.)  That gives me a server_id I can feed to nova show to identify the zone:

$ nova show 5751bd18-78aa-4387-be55-8e395aa081eb

I've elided that output as it's private info.  From there, I'm able to talk to the owner, use zlogin to inspect the zone, and so on.
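In fact, the whole LUN-to-instance chase is mechanical enough to be worth scripting if you expect to do it again.  Here's a rough sketch that strings the same three commands together; the awk and sed parsing leans on the output formats shown above (and is mine, not anything that ships with Solaris or OpenStack), so treat it as a starting point rather than a polished tool:

#!/bin/sh
# lun2instance: map a busy COMSTAR LU back to the Nova instance using it.
# Usage: lun2instance <LU name>, e.g. the 600144F0... portion of the
# c0t600144F0...d0 device name seen in iostat on a compute node.
# The parsing assumes the stmfadm and cinder output formats shown above.

lu=$1

# 1. COMSTAR LU -> Cinder volume UUID (from the volume-<uuid> zvol alias)
vol=$(stmfadm list-lu -v "$lu" | awk -F'volume-' '/Alias/ { print $2 }')

# 2. Cinder volume -> server_id of the instance it's attached to
server=$(cinder show "$vol" | grep server_id |
    sed -e "s/.*server_id': u'//" -e "s/'.*//")

# 3. server_id -> instance details to identify the zone and its owner
nova show "$server"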

It turned out that the two problem zones were running a test OpenStack Swift deployment that seemed to have run away a bit; we shut them down and voila, we were back to nominal performance for the rest of the cloud.


