$ zpool iostat 30
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool        224G  54.2G      0     46      0   204K
tank        7.17T  25.3T  1.10K  5.53K  8.78M  92.4M

But are we really that loaded as far as the disks are concerned?
$ iostat -xnzc 30
     cpu
 us sy st id
  0 12  0 88
                    extended device statistics
    r/s    w/s    kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
   57.9  272.6   769.1 10579.3  0.0  4.7    0.0   14.1   0  88 c1t0d0
    0.0    8.9     0.0    78.1  0.0  0.0    0.0    2.1   0   1 c1t1d0
   54.6  256.9   759.7 10475.1  0.0  4.6    0.0   14.8   0  90 c1t2d0
   59.1  283.3   749.9 10070.2  0.0  4.3    0.0   12.5   0  87 c1t3d0
   76.9  224.9  1172.9 10275.5  0.0  5.4    0.0   18.0   0  96 c1t4d0
   51.7  259.8   784.0 10579.2  0.0  4.2    0.0   13.6   0  93 c1t5d0
   47.3  275.2   781.3 10109.4  0.0  4.3    0.0   13.4   0  93 c1t6d0
   49.6  262.0   766.2 10416.9  0.0  4.7    0.0   15.1   0  92 c1t7d0
   66.3  243.2  1163.7 10172.9  0.0  5.3    0.0   17.1   0  97 c1t8d0
   57.1  269.1   814.6 10584.6  0.0  4.2    0.0   13.0   0  90 c1t9d0
   62.3  272.2   820.1 10298.4  0.0  4.0    0.0   12.0   0  87 c1t10d0
   55.1  247.3   791.9 10558.6  0.0  4.6    0.0   15.3   0  91 c1t11d0
   63.5  221.6  1166.7 10559.2  0.0  5.7    0.0   20.0   0 100 c1t12d0
    0.0    9.1     0.0    78.1  0.0  0.0    0.0    4.9   0   1 c1t13d0

Yeah, that looks heavy: all of the disks in the tank pool are at 87% busy or more, and that didn't change much over several minutes of monitoring, so it's not just a brief burst of activity. I don't have continuous monitoring set up for this data, but past quick looks had typically shown about half the bandwidth numbers zpool iostat reports, with disks in the 50-60% busy range. So we have an unusually heavy load; but where is it coming from, and is it expected activity from something like a source build, or something that has run away? This is where the problem gets challenging: we've got a bunch of compute nodes hammering this server over iSCSI, so we need to figure out which ones to look at.
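As a quick aside (this snippet is mine, not from the original writeup), once you have a captured `iostat -xnz` report it's easy to pull out just the busy disks with awk; in this output format %b is field 10 and the device name is field 11:

```shell
# Throwaway awk filter (not from the post): print devices whose %b exceeds
# 80 in saved `iostat -xnz` output. The here-document holds a few sample
# lines from the capture above so the snippet is self-contained.
busy=$(awk '$10 ~ /^[0-9]+$/ && $10+0 > 80 { print $11, $10 "%" }' <<'EOF'
   57.9  272.6   769.1 10579.3  0.0  4.7    0.0   14.1   0  88 c1t0d0
    0.0    8.9     0.0    78.1  0.0  0.0    0.0    2.1   0   1 c1t1d0
   63.5  221.6  1166.7 10559.2  0.0  5.7    0.0   20.0   0 100 c1t12d0
EOF
)
echo "$busy"
# c1t0d0 88%
# c1t12d0 100%
```

In live use you would feed it the real command output, e.g. `iostat -xnz 30 2 | awk ...`, rather than a here-document.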
# dtrace -s /var/tmp/iscsiio.d
Tracing... Hit Ctrl-C to end.
^C
   REMOTE IP     EVENT  COUNT  Kbytes  KB/sec
   10.134.13.59  read      56     238      11
   10.134.13.37  read     631    2835     132
   10.134.13.58  read     636    3719     173
   10.134.13.54  read     671    3561     166
   10.134.13.59  write    869    7330     342
   10.134.13.54  write   2200   23153    1080
   10.134.13.70  read    2273   13422     626
   10.134.13.33  write   3730   69042    3221
   10.134.13.58  write   4236   47745    2227
   10.134.13.37  write   5997  108624    5068
   10.134.13.70  write   6803  105302    4913
   10.134.13.56  read    8152   45458    2121
   10.134.13.56  write   9557  130043    6068
   10.134.13.33  read   46666  372022   17359

Wow, a lot of read traffic on that last one. So I logged into that node and ran:
$ iostat -xnzc 30
     cpu
 us sy st id
  0  4  0 96
                    extended device statistics
   r/s   w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0   4.3     0.0   27.6  0.0  0.0    0.0    2.3   0   0 c0t5000CCA03C31E530d0
   0.0   5.0     0.0   27.6  0.0  0.0    0.0    1.8   0   0 c0t5000CCA03C2DB384d0
   0.2   2.3     1.1    7.9  0.0  0.6    0.0  256.4   0  19 c0t600144F07DA70300000055DE259000B9d0
   0.2   2.4     1.1    5.4  0.0  1.1    0.0  421.1   0  24 c0t600144F07DA70300000056548EDA0080d0
   5.5   4.0  4543.6   18.8  0.0  3.5    0.0  363.5   0 100 c0t600144F07DA70300000055706B06000Dd0
   0.8  12.1     4.5   68.1  0.0  1.7    0.0  133.7   0  43 c0t600144F07DA7030000005655824D0085d0
   8.1   7.1  6127.3  443.2  0.0  6.4    0.0  417.8   0 100 c0t600144F07DA703000000555C7D48003Bd0
   0.9  11.6     4.7  110.0  0.0  2.0    0.0  157.8   0  58 c0t600144F07DA703000000564EB7950065d0
   0.0   0.7     0.0    2.1  0.0  0.4    0.0  550.8   0   7 c0t600144F07DA70300000056A9F12A0121d0
   0.2   2.2     1.1    6.2  0.0  0.5    0.0  215.4   0  12 c0t600144F07DA70300000056A8CB070112d0
   0.0   0.0     0.0    0.0  0.0  0.1    0.0    0.0   0   2 c0t600144F07DA703000000569558FE00BBd0
   0.2   2.4     1.1    6.5  0.0  0.2    0.0   75.6   0   7 c0t600144F07DA7030000005695379300B6d0
   0.4   4.9     2.3   12.3  0.0  0.9    0.0  163.0   0  34 c0t600144F07DA703000000569522F800B4d0
   0.4   2.8     2.3   14.6  0.0  0.5    0.0  158.9   0   7 c0t600144F07DA70300000056A1722900F4d0
   0.6   7.6     3.4   40.0  0.0  1.2    0.0  140.0   0  27 c0t600144F07DA70300000056A652E700FBd0
   0.2   2.6     1.1    6.6  0.0  0.0    0.0    0.6   0   0 c0t600144F07DA70300000056B02C5F0150d0

There are two LUNs with a *lot* of read traffic going, both pegged at 100% busy. Which kernel zones are responsible? Answering that requires help from OpenStack to map each LUN back to the responsible guest. Back on the target, we get COMSTAR to tell us what a LUN maps to:
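The LU GUID that the next step needs is embedded right in the device name iostat prints; a throwaway sed one-liner (mine, not from the post) strips the `c0t...d0` wrapping to recover it:

```shell
# Hypothetical helper (not from the post): recover the COMSTAR LU GUID
# from the c0t<GUID>d0 device name that iostat reports.
dev=c0t600144F07DA703000000555C7D48003Bd0
guid=$(echo "$dev" | sed -e 's/^c[0-9]*t//' -e 's/d[0-9]*$//')
echo "$guid"
# 600144F07DA703000000555C7D48003B
```

That GUID is exactly the argument handed to stmfadm below.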
$ stmfadm list-lu -v 600144F07DA703000000555C7D48003B
LU Name: 600144F07DA703000000555C7D48003B
     Operational Status     : Online
     Provider Name          : sbd
     Alias                  : /dev/zvol/rdsk/tank/cinder/volume-ed69313a-8954-405b-b89d-a11e5723a88f
     View Entry Count       : 1
     Data File              : /dev/zvol/rdsk/tank/cinder/volume-ed69313a-8954-405b-b89d-a11e5723a88f
     Meta File              : not set
     Size                   : 107374182400
     Block Size             : 512
     Management URL         : not set
     Software ID            : not set
     Vendor ID              : SUN
     Product ID             : COMSTAR
     Serial Num             : not set
     Write Protect          : Disabled
     Write Cache Mode Select: Enabled
     Writeback Cache        : Enabled
     Access State           : Active

Now that I have the volume name, I can feed the UUID portion of that name to Cinder:
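If you find yourself doing this often, the UUID portion can be peeled out of the alias mechanically; a hypothetical one-liner (mine, not from the post):

```shell
# Hypothetical helper (not from the post): take the zvol alias reported by
# stmfadm and strip it down to the bare Cinder volume UUID.
vol_alias=/dev/zvol/rdsk/tank/cinder/volume-ed69313a-8954-405b-b89d-a11e5723a88f
uuid=$(basename "$vol_alias" | sed 's/^volume-//')
echo "$uuid"
# ed69313a-8954-405b-b89d-a11e5723a88f
```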
$ cinder show ed69313a-8954-405b-b89d-a11e5723a88f
...
| attachments | [{u'device': u'c1d0',
                  u'server_id': u'5751bd18-78aa-4387-be55-8e395aa081eb',
                  u'id': u'ed69313a-8954-405b-b89d-a11e5723a88f',
                  u'host_name': None, u'volume_id': u'ed69313a-8954-405b-b89d-a11e5723a88f'}] |

(I've wrapped the above output to aid its readability.) That gives me a server_id I can feed to nova show to identify the zone.
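For scripting this chain end to end, the server_id can be plucked from the attachments field with sed; this is my own fragile sketch against a copy of the output above, not something from the post:

```shell
# Hypothetical helper (not from the post): extract server_id from the
# attachments blob that `cinder show` prints. Parsing a Python repr with
# sed is brittle, but works for this fixed format.
attach="[{u'device': u'c1d0', u'server_id': u'5751bd18-78aa-4387-be55-8e395aa081eb', u'id': u'ed69313a-8954-405b-b89d-a11e5723a88f'}]"
server_id=$(printf '%s\n' "$attach" | sed -n "s/.*server_id': u'\([^']*\)'.*/\1/p")
echo "$server_id"
# 5751bd18-78aa-4387-be55-8e395aa081eb
```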
$ nova show 5751bd18-78aa-4387-be55-8e395aa081eb
I've elided that output as it's private info. From there, I'm able to talk to the owner, use zlogin to inspect the zone, and so on.
It turned out that the two problem zones were running a test OpenStack Swift deployment that had run away a bit; we shut them down and, voilà, we were back to nominal performance for the rest of the cloud.