Saturday Jun 11, 2016

Solaris API for Locality Group Observability

[ Related: 1st part @ Locality Group Observability on Solaris ]

As of this writing, Solaris has limited support for lgroup observability using programmatic interfaces. Applications can use the lgroup API to traverse the locality group hierarchy, discover the contents of each locality group and even affect thread and memory placement on desired lgroups.
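
A minimal, hedged sketch of the placement side of the API (not taken from the original post): a thread queries its home lgroup with lgrp_home() and requests strong affinity to another lgroup with lgrp_affinity_set(). The target lgroup ID of 1, the file name and the printed messages are assumptions for illustration.

/* lgrpaff.c -- compile with: cc -o lgrpaff lgrpaff.c -llgrp */
#include <stdio.h>
#include <sys/types.h>
#include <sys/procset.h>
#include <sys/lgrp_user.h>

int main(void) {
        lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_CALLER);

        printf("home lgroup of this LWP: %d\n",
            (int)lgrp_home(P_LWPID, P_MYID));

        /* ask the kernel to favor lgroup 1 for this LWP -- assumes lgroup 1 exists */
        if (lgrp_affinity_set(P_LWPID, P_MYID, 1, LGRP_AFF_STRONG) == -1)
                perror("lgrp_affinity_set");

        printf("home lgroup after setting affinity: %d\n",
            (int)lgrp_home(P_LWPID, P_MYID));

        lgrp_fini(cookie);
        return 0;
}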

The man page for liblgrp(3LIB) lists the currently supported public interfaces, and a brief description of most of those interfaces can be obtained by running man -k lgroup in a shell. In order to use this API, applications must link with the locality group library, liblgrp(3LIB).

The following sample code demonstrates lgroup API usage by making several lgrp_*() calls including lgrp_device_lgrps() to find the locality groups that are closest to the specified I/O device.

# cat -n iodevlgrp.c

     1  #include <stdio.h>
     2  #include <stdlib.h>
     3  #include <assert.h>
     4  #include <sys/lgrp_user.h>
     5  #include <sys/types.h>
     6
     7  int main(int argc, char **argv) {
     8
     9          if (argc != 2) {
    10                  fprintf(stderr, "Usage: %s <io-device>\n", argv[0]);
    11                  exit(1);
    12          }
    13
    14          /*  lgroup interface version check */
    15          if (lgrp_version(LGRP_VER_CURRENT) != LGRP_VER_CURRENT) {
    16              fprintf(stderr, "\nBuilt with unsupported lgroup interface %d",
    17                  LGRP_VER_CURRENT);
    18              exit(1);
    19          }
    20
    21          lgrp_cookie_t cookie = lgrp_init(LGRP_VIEW_OS);
    22          lgrp_id_t node = lgrp_root(cookie);
    23
    24          /* refresh the cookie if stale */
    25          if ( lgrp_cookie_stale(cookie) ) {
    26                  lgrp_fini(cookie);
    27                  cookie = lgrp_init(LGRP_VIEW_OS);
    28          }
    29
    30          int nlgrps = lgrp_nlgrps(cookie);
    31          if ( nlgrps == -1 ) {
    32                  perror("\n lgrp_nlgrps");
    33                  lgrp_fini(cookie);
    34                  exit(1);
    35          }
    36          printf("Number of locality groups on the system: %d\n", (nlgrps-1));
    37
    38          /* lgroups closest to the target device */
    39          int numlgrps = lgrp_device_lgrps(argv[1], NULL, 0);
    40          if (numlgrps == -1 ){
    41                  fprintf(stderr, "I/O device: %s. ", argv[1]);
    42                  perror("lgrp_device_lgrps");
    43          } else {
    44                  printf("lgroups closest to the I/O device %s: ", argv[1]);
    45                  lgrp_id_t *lgrpids = (lgrp_id_t *)calloc(numlgrps, sizeof(lgrp_id_t));
    46                  assert(lgrpids != NULL);
    47                  lgrp_device_lgrps(argv[1], lgrpids, numlgrps);
    48                  for (int i = 0; i < numlgrps; ++i) {
    49                          printf(" %d ", lgrpids[i]);
    50                  }
    51                  free(lgrpids);
    52          }
    53          lgrp_fini(cookie);
    54          printf("\n");
    55          return 0;
    56
    57  }


% cc -o iodevlgrp -llgrp iodevlgrp.c

% ./iodevlgrp /dev/ixgbe0
Number of locality groups on the system: 2
lgroups closest to the I/O device /dev/ixgbe0:  1

% lgrpinfo -d /dev/ixgbe0
lgroup ID : 1

% ./iodevlgrp /dev/ixgbe1
Number of locality groups on the system: 2
lgroups closest to the I/O device /dev/ixgbe1:  2

% lgrpinfo -d /dev/ixgbe1
lgroup ID : 2

Source: Oracle Solaris 11.3 Programming Interfaces Guide

Saturday Apr 23, 2016

Solaris 11: Few Random Commands

Logical Domains
Dependencies & Dependents
Network & Power Consumption Stats
HBA Listing

Dependencies

In the following example, the dummy domain depends on two domains: the primary domain for virtual disks and virtual network devices, and dom2 for virtual disks and SR-IOV virtual functions (VFs).

# ldm list-dependencies dummy
DOMAIN            DEPENDENCY        TYPE
dummy             primary           VDISK,VNET
                  dom2              VDISK,IOV

The -l option displays the actual devices.

Dependents

The -r option of the list-dependencies sub-command shows the dependents of the logical domain(s). The -l option displays the actual devices here as well.

eg.,

# ldm list-dependencies -r -l dom2
DOMAIN            DEPENDENT         TYPE           DEVICE
dom2              dummy             VDISK          primary-vds0/vdisk0:vol_dummy_disk0
                                    VDISK          service-vds0/vdisk0:vol_dummy_disk0
                                    IOV            /SYS/IOU1/PCIE14/IOVIB.PF0.VF1
                                    IOV            /SYS/IOU1/EMS4/CARD/NET0/IOVNET.PF0.VF1

Network Statistics

The list-netstat sub-command of the ldm utility displays statistics for all network devices configured in the domain(s).

eg.,

# ldm ls-netstat dom2
DOMAIN
dom2

NAME               IPACKETS     RBYTES       OPACKETS     OBYTES
----               --------     ------       --------     ------
net4               444.42M      61.27G       444.42M      7.12G
net1               13.20M       4989.22M     0            0
net0               0            0            0            0
..
ldoms-net0.vf1     37.62M       6.44G        49.15K       3.97M
ldoms-net0.vf0     523.34K      90.15M       62.85K       14.67M
ldoms-net1.vf15    0            0            0            0
..

List HBAs

The list-hba sub-command of the ldm utility lists the physical SCSI HBA initiator ports in the target domain. The -t option shows the SCSI transport type, such as FC or SAS.

eg.,

# ldm ls-hba -t  primary
NAME                                                 VSAN
----                                                 ----
/SYS/IOU0/EMS1/CARD/SCSI/HBA0/PORT1
        init-port w5080020000016331
        Transport Protocol SAS
/SYS/IOU0/EMS1/CARD/SCSI/HBA0/PORT2
        init-port w5080020000016331
        Transport Protocol SAS
...

Check the latest man page for ldm(1M) for the complete list of options for those sub-commands.

Power Consumption Statistics

The ldmpower command shows the breakdown of power consumed by processors and memory across domains or in a given domain. Note that ldmpower is an independent command, not part of the ldm utility.

eg.,

# ldmpower -c processors -c memory -l primary
Processor Power Consumption in Watts
DOMAIN          15_SEC_AVG      30_SEC_AVG      60_SEC_AVG
primary         313             315             317

Memory Power Consumption in Watts
DOMAIN          15_SEC_AVG      30_SEC_AVG      60_SEC_AVG
primary         594             595             595

As usual, check the man page ldmpower(1M) for details.


PCI Devices

Scan & List

The scanpci utility scans PCI buses and reports configuration settings for each PCI device that it finds.

eg.,

# scanpci -v

pci bus 0x0002 cardnum 0x06 function 0x00: vendor 0x111d device 0x80ba
 Integrated Device Technology, Inc. [IDT] Device unknown
 CardVendor 0x0000 card 0x0000 (Card unknown)
  STATUS    0x0010  COMMAND 0x0147
  CLASS     0x06 0x04 0x00  REVISION 0x03
  BIST      0x00  HEADER 0x01  LATENCY 0x00  CACHE 0x10
  MAX_LAT   0x00  MIN_GNT 0x03  INT_PIN 0x00  INT_LINE 0x00
  Bus: primary=02, secondary=03, subordinate=72, sec-latency=0
  I/O behind bridge: 00000000-03ffffff
  Memory behind bridge: 00100000-2000ffff
  Prefetchable memory behind bridge: 100000000-777f0ffff

pci bus 0x0003 cardnum 0x00 function 0x00: vendor 0x8086 device 0x1521
 Intel Corporation I350 Gigabit Network Connection
 CardVendor 0x108e card 0x7b18 (Oracle/SUN, Quad Port GbE PCIe 2.0 Low Profile Adapter, UTP)
  STATUS    0x0010  COMMAND 0x0146
  CLASS     0x02 0x00 0x00  REVISION 0x01
  BIST      0x00  HEADER 0x80  LATENCY 0x00  CACHE 0x10
  BASE0     0x00100000 SIZE 1048576  MEM
  BASE3     0x00200000 SIZE 16384  MEM
  BASEROM   0x00280000 SIZE 524288
  MAX_LAT   0x00  MIN_GNT 0x00  INT_PIN 0x01  INT_LINE 0x00

...

SRU
IDRs

Detect & Show Installed

Identifying the Solaris 11 SRU level is a popular topic, and it has been covered in Solaris 11 documentation, My Oracle Support (MOS) and elsewhere by numerous bloggers. After all this, we still have to wonder why the development team that owns the pkg utility hasn't made a conscious effort to make this a bit easier for normal human beings. In any case, the pkg list entire or pkg info entire commands show a cryptic version string that encodes a bunch of information, including the SRU level.

In the "branch" version shown below, the third numeral represents the Solaris update, whereas the fourth numeral is the SRU.

eg.,

In the following example, the Solaris update is 3 (this is Solaris 11, hence a Solaris 11.3 system) and the current SRU is 5.

# pkg info entire | grep -i branch
        Branch: 0.175.3.5.0.6.0

IDRs

IDRs (Interim Diagnostic Reliefs) are in reality package updates whose names usually start with "idr" - therefore, just checking for the string "idr" in the list of installed packages is sufficient to list the IDRs installed on the target system.

# pkg list idr*
pkg list: No packages matching 'idr*' installed <-- no IDRs installed


Friday Jan 29, 2016

[Solaris] Clock Synchronization, Max Username Length, Removable Media

Synchronize System Time with a Remote Host

One option is to rely on Network Time Protocol. If NTP is not available, rdate is an alternative to set the local system time to the time and date returned by the remote host (server).

  1. NTP

    eg.,
    # date
    Thu Jan 28 13:47:54 PST 2016
    
    # ntpdate 10.131.45.1	<-- 10.131.45.1 is the NTP server within the network
    29 Jan 15:28:38 ntpdate[21143]: step time server 10.131.45.1 offset 92435.318367 sec
    
    # date
    Fri Jan 29 15:28:42 PST 2016
    
  2. rdate

    Steps:

    • On the server, ensure that the time:stream service is online; if not, bring it online. The server is the host that provides the date/time.

      # ( svcs -H svc:/network/time:stream | grep online ) || \
      	( svcadm -v enable time:stream && svcadm -v enable time:dgram )
      svc:/network/time:stream enabled.
      svc:/network/time:dgram enabled.
      
    • On the client, run: rdate <server>

      eg.,
      # date
      Thu Jan 28 13:54:37 PST 2016
      
      # rdate 10.131.45.104	<-- 10.131.45.104 is just another host with the time service enabled.
      Fri Jan 29 15:41:47 2016
      
      # date
      Fri Jan 29 15:41:49 PST 2016
      

Listing Removable Media

.. such as CD-ROM, DVD-ROM and PCMCIA cards.

Related command: rmformat

eg.,
# rmformat
Looking for devices...
     1. Logical Node: /dev/rdsk/c1t0d0s2
        Physical Node: /pci@304/pci@1/pci@0/pci@2/usb@0/storage@2/disk@0,0
        Connected Device: SUN      Remote ISO CDROM 1.01
        Device Type: CD Reader
        Bus: USB
        Size: 103.3 MB
        Label: 
        Access permissions: Medium is not write protected.
     2. Logical Node: /dev/rdsk/c2t0d0s2
        Physical Node: /pci@304/pci@2/usb@0/storage@1/disk@0,0
        Connected Device: VT       eUSB DISK        5206
        Device Type: Removable
        Bus: USB
        Size: 2.0 GB
        Label: 
        Access permissions: Medium is not write protected.
..

In addition to listing, the same command can also be used to format, label, and partition removable and rewritable media.

rmmount and rmumount utilities can be used to mount and unmount removable or hot-pluggable volumes.

eg.,

List the mountable device along with path(s)

# rmmount -l
/dev/dsk/c1t0d0s2    cdrom,cdrom0,cd,cd0,sr,sr0,CDROM,/media/CDROM

SEE ALSO:
Man pages of rmformat(1) and rmmount(1).


Solaris Username Length

The useradd command accepts usernames of up to 32 characters. The OS may warn when a name exceeds 8 characters - simply ignore the warning if more than 8 characters is a requirement.

eg.,
# useradd guyfawkes
UX: useradd: guyfawkes name too long.

# su guyfawkes

# id guyfawkes
uid=110(guyfawkes) gid=10(staff)

Directory Listing as a Tree

Check for the file/tree package and install it if it doesn't exist.

# ( pkg list -q file/tree ) || ( pkg install pkg:/file/tree )

List the directory using the tree(1) command.

# tree -d /opt/SUNWldm
/opt/SUNWldm
|-- bin
|   `-- schemas
|-- lib
|   |-- contrib
|   |-- ds
|   `-- templates
|-- man
|   |-- ja_JP.UTF-8
|   |   `-- man1m
|   `-- man1m
`-- platform
    `-- ORCL,SPARC64-X
        `-- lib
            `-- ds

14 directories

Wednesday Dec 30, 2015

[Solaris] Memory Blacklisting, Duplicate IP Address & Recovery, Group Package Installations, ..

-1-


Memory blacklist operation

To check if a memory blacklist operation by the LDoms Manager daemon (ldmd) is in progress, run:

echo "zeus ::print -a zeus_t mem_blacklist_inprog" | mdb -p `pgrep ldmd`

If no blacklist operation is in progress, the above may return output that is similar to:

<hex-address> mem_blacklist_inprog = 0 (false)

When a memory blacklist operation is in progress, the above may return output that is similar to:

<hex-address> mem_blacklist_inprog = 0x1 (true)

In such a situation, any attempt to run ldm commands related to memory management may fail with an error:

A memory blacklist operation is being processed. Other memory operations are disabled until it completes

Sometimes a power cycle may clear the blacklist operation. In not-so-lucky situations, the affected DIMM(s) may have to be serviced.

(Credit: Eunice M.)

-2-

Duplicate IP Address & Recovery

If two nodes [running Solaris] on a network share the same IP address, the Solaris kernel detects the duplicate, marks the address as duplicate, and eventually disables and turns off the IP interface if the problem persists. These actions are typically recorded in the system log with warnings such as the following.

eg.,
Dec 23 16:46:18 some-host ip: [ID 759807 kern.warning] WARNING: net0 has duplicate address xx.xx.xx.xx (in use by 00:10:e0:5d:9c:83); disabled
Dec 23 16:46:18 some-host in.routed[737]: [ID 238047 daemon.warning] interface net0 to xx.xx.xx.xx turned off

When the IP interface is disabled/turned off, ipadm show-if shows a down state for that interface.

Once the problem is discovered and fixed [by the administrator or user] so that the IP address is no longer duplicated, the Solaris kernel enables and brings up the IP interface that it had turned off earlier. This action too is recorded in the system log.

Dec 23 16:51:18 some-host ip: [ID 636139 kern.notice] NOTICE: recovered address ::ffff:xx.xx.xx.xx on net0

Once the system marks an interface down due to a conflicting IP address on a remote system, the local system periodically checks to see if the conflict persists. In Solaris 11.3, the time between the checks is 300,000 milliseconds (300 seconds or 5 minutes). However, in some cases waiting for 5 minutes might not be desirable. In such cases, the time between the duplicate IP address checks can be tuned by setting the IP tunable parameter, _dup_recovery, to an appropriate value.

eg.,

Reduce the _dup_recovery value to 90 seconds.

  • Temporary change (non-persistent)

    # ndd -get /dev/ip ip_dup_recovery
    300000
    
    # ndd -set /dev/ip ip_dup_recovery 90000
    
    # ndd -get /dev/ip ip_dup_recovery
    90000
    
  • Permanent change (persistent across reboots)

    # ipadm show-prop -p _dup_recovery ip
    PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
    ip    _dup_recovery         rw   300000       --           300000       0-3600000
    
    # ipadm set-prop -p _dup_recovery=90000 ip
    
    # ipadm show-prop -p _dup_recovery ip
    PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
    ip    _dup_recovery         rw   90000        90000        300000       0-3600000
    

Notice the slight difference in parameter names when ndd and ipadm are used to tune the same value (ip_dup_recovery vs. _dup_recovery).


-3-

Solaris OS: What Group Package was Installed?

pkg list on the target system shows this information.

eg.,
# pkg list | grep "group/system/solaris" | grep server
group/system/solaris-minimal-server               0.5.11-0.175.3.1.0.5.0     i--

List all packages that are part of the group package that was installed by running:

pkg list -as `pkg contents -r -H -o fmri -t depend -a type=group <group-package>`

List the group packages available to install a Solaris server:

# pkg search solaris-*-server | awk '{ print $3 "\t" $4}'
VALUE   					PACKAGE
solaris/group/system/solaris-large-server       pkg:/group/system/solaris-large-server@0.5.11-0.175.3.1.0.5.0
solaris/group/system/solaris-minimal-server     pkg:/group/system/solaris-minimal-server@0.5.11-0.175.3.1.0.5.0
solaris/group/system/solaris-small-server       pkg:/group/system/solaris-small-server@0.5.11-0.175.3.1.0.5.0
  • solaris-large-server provides an Oracle Solaris large server environment that contains all of the common network services that an enterprise server must provide. Hardware drivers that are required for servers, such as InfiniBand drivers, are also part of this group package.

  • solaris-small-server installs a smaller set of packages on a server and provides a command-line environment.

  • solaris-minimal-server installs the smallest possible set of Solaris packages, which provides a minimal command-line environment.

In addition to the above, the solaris/group/system/solaris-desktop group package provides the Solaris desktop environment. This package contains the GNOME desktop environment, which includes GUI applications such as web browsers and mail clients, along with drivers for graphics and audio devices.

Keep in mind that the solaris-desktop group package pulls in a lot more packages than the other three group packages outlined above.


-4-

Solaris OS: What AI Manifest was Used?

On the target system, the AI manifest file that was used to perform the Solaris installation can be found at:

  • /var/log/install/ai.xml

The installation log can also be found in the same directory.


-5-

Package History

pkg history shows the command history related to a specific package or all packages installed on the system. This includes information such as who initiated the package operation, the complete command, how long the operation took, whether a new boot environment (BE) was created, and the errors encountered, if any.

Refer to pkg(1) man page for the options and description.


PS:
Legible version of this post with full outputs @ [Solaris] Memory Blacklisting, Duplicate IP Address & Recovery, Group Package Installations, ..




Tuesday Sep 29, 2015

Solaris: Identifying EFI disks

The EFI label supports physical disks and logical volumes that are > 2 TB in size. SMI label support is limited to 2 TB.

Listed below are some of the characteristics and patterns that can help identify and differentiate an EFI labeled disk from an SMI labeled disk.

  • Device cxtxd0 [without any slice suffix] represents the entire disk

  • No cylinder information is stored in the EFI label.

  • No overlapping slices / partitions

    • eg.,

      EFI label disk:

      Notice that there are no overlapped partitions and no references to cylinders in the following prtvtoc output.

      % prtvtoc /dev/rdsk/c0t5000CCA04E0DEDD8d0
      * /dev/rdsk/c0t5000CCA04E0DEDD8d0 partition map
      *
      * Dimensions:
      *     512 bytes/sector
      * 390721968 sectors
      * 390721901 accessible sectors
      *
      * Flags:
      *   1: unmountable
      *  10: read-only
      *
      * Unallocated space:
      *       First     Sector    Last
      *       Sector     Count    Sector
      *          34         6        39
      *   390070312    635239 390705550
      *
      *                          First     Sector    Last
      * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
             0      4    00         40   2097152   2097191
             1      4    00    2097192 384827392 386924583
             4      4    00  386924584   3145728 390070311
             8     11    00  390705551     16384 390721934
      

      SMI label disk:

Notice the overlapped partitions (0 & 2, and also 2 & 6) and the references to cylinders in the following prtvtoc output.

      # prtvtoc /dev/rdsk/c0t5000A72030082BD5d0s2
      * /dev/rdsk/c0t5000A72030082BD5d0s2 partition map
      *
      * Dimensions:
      *     512 bytes/sector
      *      56 sectors/track
      *     224 tracks/cylinder
      *   12544 sectors/cylinder
      *   11429 cylinders
      *   11427 accessible cylinders
      *
      * Flags:
      *   1: unmountable
      *  10: read-only
      *
      *                          First     Sector    Last
      * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
             0      2    00          0    263424    263423
             1      3    01     263424    263424    526847
             2      5    01          0 143340288 143340287
             6      4    00     526848 142813440 143340287
      
  • The existence of /dev/[r]dsk/cxtxd0 implies an EFI label. In the case of an SMI label, /dev/[r]dsk/cxtxd0 won't exist.

    • eg.,

      EFI label disk:

      % ls /dev/rdsk/c0t5000CCA04E0DEDD8d0
      /dev/rdsk/c0t5000CCA04E0DEDD8d0
      

      SMI label disk:

      # ls /dev/rdsk/c0t5000A72030082BD5d0
      /dev/rdsk/c0t5000A72030082BD5d0: No such file or directory
      
  • The presence of "wd" (whole disk?) in the device path of the physical device may imply an EFI label.

    eg.,

    EFI label disk:

    % stat -c "%N" /dev/rdsk/c0t5000CCA04E0DEDD8d0
    ‘/dev/rdsk/c0t5000CCA04E0DEDD8d0’ -> ‘../../devices/scsi_vhci/disk@g5000cca04e0dedd8:wd,raw’
    

    SMI label disk:

    # stat -c "%N" /dev/rdsk/c0t5000A72030082BD5d0s2
    '/dev/rdsk/c0t5000A72030082BD5d0s2' -> '../../devices/scsi_vhci/disk@g5000a72030082bd5:c,raw'
    
  • As of this writing, devinfo(1M) does not support EFI labeled disks.

    • eg.,

      EFI label disk:

      % devinfo -i /dev/rdsk/c0t5000CCA04E0DEDD8d0
      devinfo: /dev/rdsk/c0t5000CCA04E0DEDD8d0: This operation is not supported on EFI labeled devices
      

      SMI label disk:

      # devinfo -i /dev/rdsk/c0t5000A72030082BD5d0s2
      /dev/rdsk/c0t5000A72030082BD5d0s2       0       0       12544   512     4
      

Credit: various internal sources

Thursday Apr 30, 2015

Few Random Solaris Commands : intrstat, croinfo, dlstat, fmstat, ..

Target: Solaris 11 and later. Some of these commands may work on earlier versions too.

-1-


Interrupt Statistics : intrstat utility

The intrstat utility can be used to monitor interrupt activity generated by various hardware devices, along with the CPU that serviced the interrupt and the CPU time spent servicing those interrupts. On a busy system, intrstat-reported stats may help figure out which devices are busy at work and keeping the system busy with interrupts.

eg.,

.. [idle system] showing the interrupt activity on the first two vCPUs ..

# intrstat -c 0-1 5

      device |      cpu0 %tim      cpu1 %tim
-------------+------------------------------
      cnex#0 |         0  0.0         0  0.0
      ehci#0 |         0  0.0         0  0.0
    hermon#0 |         0  0.0         0  0.0
    hermon#1 |         0  0.0         0  0.0
    hermon#2 |         0  0.0         0  0.0
    hermon#3 |         0  0.0         0  0.0
       igb#0 |         0  0.0         0  0.0
     ixgbe#0 |         0  0.0         0  0.0
   mpt_sas#0 |        18  0.0         0  0.0
      vldc#0 |         0  0.0         0  0.0

      device |      cpu0 %tim      cpu1 %tim
-------------+------------------------------
      cnex#0 |         0  0.0         0  0.0
      ehci#0 |         0  0.0         0  0.0
    hermon#0 |         0  0.0         0  0.0
    hermon#1 |         0  0.0         0  0.0
    hermon#2 |         0  0.0         0  0.0
    hermon#3 |         0  0.0         0  0.0
       igb#0 |         0  0.0         0  0.0
     ixgbe#0 |         0  0.0         0  0.0
   mpt_sas#0 |        53  0.2         0  0.0
      vldc#0 |         0  0.0         0  0.0
^C

Check the outputs of the following as well.

# echo ::interrupts | mdb -k
# echo ::interrupts -d | mdb -k

-2-


Physical Location of Disk : croinfo & diskinfo commands

Both the croinfo and diskinfo commands provide information about the chassis, receptacle, and occupant relative to all disks or a specific disk. Note that the croinfo and diskinfo utilities share the same executable binary and function in an identical manner; the main difference is the defaults used by each utility.

eg.,

# croinfo
D:devchassis-path               t:occupant-type  c:occupant-compdev
------------------------------  ---------------  ---------------------
/dev/chassis//SYS/MB/HDD0/disk  disk             c0t5000CCA0125411FCd0
/dev/chassis//SYS/MB/HDD1/disk  disk             c0t5000CCA0125341F0d0
/dev/chassis//SYS/MB/HDD2       -                -
/dev/chassis//SYS/MB/HDD3       -                -
/dev/chassis//SYS/MB/HDD4/disk  disk             c0t5000CCA012541218d0
/dev/chassis//SYS/MB/HDD5/disk  disk             c0t5000CCA01248F0B8d0
/dev/chassis//SYS/MB/HDD6/disk  disk             c0t500151795956778Ed0
/dev/chassis//SYS/MB/HDD7/disk  disk             c0t5001517959567690d0

# diskinfo -oDcpd
D:devchassis-path               c:occupant-compdev     p:occupant-paths                                                               d:occupant-devices
------------------------------  ---------------------  -----------------------------------------------------------------------------  -----------------------------------------
/dev/chassis//SYS/MB/HDD0/disk  c0t5000CCA0125411FCd0  /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@1/disk@w5000cca0125411fd,0  /devices/scsi_vhci/disk@g5000cca0125411fc
/dev/chassis//SYS/MB/HDD1/disk  c0t5000CCA0125341F0d0  /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@2/disk@w5000cca0125341f1,0  /devices/scsi_vhci/disk@g5000cca0125341f0
/dev/chassis//SYS/MB/HDD2       -                      -                                                                              -
/dev/chassis//SYS/MB/HDD3       -                      -                                                                              -
/dev/chassis//SYS/MB/HDD4/disk  c0t5000CCA012541218d0  /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@1/disk@w5000cca012541219,0  /devices/scsi_vhci/disk@g5000cca012541218
/dev/chassis//SYS/MB/HDD5/disk  c0t5000CCA01248F0B8d0  /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@2/disk@w5000cca01248f0b9,0  /devices/scsi_vhci/disk@g5000cca01248f0b8
/dev/chassis//SYS/MB/HDD6/disk  c0t500151795956778Ed0  /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@4/disk@w500151795956778e,0  /devices/scsi_vhci/disk@g500151795956778e
/dev/chassis//SYS/MB/HDD7/disk  c0t5001517959567690d0  /devices/pci@700/pci@1/pci@0/pci@0/LSI,sas@0/iport@8/disk@w5001517959567690,0  /devices/scsi_vhci/disk@g5001517959567690

-3-


Monitoring Network Traffic Statistics : dlstat command

The dlstat command reports network traffic statistics for all datalinks or a specific datalink on a system.

eg.,

# dlstat -i 5 net0
           LINK    IPKTS   RBYTES    OPKTS   OBYTES
           net0  163.12M   39.93G  206.14M   43.63G
           net0      312  196.59K      146  370.80K
           net0      198  172.18K      121  121.98K
           net0      168   91.23K       93  195.57K
^C

For the complete list of options along with examples, please consult the Solaris Documentation.

-4-


Fault Management : fmstat utility

The Solaris Fault Manager gathers and diagnoses problems detected by the system software, and initiates self-healing activities such as disabling faulty components. The fmstat utility can be used to check the statistics associated with the Fault Manager.

fmadm config lists all active fault management modules that are currently participating in fault management. The -m option of fmstat can be used to report the diagnostic statistics related to a specific fault management module; fmstat without any options reports stats from all fault management modules.

eg.,

# fmstat 5
module             ev_recv ev_acpt wait  svc_t  %w  %b  open solve  memsz  bufsz
cpumem-retire            0       0  1.0 8922.5  96   0     0     0    12b      0
disk-diagnosis        1342       0  1.1 8526.0  96   0     0     0      0      0
disk-transport           0       0  1.0 8600.3  96   1     0     0    56b      0
...
...
zfs-diagnosis          139      75  1.0 8864.5  96   0     4    12   672b   608b
zfs-retire             608       0  0.0   15.2   0   0     0     0     4b      0
...
...

# fmstat -m cpumem-retire 5
                NAME VALUE            DESCRIPTION
           auto_flts 0                auto-close faults received
            bad_flts 0                invalid fault events received
     cacheline_fails 0                cacheline faults unresolveable
      cacheline_flts 0                cacheline faults resolved
    cacheline_nonent 0                non-existent retires
   cacheline_repairs 0                cacheline faults repaired
      cacheline_supp 0                cacheline offlines suppressed
	...
	...

-5-


InfiniBand devices : List & Show Information about each device

ibv_devices lists all available IB devices, whereas ibv_devinfo shows information about all devices or a specific IB device.

eg.,

# ibv_devices
    device                 node GUID
    ------              ----------------
    mlx4_0              0021280001cee63a
    mlx4_1              0021280001cee492
    mlx4_2              0021280001cee4aa
    mlx4_3              0021280001cee4ea

# ibv_devinfo -d mlx4_0
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.7.8130
        node_guid:                      0021:2800:01ce:e63a
        sys_image_guid:                 0021:2800:01ce:e63d
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xB0
        board_id:                       SUN0160000002
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 56
                        port_lid:               95
                        port_lmc:               0x00
                        link_layer:             IB

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                2048 (4)
                        active_mtu:             2048 (4)
                        sm_lid:                 56
                        port_lid:               96
                        port_lmc:               0x00
                        link_layer:             IB

Other commands and utilities such as ibstatus, fwflash or cfgadm can also be used to retrieve similar information.

-6-


PCIe Hot-Plugging : hotplug command

When the hotplug service is enabled on a Solaris system, the hotplug command can be used to bring hot-pluggable devices online or offline without physically adding or removing the device from the system.

The following command lists all physical [hotplug] connectors along with their current status.

eg.,

# hotplug list -c
Connection           State           Description
________________________________________________________________________________
IOU2-EMS2            ENABLED         PCIe-Native
IOU2-PCIE6           ENABLED         PCIe-Native
IOU2-PCIE7           EMPTY           PCIe-Native
IOU2-PCIE4           EMPTY           PCIe-Native
IOU2-PCIE1           EMPTY           PCIe-Native

For detailed instructions on hotplugging a device, check the Solaris documentation.




Tuesday Mar 31, 2015

Locality Group Observability on Solaris

Modern multi-socket servers exhibit NUMA characteristics that may hurt application performance if ignored. On a NUMA (Non-Uniform Memory Access) system, all memory is shared among processors. Each processor has access to its own memory - local memory - as well as memory that is local to another processor - remote memory. However, the memory access time (latency) depends on the memory location relative to the processor. A processor can access its local memory faster than remote memory, and these varying memory latencies play a big role in application performance.

Solaris organizes the hardware resources -- CPU, memory and I/O devices -- into one or more logical groups based on their proximity to each other, in such a way that all the hardware resources in a group are considered local to that group. These groups are referred to as locality groups or NUMA nodes. In other words, a locality group (lgroup) is an abstraction that tells what hardware resources are near each other on a NUMA system. Each locality group has at least one processor and possibly some associated memory and/or I/O devices. To minimize the impact of NUMA characteristics, Solaris considers the lgroup-based physical topology when mapping threads and data to CPUs and memory.

Note that even though Solaris attempts to provide good performance out of the box, some applications may still suffer the impact of NUMA, either due to misconfiguration of the hardware/software or some other reason. Engineered systems such as Oracle SuperCluster go to great lengths in setting up customer environments to minimize the impact of NUMA so applications perform as expected in a predictable manner. Still, application developers and system/application administrators need to take the NUMA factor into account while developing for and managing applications on large systems. Solaris-provided tools and APIs can be used to observe, diagnose, control and even correct or fix issues related to locality and latency. The rest of this post is about the tools that can be used to examine the locality of cores, memory and I/O devices.

Sample outputs are collected from a SPARC T4-4 server.

Locality Group Hierarchy

lgrpinfo prints information about the lgroup hierarchy and its contents. It is useful in understanding the context in which the OS is trying to optimize applications for locality, and also in figuring out which CPUs are closer, how much memory is near them, and the relative latencies between the CPUs and different memory blocks.

eg.,

# lgrpinfo -a

lgroup 0 (root):
        Children: 1-4
        CPUs: 0-255
        Memory: installed 1024G, allocated 75G, free 948G
        Lgroup resources: 1-4 (CPU); 1-4 (memory)
        Latency: 18
lgroup 1 (leaf):
        Children: none, Parent: 0
        CPUs: 0-63
        Memory: installed 256G, allocated 18G, free 238G
        Lgroup resources: 1 (CPU); 1 (memory)
        Load: 0.0227
        Latency: 12
lgroup 2 (leaf):
        Children: none, Parent: 0
        CPUs: 64-127
        Memory: installed 256G, allocated 15G, free 241G
        Lgroup resources: 2 (CPU); 2 (memory)
        Load: 0.000153
        Latency: 12
lgroup 3 (leaf):
        Children: none, Parent: 0
        CPUs: 128-191
        Memory: installed 256G, allocated 20G, free 236G
        Lgroup resources: 3 (CPU); 3 (memory)
        Load: 0.016
        Latency: 12
lgroup 4 (leaf):
        Children: none, Parent: 0
        CPUs: 192-255
        Memory: installed 256G, allocated 23G, free 233G
        Lgroup resources: 4 (CPU); 4 (memory)
        Load: 0.00824
        Latency: 12

Lgroup latencies:

------------------
  |  0  1  2  3  4
------------------
0 | 18 18 18 18 18
1 | 18 12 18 18 18
2 | 18 18 12 18 18
3 | 18 18 18 12 18
4 | 18 18 18 18 12
------------------

CPU Locality

The lgrpinfo utility shown above already presents CPU locality in a clear manner. Here is another way to retrieve the association between CPU IDs and lgroups.

# echo ::lgrp -p | mdb -k

   LGRPID  PSRSETID      LOAD      #CPU      CPUS
        1         0     17873        64      0-63
        2         0     17755        64      64-127
        3         0      2256        64      128-191
        4         0     18173        64      192-255

Memory Locality

The lgrpinfo output shown above includes the total memory that belongs to each of the locality groups. However, it doesn't show exactly which memory blocks belong to which locality groups. One of mdb's debugger commands (dcmds) helps retrieve this information.

1. List memory blocks

# ldm list-devices -a memory

MEMORY
     PA                   SIZE            BOUND
     0xa00000             32M             _sys_
     0x2a00000            96M             _sys_
     0x8a00000            374M            _sys_
     0x20000000           1048064M        primary


2. Print the physical memory layout of the system

# echo ::syslayout | mdb -k

         STARTPA            ENDPA  SIZE  MG MN    STL    ETL
        20000000        200000000  7.5g   0  0      4     40
       200000000        400000000    8g   1  1    800    840
       400000000        600000000    8g   2  2   1000   1040
       600000000        800000000    8g   3  3   1800   1840
       800000000        a00000000    8g   0  0     40     80
       a00000000        c00000000    8g   1  1    840    880
       c00000000        e00000000    8g   2  2   1040   1080
       e00000000       1000000000    8g   3  3   1840   1880
      1000000000       1200000000    8g   0  0     80     c0
      1200000000       1400000000    8g   1  1    880    8c0
      1400000000       1600000000    8g   2  2   1080   10c0
      1600000000       1800000000    8g   3  3   1880   18c0
	...
	...

The values under the MN (memory node) column can be treated as lgroup numbers after adding 1 to the existing values. For example, a value of zero under MN translates to lgroup 1, 1 under MN translates to lgroup 2, and so on. Better yet, the ::mnode debugger command lists the mapping of mnodes to lgroups, as shown below.

# echo ::mnode | mdb -k

           MNODE ID LGRP ASLEEP UTOTAL  UFREE UCACHE KTOTAL  KFREE KCACHE
     2075ad80000  0    1      -   249g   237g   114m   5.7g   714m      -
     2075ad802c0  1    2      -   240g   236g   288m    15g   4.8g      -
     2075ad80580  2    3      -   246g   234g   619m   9.6g   951m      -
     2075ad80840  3    4      -   247g   231g    24m     9g   897m      -

Unrelated notes:

  • Main memory on T4-4 is interleaved across all memory banks with 8 GB interleave size -- meaning first 8 GB chunk excluding _sys_ blocks will be populated in lgroup 1 closer to processor #1, second 8 GB chunk in lgroup 2 closer to processor #2, third 8 GB chunk in lgroup 3 closer to processor #3, fourth 8 GB chunk in lgroup 4 closer to processor #4 and then the fifth 8 GB chunk again in lgroup 1 closer to processor #1 and so on. Memory is not interleaved on T5 and M6 systems (confirm by running the ::syslayout dcmd). Conceptually memory interleaving is similar to disk striping.

  • Keep in mind that debugger commands (dcmds) are not committed interfaces - thus, there is no guarantee that they will continue to work on future versions of Solaris. Some of these dcmds may not work on some of the existing versions of Solaris.

I/O Device Locality

The -d option of the lgrpinfo utility accepts a path to an I/O device and returns the lgroup IDs closest to that device. Each I/O device on the system can be connected to one or more NUMA nodes, so it is not uncommon to see more than one lgroup ID returned by lgrpinfo.

eg.,

# lgrpinfo -d /dev/dsk/c1t0d0
lgroup ID : 1

# dladm show-phys | grep 10000
net4              Ethernet             up         10000  full      ixgbe0

# lgrpinfo -d /dev/ixgbe0
lgroup ID : 1

# dladm show-phys | grep ibp0
net12             Infiniband           up         32000  unknown   ibp0

# lgrpinfo -d /dev/ibp0
lgroup IDs : 1-4

NUMA IO Groups

The ::numaio_group debugger command shows information about all NUMA I/O groups.

# dladm show-phys | grep up
net0              Ethernet             up         1000   full      igb0
net12             Ethernet             up         10     full      usbecm2
net4              Ethernet             up         10000  full      ixgbe0

# echo ::numaio_group | mdb -k
            ADDR GROUP_NAME                     CONSTRAINT
    10050e1eba48 net4                  lgrp : 1
    10050e1ebbb0 net0                  lgrp : 1
    10050e1ebd18 usbecm2               lgrp : 1
    10050e1ebe80 scsi_hba_ngrp_mpt_sas1  lgrp : 4
    10050e1ebef8 scsi_hba_ngrp_mpt_sas0  lgrp : 1

Relying on prtconf is another way to find the NUMA IO locality for an IO device.

eg.,

# dladm show-phys | grep up | grep ixgbe
net4              Ethernet             up         10000  full      ixgbe0

== Find the device path for the network interface ==
# grep ixgbe /etc/path_to_inst | grep " 0 "
"/pci@400/pci@1/pci@0/pci@4/network@0" 0 "ixgbe"

== Find NUMA IO Lgroups ==
# prtconf -v /devices/pci@400/pci@1/pci@0/pci@4/network@0
	...
    Hardware properties:
	...
        name='numaio-lgrps' type=int items=1
            value=00000001
	...

Resource Groups

The list-rsrc-group subcommand of the Logical Domains Manager command-line interface (ldm) shows a consolidated list of processor cores, memory blocks and I/O devices that belong to each resource group. This subcommand is available in ldm 3.2 and later versions.

In a Resource Group, resources are grouped based on the underlying physical relationship between cores, memory, and I/O buses. On some hardware platforms and server configurations, such as SPARC M7-8, a Resource Group may map directly to a Locality Group.

# ldm ls-rsrc-group
NAME                                    CORE  MEMORY   IO
/SYS/CMIOU0                             32    480G     4
/SYS/CMIOU3                             32    480G     4

# ldm ls-rsrc-group -l /SYS/CMIOU0
NAME                                    CORE  MEMORY   IO
/SYS/CMIOU0                             32    480G     4

CORE
    CID                                             BOUND
    0, 1, 2, 3, 8, 9, 10, 11                        primary
    16, 17, 18, 19, 24, 25                          primary
    ...

MEMORY
    PA               SIZE             BOUND
    0x0              60M              _sys_
    0x3c00000        32M              _sys_
    0x5c00000        94M              _sys_
    0x4c000000       64M              _sys_
    0x50000000       15104M           primary
    0x400000000      128G             primary
    ...
    0x7400000000     16128M           primary
    0x77f0000000     64M              _sys_
    0x77f4000000     192M             _sys_

IO
    DEVICE           PSEUDONYM        BOUND
    pci@300          pci_0            primary
    pci@301          pci_1            primary
    pci@303          pci_3            primary
    pci@304          pci_4            primary

Process, Thread Locality

  • The -H option of the prstat command shows the home lgroup of active user processes and threads.

  • The -H option of the ps command can be used to examine the home lgroup of all user processes and threads, and the -h option can be used to list only those processes that are homed to certain locality group(s).
            [Related] Solaris assigns a thread to an lgroup when the thread is created. That lgroup is called the thread's home lgroup. Solaris runs the thread on the CPUs in the thread's home lgroup and allocates memory from that lgroup whenever possible.

  • The plgrp tool shows the placement of threads among locality groups. The same tool can be used to set the home locality group and lgroup affinities for one or more processes, threads, or LWPs.

  • The -L option of the pmap command shows the lgroup that contains the physical memory backing some virtual memory.
            [Related] Breakdown of Oracle SGA into Solaris Locality Groups

  • Memory placement among lgroups can also be influenced using pmadvise while the application is running, or by calling madvise() during development; both provide advice to the kernel's virtual memory manager, which uses the hint to determine how to allocate memory for the specified range (see the sketch after this list). This mechanism is beneficial when administrators and developers understand the target application's data access patterns.

    It is not possible to specify memory placement locality for OSM & ISM segments using the pmadvise command or the madvise() call (DISM is an exception).
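
A minimal, hedged sketch of the madvise() route mentioned above (not from the original post; the mapping size and the choice of MADV_ACCESS_LWP advice are assumptions for illustration):

#include <stdio.h>
#include <sys/mman.h>

int main(void) {
        size_t len = 64 * 1024 * 1024;          /* 64 MB anonymous mapping */
        caddr_t buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANON, -1, 0);

        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        /* advise: the next LWP to touch this range will access it the most */
        if (madvise(buf, len, MADV_ACCESS_LWP) == -1)
                perror("madvise");

        /* ... hand buf to the worker thread that will use it ... */

        munmap(buf, len);
        return 0;
}

pmadvise can apply the same kind of advice to an already running process, as demonstrated in the examples below.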

Examples:

# prstat -H

   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU LGRP PROCESS/NLWP
  1865 root      420M  414M sleep    59    0 447:51:13 0.1%    2 java/108
  3659 oracle   1428M 1413M sleep    38    0  68:39:28 0.0%    4 oracle/1
  1814 oracle    155M  110M sleep    59    0  70:45:17 0.0%    4 gipcd.bin/9
     8 root        0K    0K sleep    60    -  70:52:21 0.0%    0 vmtasks/257
  3765 root      447M  413M sleep    59    0  29:24:20 0.0%    3 crsd.bin/43
  3949 oracle    505M  456M sleep    59    0   0:59:42 0.0%    2 java/124
 10825 oracle   1097M 1074M sleep    59    0  18:13:27 0.0%    3 oracle/1
  3941 root      210M  184M sleep    59    0  20:03:37 0.0%    4 orarootagent.bi/14
  3743 root      119M   98M sleep   110    -  24:53:29 0.0%    1 osysmond.bin/13
  3324 oracle    266M  225M sleep   110    -  19:52:31 0.0%    4 ocssd.bin/34
  1585 oracle    122M   91M sleep    59    0  18:06:34 0.0%    3 evmd.bin/10
  3918 oracle    168M  144M sleep    58    0  14:35:31 0.0%    1 oraagent.bin/28
  3427 root      112M   80M sleep    59    0  12:34:28 0.0%    4 octssd.bin/12
  3635 oracle   1425M 1406M sleep   101    -  13:55:31 0.0%    4 oracle/1
  1951 root      183M  161M sleep    59    0   9:26:51 0.0%    4 orarootagent.bi/21
Total: 251 processes, 2414 lwps, load averages: 1.37, 1.46, 1.47

== Locality group 2 is the home lgroup of the java process with pid 1865 == 

# plgrp 1865

     PID/LWPID    HOME
    1865/1        2
    1865/2        2
	...
	...
    1865/22       4
    1865/23       4
	...
	...
    1865/41       1
    1865/42       1
	...
	...
    1865/60       3
    1865/61       3
	...
	...

# plgrp 1865 | awk '{print $2}' | grep 2 | wc -l
      30

# plgrp 1865 | awk '{print $2}' | grep 1 | wc -l
      25

# plgrp 1865 | awk '{print $2}' | grep 3 | wc -l
      25

# plgrp 1865 | awk '{print $2}' | grep 4 | wc -l
      28


== Let's reset the home lgroup of the java process id 1865 to 4 ==


# plgrp -H 4 1865
     PID/LWPID    HOME
    1865/1        2 => 4
    1865/2        2 => 4
    1865/3        2 => 4
    1865/4        2 => 4
	...
	...
    1865/184      1 => 4
    1865/188      4 => 4

# plgrp 1865 | awk '{print $2}' | egrep "1|2|3" | wc -l
       0

# plgrp 1865 | awk '{print $2}' | grep 4 | wc -l
     108

# prstat -H -p 1865

   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU LGRP PROCESS/NLWP
  1865 root      420M  414M sleep    59    0 447:57:30 0.1%    4 java/108

== List the home lgroup of all processes ==

# ps -aeH
  PID LGRP TTY         TIME CMD
    0    0 ?           0:11 sched
    5    0 ?           4:47 zpool-rp
    1    4 ?          21:04 init
    8    0 ?        4253:54 vmtasks
   75    4 ?           0:13 ipmgmtd
   11    3 ?           3:09 svc.star
   13    4 ?           2:45 svc.conf
 3322    1 ?         301:51 cssdagen
	...
11155    3 ?           0:52 oracle
13091    4 ?           0:00 sshd
13124    3 pts/5       0:00 bash
24703    4 pts/8       0:00 bash
12812    2 pts/3       0:00 bash
	...

== Find out the lgroups which shared memory segments are allocated from ==

# pmap -Ls 24513 | egrep "Lgrp|256M|2G"

         Address       Bytes Pgsz Mode   Lgrp Mapped File
0000000400000000   33554432K   2G rwxs-    1   [ osm shmid=0x78000047 ]
0000000C00000000     262144K 256M rwxs-    3   [ osm shmid=0x78000048 ]
0000000C10000000     524288K 256M rwxs-    2   [ osm shmid=0x78000048 ]
0000000C30000000     262144K 256M rwxs-    3   [ osm shmid=0x78000048 ]
0000000C40000000     524288K 256M rwxs-    1   [ osm shmid=0x78000048 ]
0000000C60000000     262144K 256M rwxs-    2   [ osm shmid=0x78000048 ]

== Apply MADV_ACCESS_LWP policy advice to a segment at a specific address ==

# pmap -Ls 1865 | grep anon

00000007DAC00000      20480K   4M rw---    4   [ anon ]
00000007DC000000       4096K    - rw---    -   [ anon ]
00000007DFC00000      90112K   4M rw---    4   [ anon ]
00000007F5400000     110592K   4M rw---    4   [ anon ]

# pmadvise -o 7F5400000=access_lwp 1865

# pmap -Ls 1865 | grep anon
00000007DAC00000      20480K   4M rw---    4   [ anon ]
00000007DC000000       4096K    - rw---    -   [ anon ]
00000007DFC00000      90112K   4M rw---    4   [ anon ]
00000007F5400000      73728K   4M rw---    4   [ anon ]
00000007F9C00000      28672K    - rw---    -   [ anon ]
00000007FB800000       8192K   4M rw---    4   [ anon ]

SEE ALSO:

  • Man pages of lgrpinfo(1), plgrp(1), pmap(1), prstat(1M), ps(1), pmadvise(1), madvise(3C), madv.so.1(1), mdb(1)
  • Web search keywords: NUMA, cc-NUMA, locality group, lgroup, lgrp, Memory Placement Optimization, MPO

Credit: various internal and external sources

Tuesday Dec 23, 2014

Solaris Studio : C/C++ Dynamic Analysis

First, a reminder - Oracle Solaris Studio 12.4 is now generally available. Check the Solaris Studio 12.4 Data Sheet before downloading the software from Oracle Technology Network.

Dynamic Memory Usage Analysis

The Code Analyzer tool in the Oracle Solaris Studio compiler suite can analyze static data, dynamic memory access data, and code coverage data collected from binaries that were compiled with the C/C++ compilers in Solaris Studio 12.3 or later. Code Analyzer is supported on Solaris and Oracle Enterprise Linux.

Refer to the static code analysis blog entry for a quick summary of the steps involved in performing static analysis. The focus of this blog entry is the dynamic portion of the analysis. In this context, dynamic analysis is the evaluation of an application during runtime for memory-related errors. The main objective is to find and debug memory management errors; robustness and security assurance are nice side effects, however limited their extent.

Code Analyzer relies on another primary Solaris Studio tool, discover, to find runtime errors that are often caused by memory mismanagement. discover looks for potential errors such as accessing outside the bounds of the stack or an array, unallocated memory reads and writes, NULL pointer dereferences, memory leaks and double frees. The full list of memory management issues analyzed by Code Analyzer/discover is at: Dynamic Memory Access Issues

discover performs the dynamic analysis by instrumenting the code so that it can keep track of memory operations while the binary is running. During runtime, discover monitors the application's use of memory by interposing on standard memory allocation calls such as malloc(), calloc(), memalign(), valloc() and free(). Fatal memory access errors are detected and reported immediately at the instant the incident occurs, so it is easy to correlate the failure with the actual source. This behavior makes it somewhat easier to detect and fix memory management problems in large applications. However, the effectiveness of this kind of analysis depends highly on the flow of control and data during the execution of the target code - hence it is important to test the application with a variety of test inputs that maximize code coverage.

High-level steps in using Code Analyzer for Dynamic Analysis

Given the enhancements and incremental improvements in analytical tools, Solaris Studio 12.4 is recommended for this exercise.

  1. Build the application with debug flags

    The -g (C) or -g0 (C++) options generate debug information, which enables Code Analyzer to display source code and line number information for errors and warnings.

    • Linux users: specify the -xannotate option on the compile/link line in addition to -g and other options
  2. Instrument the binary with discover

    % discover -a -H <filename>.%p.html -o <instrumented_binary> <original_binary>

    where:

    • -a : write the error data to binary-name.analyze/dynamic directory for use by Code Analyzer
    • -H : write the analysis report to <filename>.<pid>.html when the instrumented binary is executed. %p expands to the process id of the application. If you prefer the analysis report in a plain text file, use -w <filename>.%p.txt instead
    • -o : write the instrumented binary to <instrumented_binary>

    Check Command-Line Options page for the full list of discover supported options.

  3. Run the instrumented binary

    .. to collect the dynamic memory access data.

    % ./<instrumented_binary> <args>

  4. Finally examine the analysis report for errors and warnings

Example

The following example demonstrates the above steps using Solaris Studio 12.4 C compiler and discover command-line tool. Same code was used to demonstrate static analysis steps as well.
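
The example source itself is not reproduced in this post. As a hedged stand-in (the file name, code and mix of errors are made up for illustration), the snippet below contains the kind of memory errors discover reports at runtime - a heap buffer overrun, a double free and a leak - with the build/instrument/run steps from above shown as comments:

/* leaky.c - deliberately buggy; build and instrument as in the steps above:
 *   cc -g -o leaky leaky.c
 *   discover -a -H leaky.%p.html -o leaky.disc leaky
 *   ./leaky.disc
 */
#include <stdlib.h>
#include <string.h>

int main(void) {
        char *buf = malloc(8);
        char *leak = malloc(32);

        strcpy(buf, "0123456789");     /* writes past the end of the 8-byte allocation */
        free(buf);
        free(buf);                     /* double free */

        (void) leak;                   /* never freed: memory leak */
        return 0;
}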

Few things to be aware of:

  • If the target application uses the LD_PRELOAD environment variable to preload one or more functions that the discover tool needs to interpose on for dynamic analysis, the resulting analysis may not be accurate.
  • If the target application uses runtime auditing via the LD_AUDIT environment variable, this auditing will conflict with the discover tool's use of auditing and may result in undefined behavior.

Reference & Recommended Reading:

  1. Oracle Solaris Studio 12.4 : Code Analyzer User's Guide
  2. Oracle Solaris Studio 12.4 : Discover and Uncover User's Guide

Friday Nov 28, 2014

Solaris Studio 12.4 : C/C++ Static Code Analysis

First things first -- Oracle Solaris Studio 12.4 is now generally available. One of the key features of this release is the support for the latest industry standards including C++11, C11 and OpenMP 4.0. Check the Solaris Studio 12.4 Data Sheet before downloading the software from Oracle Technology Network.

Static Code Analysis

The Code Analyzer tool in the Oracle Solaris Studio compiler suite can analyze static data, dynamic memory access data, and code coverage data collected from binaries that were compiled with the C/C++ compilers in Solaris Studio 12.3 or later. Code Analyzer is supported on Solaris and Oracle Enterprise Linux.

The primary focus of this blog entry is static code analysis.

Static code analysis is the process of detecting common programming errors in code during compilation. The static code checking component in Code Analyzer looks for potential errors such as accessing outside the bounds of an array, out-of-scope variable use, NULL pointer dereferences, infinite loops, uninitialized variables, memory leaks and double frees. The following webpage in the Solaris Studio 12.4: Code Analyzer User's Guide has the complete list of errors with examples.

    Static Code Issues analyzed by Code Analyzer

High-level steps in using Code Analyzer for Static Code analysis

Given the enhancements and incremental improvements in analysis tools, Solaris Studio 12.4 is recommended for this exercise.

  1. Collect static data

    Compile [all source files] and link with the -xprevise=yes option.

    • when using Solaris Studio 12.3 compilers, compile with the -xanalyze=code option.
    • Linux users: specify the -xannotate option on the compile/link line in addition to -xprevise=yes|-xanalyze=code.

    During compilation, the C/C++ compiler extracts static errors automatically and writes the error information to a sub-directory of the <binary-name>.analyze directory.

  2. Analyze the static data

    Two options are available to analyze and display the errors in a report format; the example below uses the codean command-line tool.

Example

The following example demonstrates the above steps using Solaris Studio 12.4 C compiler and codean command-line tool.
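
The example source itself is not reproduced in this post. As a hedged stand-in (the file name and code are made up for illustration), the snippet below contains the kind of static issues the checker flags - an uninitialized variable and an unchecked malloc() result - with the collection step from above shown as a comment:

/* check.c - deliberately flawed; collect static data as in step 1:
 *   cc -g -xprevise=yes -o check check.c
 * then analyze the data under check.analyze with the codean tool.
 */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
        int total;                              /* used without being initialized */
        int *p = malloc(4 * sizeof (int));      /* result not checked for NULL */

        p[0] = total;                           /* possible NULL pointer dereference */
        printf("%d\n", p[0]);
        free(p);
        return 0;
}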

Few things to be aware of:

  • compilers may not be able to detect all of the static errors in the target code, especially if the errors are complex.
  • some errors depend on data that is available only at runtime -- perform dynamic analysis as well.
  • some errors are ambiguous, and also might not be actual errors -- expect a few false positives.

Reference & Recommended Reading:
    Oracle Solaris Studio 12.4 Code Analyzer User's Guide

Saturday May 10, 2014

Solaris 11.2 Highlights [Part 2] in 4 Minutes or Less

Part 1: Solaris 11.2 Highlights in 6 Minutes or Less

Highlights contd.,

Package related ..

Minimal Set of System Packages

For the past few years, one of the hot topics has been the bare minimum [set of packages] needed to run applications, and there have been a number of blog posts and a few technical articles about creating minimal Solaris configurations. Finally, users/customers who wish to have their OS installed with a minimal set of required system packages for running most applications can just install the solaris-minimal-server package and not worry about anything else, such as removing unwanted packages.

# pkg install pkg:/group/system/solaris-minimal-server

Oracle Database Pre-requisite Package

Until Solaris 11.1, it was up to the users to check the package dependencies and make sure those were installed before attempting to install the Oracle Database software, especially when using the graphical installer. Solaris 11.2 frees users from the burden of checking and installing individual [required] packages by providing a brand new package called oracle-rdbms-server-12cR1-preinstall. Users just need to install this package for a smoother database software installation later.

# pkg install pkg:/group/prerequisite/oracle/oracle-rdbms-server-12cR1-preinstall

Mirroring a Package Repository

11.2 provides the ability to create local IPS package repositories and keep them in sync with the IPS package repositories hosted publicly by Oracle Corporation. The key to achieving this is the SMF service svc:/application/pkg/mirror. The following webpage lists the essential steps at a high level.

How to Automatically Copy a Repository From the Internet

Another enhancement is the cloning of a package repository using the --clone option of the pkgrecv command.

Observability related ..

Network traffic diagnostics:

A brand new command, ipstat(1M), reports IP traffic statistics.

# ipstat -?
Usage:	ipstat [-cmnrt] [-a address[,address...]] [-A address[,address...]]
[-d d|u] [-i interface[,interface...]] [-l nlines] [-p protocol[,protocol...]]
[-s key | -S key] [-u R|K|M|G|T|P] [-x opt[=val][,opt[=val]...]]

# ipstat -uM 5

SOURCE                     DEST                       PROTO    INT        BYTES
etc5mdbadm01.us.oracle.com etc2m-appadm01.us.oracle.c TCP      net8       76.3M
etc2m-appadm01.us.oracle.c etc5mdbadm01.us.oracle.com TCP      net8        0.6M
dns1.us.oracle.com         etc2m-appadm01.us.oracle.c UDP      net8        0.0M
169.254.182.76             169.254.182.77             UDP      net20       0.0M
...

Total: bytes in: 76.3M bytes out:  0.6M

Another new command, tcpstat(1M), reports TCP and UDP traffic statistics.

# tcpstat -?
Usage:	tcpstat [-cmnrt] [-a address[,...]] [-A address[,...]] [-d d|u] [-i pid[,...]] 
[-l nlines] [-p port[,...]] [-P port[,...]] [-s key | -S key] [-x opt[=val][,...]] 
[-z zonename[,...]] [interval [count]]

# tcpstat 5

ZONE         PID PROTO  SADDR             SPORT DADDR             DPORT   BYTES
global      1267 TCP    etc5mdbadm01.us.  42972 etc2m-appadm01.u     22   84.3M
global      1267 TCP    etc2m-appadm01.u     22 etc5mdbadm01.us.  42972   48.0K
global      1089 UDP    169.254.182.76      161 169.254.182.77    33436  137.0 
global      1089 UDP    169.254.182.77    33436 169.254.182.76      161   44.0 
...
...

Total: bytes in: 84.3M bytes out: 48.4K

# tcpstat -i 43982 5		<-- TCP stats for a given pid

ZONE         PID PROTO  SADDR             SPORT DADDR             DPORT   BYTES
global     43982 TCP    etc2m-appadm01.u  43524 etc5mdbadm02.us.     22   73.7M
global     43982 TCP    etc5mdbadm02.us.     22 etc2m-appadm01.u  43524   41.9K

Total: bytes in: 42.1K bytes out: 73.7M

Up until 11.1, it was not so straightforward to figure out which process created a network endpoint; one had to rely on a combination of commands such as netstat, pfiles or lsof and the proc filesystem (/proc) to extract that information. Solaris 11.2 attempts to make it easy by enhancing the existing netstat(1M) tool. The enhanced netstat(1M) shows which user and pid created and control a network endpoint; -u is the magic flag.

#  netstat -aun			<-- notice the -u flag in netstat command; and User, Pid, Command columns in the output

UDP: IPv4
   Local Address        Remote Address      User    Pid      Command       State
-------------------- -------------------- -------- ------ -------------- ----------
      *.*                                 root        162 in.mpathd      Unbound
      *.*                                 netadm      765 nwamd          Unbound
      *.55388                             root        805 picld          Idle
	...
	...

TCP: IPv4
   Local Address        Remote Address      User     Pid     Command     Swind  Send-Q  Rwind  Recv-Q    State
-------------------- -------------------- -------- ------ ------------- ------- ------ ------- ------ -----------
10.129.101.1.22      10.129.158.100.38096 root       1267 sshd           128872      0  128872      0 ESTABLISHED
192.168.28.2.49540   192.168.28.1.3260    root          0       2094176      0 1177974      0 ESTABLISHED
127.0.0.1.49118            *.*            root       2943 nmz                 0      0 1048576      0 LISTEN
127.0.0.1.1008             *.*            pkg5srv   16012 httpd.worker        0      0 1048576      0 LISTEN
	...

[x86 only] Memory Access Locality Characterization and Analysis

Solaris 11.2 introduces another brand new tool, numatop(1M), which helps in characterizing the NUMA behavior of processes and threads on systems with Intel Westmere, Sandy Bridge and Ivy Bridge processors. If it is not installed by default, install the numatop package as shown below.

# pkg install pkg:/diagnostic/numatop

Performance related ..

This is a grey area - so, just be informed that there are some ZFS and Oracle database related performance enhancements.

Starting with 11.2, ZFS synchronous write transactions are committed in parallel, which should help improve the I/O throughput.

Database startup time has been greatly improved in Solaris 11 releases, and it has been further improved in 11.2. Customers with databases that use hundreds of gigabytes or terabytes of memory will notice the improvement in database startup times. Other changes to asynchronous I/O, inter-process communication using event ports, etc., help improve the performance of recent releases of Oracle Database such as 12c.

Miscellaneous ..

Java 8

Java 7 is still the default in Solaris 11.2 release, but Java 8 can be installed from the IPS package repository.

eg.,

# pkg install pkg:/developer/java/jdk-8		<-- Java Development Kit
# pkg install pkg:/runtime/java/jre-8		<-- Java Runtime

Bootable USB Media

Solaris 11.2 introduces support for booting SPARC systems from USB media. Use the Solaris Distribution Constructor (requires the distribution-constructor package) to create the bootable USB media, or copy a bootable/installation image to the USB media using the usbcopy(1M) or dd(1M) commands.

Oracle Hardware Management Pack

Oracle Hardware Management Pack is a set of tools integrated into the Solaris OS distribution that show the existing hardware configuration, help configure hardware RAID volumes, update server firmware, configure the ILOM service processor, enable monitoring of the hardware using existing tools, and so on. Look for the pkg:/system/management/hmp/hmp-* packages.

A few other interesting packages:

Parallel implementation of bzip2 : compress/pbzip2
NVM Express (nvme) utility : system/storage/nvme-utilities
Utility to administer cluster of servers : terminal/cssh
