Saturday Jun 11, 2016

Solaris API for Locality Group Observability

[ Related: 1st part @ Locality Group Observability on Solaris ]

As of this writing, Solaris has limited support for lgroup observability using programmatic interfaces. Applications can use the lgroup API to traverse the locality group hierarchy, discover the contents of each locality group and even affect thread and memory placement on desired lgroups.

The man page for liblgrp(3LIB) lists out the currently supported public interfaces, and a brief description for most of those interfaces can be obtained by running man -k lgroup in a shell. In order to use this API, applications must link with locality group library, liblgrp(3LIB).

The following sample code demonstrates lgroup API usage by making several lgrp_*() calls including lgrp_device_lgrps() to find the locality groups that are closest to the specified I/O device.

# cat -n iodevlgrp.c

     1  #include <stdio.h>
     2  #include <stdlib.h>
     3  #include <assert.h>
     4  #include <sys/lgrp_user.h>
     5  #include <sys/types.h>
     7  int main(int argc, char **argv) {
     9          if (argc != 2) {
    10                  fprintf(stderr, "Usage: %s \n", argv[0]);
    11                  exit(1);
    12          }
    14          /*  lgroup interface version check */
    15          if (lgrp_version(LGRP_VER_CURRENT) != LGRP_VER_CURRENT) {
    16              fprintf(stderr, "\nBuilt with unsupported lgroup interface %d",
    17                  LGRP_VER_CURRENT);
    18              exit(1);
    19          }
    21          lgrp_cookie_t cookie =lgrp_init(LGRP_VIEW_OS);
    22          lgrp_id_t node = lgrp_root(cookie);
    24          /* refresh the cookie if stale */
    25          if ( lgrp_cookie_stale(cookie) ) {
    26                  lgrp_fini(cookie);
    27                  cookie =lgrp_init(LGRP_VIEW_OS);
    28          }
    30          int nlgrps = lgrp_nlgrps(cookie);
    31          if ( nlgrps == -1 ) {
    32                  perror("\n lgrp_nlgrps");
    33                  lgrp_fini(cookie);
    34                  exit(1);
    35          }
    36          printf("Number of locality groups on the system: %d\n", (nlgrps-1));
    38          /* lgroups closest to the target device */
    39          int numlgrps = lgrp_device_lgrps(argv[1], NULL, 0);
    40          if (numlgrps == -1 ){
    41                  fprintf(stderr, "I/O device: %s. ", argv[1]);
    42                  perror("lgrp_device_lgrps");
    43          } else {
    44                  printf("lgroups closest to the I/O device %s: ", argv[1]);
    45                  lgrp_id_t *lgrpids = (lgrp_id_t *)calloc(numlgrps, sizeof(lgrp_id_t));
    46                  assert(lgrpids != NULL);
    47                  lgrp_device_lgrps(argv[1], lgrpids, numlgrps);
    48                  for (int i = 0; i < numlgrps; ++i) {
    49                          printf(" %d ", lgrpids[i]);
    50                  }
    51                  free(lgrpids);
    52          }
    53          lgrp_fini(cookie);
    54          printf("\n");
    55          return 0;
    57  }

% cc -o iodevlgrp -llgrp iodevlgrp.c

% ./iodevlgrp /dev/ixgbe0
Number of locality groups on the system: 2
lgroups closest to the I/O device /dev/ixgbe0:  1

% lgrpinfo -d /dev/ixgbe0
lgroup ID : 1

% ./iodevlgrp /dev/ixgbe1
Number of locality groups on the system: 2
lgroups closest to the I/O device /dev/ixgbe1:  2

% lgrpinfo -d /dev/ixgbe1
lgroup ID : 2

Source: Oracle Solaris 11.3 Programming Interfaces Guide

Tuesday Jun 30, 2015

Oracle Database : Profiling PL/SQL Code

AWR gathers and reports performance statistics that are useful in monitoring and tuning databases. Even though the combination of AWR/ASH reports and explain plans are very useful in analyzing the performance of queries, their usefulness is limited when dealing with PL/SQL-heavy applications. AWR and ASH reports help figure out if the database is spending significant amount of time and/or CPU cycles in executing PL/SQL code and in what PL/SQL modules -- however to identify performance bottlenecks withinin PL/SQL modules/routines and to pin-point hot spots in PL/SQL code at line level, DBMS_PROFILER, a database package, might be the right tool to use. Also check the documentation of DBMS_HPROF package out (Credit to guest for the tip/suggestion).

Rest of this post lists out the steps involved (on a high level) in generating the profile using DBMS_PROFILER; and subsequently generating a report or extracting relevant pieces of information.


  1. [Pre-11g env only] Create PLAN_TABLE as SYS user

    SQL> @?/rdbms/admin/utlxplan.sql
    SQL> GRANT ALL ON sys.plan_table TO public;
    SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;
  2. Create PL/SQL Profiler tables under the application schema

    As application user:

    SQL> @?/rdbms/admin/proftab.sql
  3. Install DBMS_PROFILER package as SYS user, if missing

    SQL> @?/rdbms/admin/profload.sql
  4. Start PL/SQL Profiler

    As application user:

  5. Execute one or more transactions to be profiled

  6. Stop the PL/SQL Profiler

    As application user:

  7. At this point, there are couple of options to analyze the profile.

    • Easy option - if you have access to My Oracle Support (MOS), download the profiler.sql from MOS Document ID 243755.1 "Script to produce HTML report with top consumers out of PL/SQL Profiler DBMS_PROFILER data", and execute.

      As application user:

      SQL> profiler.sql

      Select appropriate runid from the list displayed on stdout. profiler.sql script generates an HTML report that shows top time consumers for the duration of the execution of the PL/SQL Profiler run.

    • Not-so-Easy option - refer to the DBMS_PROFILER documentation to extract relevant information and details such as top time consumers, top lines sorted by total time, profiled PL/SQL module list, top modules etc.,

      Checking the documentation out for the following three key tables is a good start -- PLSQL_PROFILER_RUNS (information related to a profiling session), PLSQL_PROFILER_UNITS (information about each PL/SQL unit that was executed during the profiler session) and PLSQL_PROFILER_DATA (execution statistics for each line of code in PL/SQL units). Ancillary tables: DBA_SOURCE and PLAN_TABLE.

Tuesday Mar 31, 2015

Locality Group Observability on Solaris

Modern multi-socket servers exhibit NUMA characteristics that may hurt application performance if ignored. On a NUMA system (Non-uniform Memory Access), all memory is shared between/among processors. Each processor has access to its own memory - local memory - as well as memory that is local to another processor -- remote memory. However the memory access time (latency) depends on the memory location relative to the processor. A processor can access its local memory faster than the remote memory, and these varying memory latencies play a big role in application performance.

Solaris organizes the hardware resources -- CPU, memory and I/O devices -- into one or more logical groups based on their proximity to each other in such a way that all the hardware resources in a group are considered local to that group. These groups are referred as locality groups or NUMA nodes. In other words, a locality group (lgroup) is an abstraction that tells what hardware resources are near each other on a NUMA system. Each locality group has at least one processor and possibly some associated memory and/or IO devices. To minimize the impact of NUMA characteristics, Solaris considers the lgroup based physical topology when mapping threads and data to CPUs and memory.

Note that even though Solaris attempts to provide good performance out of the box, some applications may still suffer the impact of NUMA either due to misconfiguration of the hardware/software or some other reason. Engineered systems such as Oracle SuperCluster go to great lengths in setting up customer environments to minimize the impact of NUMA so applications perform as expected in a predictable manner. Still application developers and system/application administrators need to take NUMA factor into account while developing for and managing applications on large systems. Solaris provided tools and APIs can be used to observe, diagnose, control and even correct or fix the issues related to locality and latency. Rest of this post is about the tools that can be used to examine the locality of cores, memory and I/O devices.

Sample outputs are collected from a SPARC T4-4 server.

Locality Group Hierarchy

lgrpinfo prints information about the lgroup hierarchy and its contents. It is useful in understanding the context in which the OS is trying to optimize applications for locality, and also in figuring out which CPUs are closer, how much memory is near them, and the relative latencies between the CPUs and different memory blocks.


# lgrpinfo -a

lgroup 0 (root):
        Children: 1-4
        CPUs: 0-255
        Memory: installed 1024G, allocated 75G, free 948G
        Lgroup resources: 1-4 (CPU); 1-4 (memory)
        Latency: 18
lgroup 1 (leaf):
        Children: none, Parent: 0
        CPUs: 0-63
        Memory: installed 256G, allocated 18G, free 238G
        Lgroup resources: 1 (CPU); 1 (memory)
        Load: 0.0227
        Latency: 12
lgroup 2 (leaf):
        Children: none, Parent: 0
        CPUs: 64-127
        Memory: installed 256G, allocated 15G, free 241G
        Lgroup resources: 2 (CPU); 2 (memory)
        Load: 0.000153
        Latency: 12
lgroup 3 (leaf):
        Children: none, Parent: 0
        CPUs: 128-191
        Memory: installed 256G, allocated 20G, free 236G
        Lgroup resources: 3 (CPU); 3 (memory)
        Load: 0.016
        Latency: 12
lgroup 4 (leaf):
        Children: none, Parent: 0
        CPUs: 192-255
        Memory: installed 256G, allocated 23G, free 233G
        Lgroup resources: 4 (CPU); 4 (memory)
        Load: 0.00824
        Latency: 12

Lgroup latencies:

  |  0  1  2  3  4
0 | 18 18 18 18 18
1 | 18 12 18 18 18
2 | 18 18 12 18 18
3 | 18 18 18 12 18
4 | 18 18 18 18 12

CPU Locality

lgrpinfo utility shown above already provides CPU locality in a clear manner. Here is another way to retrieve the association between CPU ids and lgroups.

# echo ::lgrp -p | mdb -k

        1         0     17873        64      0-63
        2         0     17755        64      64-127
        3         0      2256        64      128-191
        4         0     18173        64      192-255

Memory Locality

lgrpinfo utility shown above shows the total memory that belongs to each of the locality groups. However, it doesn't show exactly what memory blocks belong to what locality groups. One of mdb's debugger command (dcmd) helps retrieve this information.

1. List memory blocks

# ldm list-devices -a memory

     PA                   SIZE            BOUND
     0xa00000             32M             _sys_
     0x2a00000            96M             _sys_
     0x8a00000            374M            _sys_
     0x20000000           1048064M        primary

2. Print the physical memory layout of the system

# echo ::syslayout | mdb -k

         STARTPA            ENDPA  SIZE  MG MN    STL    ETL
        20000000        200000000  7.5g   0  0      4     40
       200000000        400000000    8g   1  1    800    840
       400000000        600000000    8g   2  2   1000   1040
       600000000        800000000    8g   3  3   1800   1840
       800000000        a00000000    8g   0  0     40     80
       a00000000        c00000000    8g   1  1    840    880
       c00000000        e00000000    8g   2  2   1040   1080
       e00000000       1000000000    8g   3  3   1840   1880
      1000000000       1200000000    8g   0  0     80     c0
      1200000000       1400000000    8g   1  1    880    8c0
      1400000000       1600000000    8g   2  2   1080   10c0
      1600000000       1800000000    8g   3  3   1880   18c0

The values under MN column (memory node) can be treated as lgroup numbers after adding 1 to existing values. For example, a value of zero under MN translates to lgroup 1, 1 under MN translate to lgroup 2 and so on. Better yet, ::mnode debugger command lists out the mapping of mnodes to lgroups as shown below.

# echo ::mnode | mdb -k

     2075ad80000  0    1      -   249g   237g   114m   5.7g   714m      -
     2075ad802c0  1    2      -   240g   236g   288m    15g   4.8g      -
     2075ad80580  2    3      -   246g   234g   619m   9.6g   951m      -
     2075ad80840  3    4      -   247g   231g    24m     9g   897m      -

Unrelated notes:

  • Main memory on T4-4 is interleaved across all memory banks with 8 GB interleave size -- meaning first 8 GB chunk excluding _sys_ blocks will be populated in lgroup 1 closer to processor #1, second 8 GB chunk in lgroup 2 closer to processor #2, third 8 GB chunk in lgroup 3 closer to processor #3, fourth 8 GB chunk in lgroup 4 closer to processor #4 and then the fifth 8 GB chunk again in lgroup 1 closer to processor #1 and so on. Memory is not interleaved on T5 and M6 systems (confirm by running the ::syslayout dcmd). Conceptually memory interleaving is similar to disk striping.

  • Keep in mind that debugger commands (dcmd) are not committed - thus, there is no guarantee that they continue to work on future versions of Solaris. Some of these dcmds may not work on some of the existing versions of Solaris.

I/O Device Locality

-d option to lgrpinfo utility accepts a specified path to an I/O device and return the lgroup IDs closest to that device. Each I/O device on the system can be connected to one or more NUMA nodes - so, it is not uncommon to see more than one lgroup ID returned by lgrpinfo.


# lgrpinfo -d /dev/dsk/c1t0d0
lgroup ID : 1

# dladm show-phys | grep 10000
net4              Ethernet             up         10000  full      ixgbe0

# lgrpinfo -d /dev/ixgbe0
lgroup ID : 1

# dladm show-phys | grep ibp0
net12             Infiniband           up         32000  unknown   ibp0

# lgrpinfo -d /dev/ibp0
lgroup IDs : 1-4

NUMA IO Groups

Debugger command ::numaio_group shows information about all NUMA I/O Groups.

# dladm show-phys | grep up
net0              Ethernet             up         1000   full      igb0
net12             Ethernet             up         10     full      usbecm2
net4              Ethernet             up         10000  full      ixgbe0

# echo ::numaio_group | mdb -k
            ADDR GROUP_NAME                     CONSTRAINT
    10050e1eba48 net4                  lgrp : 1
    10050e1ebbb0 net0                  lgrp : 1
    10050e1ebd18 usbecm2               lgrp : 1
    10050e1ebe80 scsi_hba_ngrp_mpt_sas1  lgrp : 4
    10050e1ebef8 scsi_hba_ngrp_mpt_sas0  lgrp : 1

Relying on prtconf is another way to find the NUMA IO locality for an IO device.


# dladm show-phys | grep up | grep ixgbe
net4              Ethernet             up         10000  full      ixgbe0

== Find the device path for the network interface ==
# grep ixgbe /etc/path_to_inst | grep " 0 "
"/pci@400/pci@1/pci@0/pci@4/network@0" 0 "ixgbe"

== Find NUMA IO Lgroups ==
# prtconf -v /devices/pci@400/pci@1/pci@0/pci@4/network@0
    Hardware properties:
        name='numaio-lgrps' type=int items=1

Resource Groups

list-rsrc-group subcommand of the Logical Domains Manager command line interface (ldm) shows a consolidated list of processor cores, memory blocks and IO devices that belong to each resource group. This subcommand is available in ldm 3.2 and later versions.

In a Resource Group, resources are grouped based on the underlying physical relationship between cores, memory, and I/O buses. On different hardware platforms, some of the server configurations such as SPARC M7-8 may have a Resource Group that maps directly to a Locality Group.

# ldm ls-rsrc-group
NAME                                    CORE  MEMORY   IO
/SYS/CMIOU0                             32    480G     4
/SYS/CMIOU3                             32    480G     4

# ldm ls-rsrc-group -l /SYS/CMIOU0
NAME                                    CORE  MEMORY   IO
/SYS/CMIOU0                             32    480G     4

    CID                                             BOUND
    0, 1, 2, 3, 8, 9, 10, 11                        primary
    16, 17, 18, 19, 24, 25                          primary

    PA               SIZE             BOUND
    0x0              60M              _sys_
    0x3c00000        32M              _sys_
    0x5c00000        94M              _sys_
    0x4c000000       64M              _sys_
    0x50000000       15104M           primary
    0x400000000      128G             primary
    0x7400000000     16128M           primary
    0x77f0000000     64M              _sys_
    0x77f4000000     192M             _sys_

    DEVICE           PSEUDONYM        BOUND
    pci@300          pci_0            primary
    pci@301          pci_1            primary
    pci@303          pci_3            primary
    pci@304          pci_4            primary

Process, Thread Locality

  • -H of prstat command shows the home lgroup of active user processes and threads.

  • -h of ps command can be used to examine the home lgroup of all user processes and threads. -H option can be used to list all processes that are in a certain locality group.
            [Related] Solaris assigns a thread to an lgroup when the thread is created. That lgroup is called the thread's home lgroup. Solaris runs the thread on the CPUs in the thread's home lgroup and allocates memory from that lgroup whenever possible.

  • plgrp tool shows the placement of threads among locality groups. Same tool can be used to set the home locality group and lgroup affinities for one or more processes, threads, or LWPs.

  • -L option of pmap command shows the lgroup that contains the physical memory backing some virtual memory.
            [Related] Breakdown of Oracle SGA into Solaris Locality Groups

  • Memory placement among lgroups can possibly be achieved using pmadvise when the application is running or by using madvise() system call during development, which provides advice to the kernel's virtual memory manager. The OS will use this hint to determine how to allocate memory for the specified range. This mechanism is beneficial when the administrators and developers understand the target application's data access patterns.

    It is not possible to specify memory placement locality for OSM & ISM segments using pmadvise command or madvise() system call (DISM is an exception).


# prstat -H

  1865 root      420M  414M sleep    59    0 447:51:13 0.1%    2 java/108
  3659 oracle   1428M 1413M sleep    38    0  68:39:28 0.0%    4 oracle/1
  1814 oracle    155M  110M sleep    59    0  70:45:17 0.0%    4 gipcd.bin/9
     8 root        0K    0K sleep    60    -  70:52:21 0.0%    0 vmtasks/257
  3765 root      447M  413M sleep    59    0  29:24:20 0.0%    3 crsd.bin/43
  3949 oracle    505M  456M sleep    59    0   0:59:42 0.0%    2 java/124
 10825 oracle   1097M 1074M sleep    59    0  18:13:27 0.0%    3 oracle/1
  3941 root      210M  184M sleep    59    0  20:03:37 0.0%    4
  3743 root      119M   98M sleep   110    -  24:53:29 0.0%    1 osysmond.bin/13
  3324 oracle    266M  225M sleep   110    -  19:52:31 0.0%    4 ocssd.bin/34
  1585 oracle    122M   91M sleep    59    0  18:06:34 0.0%    3 evmd.bin/10
  3918 oracle    168M  144M sleep    58    0  14:35:31 0.0%    1 oraagent.bin/28
  3427 root      112M   80M sleep    59    0  12:34:28 0.0%    4 octssd.bin/12
  3635 oracle   1425M 1406M sleep   101    -  13:55:31 0.0%    4 oracle/1
  1951 root      183M  161M sleep    59    0   9:26:51 0.0%    4
Total: 251 processes, 2414 lwps, load averages: 1.37, 1.46, 1.47

== Locality group 2 is the home lgroup of the java process with pid 1865 == 

# plgrp 1865

    1865/1        2
    1865/2        2
    1865/22       4
    1865/23       4
    1865/41       1
    1865/42       1
    1865/60       3
    1865/61       3

# plgrp 1865 | awk '{print $2}' | grep 2 | wc -l

# plgrp 1865 | awk '{print $2}' | grep 1 | wc -l

# plgrp 1865 | awk '{print $2}' | grep 3 | wc -l

# plgrp 1865 | awk '{print $2}' | grep 4 | wc -l

== Let's reset the home lgroup of the java process id 1865 to 4 ==

# plgrp -H 4 1865
    1865/1        2 => 4
    1865/2        2 => 4
    1865/3        2 => 4
    1865/4        2 => 4
    1865/184      1 => 4
    1865/188      4 => 4

# plgrp 1865 | awk '{print $2}' | egrep "1|2|3" | wc -l

# plgrp 1865 | awk '{print $2}' | grep 4 | wc -l

# prstat -H -p 1865

  1865 root      420M  414M sleep    59    0 447:57:30 0.1%    4 java/108

== List the home lgroup of all processes ==

# ps -aeH
    0    0 ?           0:11 sched
    5    0 ?           4:47 zpool-rp
    1    4 ?          21:04 init
    8    0 ?        4253:54 vmtasks
   75    4 ?           0:13 ipmgmtd
   11    3 ?           3:09
   13    4 ?           2:45 svc.conf
 3322    1 ?         301:51 cssdagen
11155    3 ?           0:52 oracle
13091    4 ?           0:00 sshd
13124    3 pts/5       0:00 bash
24703    4 pts/8       0:00 bash
12812    2 pts/3       0:00 bash

== Find out the lgroups which shared memory segments are allocated from ==

# pmap -Ls 24513 | egrep "Lgrp|256M|2G"

         Address       Bytes Pgsz Mode   Lgrp Mapped File
0000000400000000   33554432K   2G rwxs-    1   [ osm shmid=0x78000047 ]
0000000C00000000     262144K 256M rwxs-    3   [ osm shmid=0x78000048 ]
0000000C10000000     524288K 256M rwxs-    2   [ osm shmid=0x78000048 ]
0000000C30000000     262144K 256M rwxs-    3   [ osm shmid=0x78000048 ]
0000000C40000000     524288K 256M rwxs-    1   [ osm shmid=0x78000048 ]
0000000C60000000     262144K 256M rwxs-    2   [ osm shmid=0x78000048 ]

== Apply MADV_ACCESS_LWP policy advice to a segment at a specific address ==

# pmap -Ls 1865 | grep anon

00000007DAC00000      20480K   4M rw---    4   [ anon ]
00000007DC000000       4096K    - rw---    -   [ anon ]
00000007DFC00000      90112K   4M rw---    4   [ anon ]
00000007F5400000     110592K   4M rw---    4   [ anon ]

# pmadvise -o 7F5400000=access_lwp 1865

# pmap -Ls 1865 | grep anon
00000007DAC00000      20480K   4M rw---    4   [ anon ]
00000007DC000000       4096K    - rw---    -   [ anon ]
00000007DFC00000      90112K   4M rw---    4   [ anon ]
00000007F5400000      73728K   4M rw---    4   [ anon ]
00000007F9C00000      28672K    - rw---    -   [ anon ]
00000007FB800000       8192K   4M rw---    4   [ anon ]


  • - Man pages of lgrpinfo(1), plgrp(1), pmap(1), prstat(1M), ps(1), pmadvise(1), madvise(3C),, mdb(1)
  • - Web search keywords: NUMA, cc-NUMA, locality group, lgroup, lgrp, Memory Placement Optimization, MPO

Credit: various internal and external sources

Tuesday Dec 23, 2014

Solaris Studio : C/C++ Dynamic Analysis

First, a reminder - Oracle Solaris Studio 12.4 is now generally available. Check the Solaris Studio 12.4 Data Sheet before downloading the software from Oracle Technology Network.

Dynamic Memory Usage Analysis

Code Analyzer tool in Oracle Solaris Studio compiler suite can analyze static data, dynamic memory access data, and code coverage data collected from binaries that were compiled with the C/C++ compilers in Solaris Studio 12.3 or later. Code Analyzer is supported on Solaris and Oracle Enterprise Linux.

Refer to the static code analysis blog entry for a quick summary of steps involved in performing static analysis. The focus of this blog entry is the dynamic portion of the analysis. In this context, dynamic analysis is the evaluation of an application during runtime for memory related errors. Main objective is to find and debug memory management errors -- robustness and security assurance are nice side effects however limited their extent is.

Code Analyzer relies on another primary Solaris Studio tool, discover, to find runtime errors that are often caused by memory mismanagement. discover looks for potential errors such as accessing outside the bounds of the stack or an array, unallocated memory reads and writes, NULL pointer deferences, memory leaks and double frees. Full list of memory management issues analyzed by Code Analyzer/discover is at: Dynamic Memory Access Issues

discover performs the dynamic analysis by instrumenting the code so that it can keep track of memory operations while the binary is running. During runtime, discover monitors the application's use of memory by interposing on standard memory allocation calls such as malloc(), calloc(), memalign(), valloc() and free(). Fatal memory access errors are detected and reported immediately at the instant the incident occurs, so it is easy to correlate the failure with actual source. This behavior helps in detecting and fixing memory management problems in large applications with ease somewhat. However the effectiveness of this kind of analysis highly depends on the flow of control and data during the execution of target code - hence it is important to test the application with variety of test inputs that may maximize code coverage.

High-level steps in using Code Analyzer for Dynamic Analysis

Given the enhancements and incremental improvements in analytical tools, Solaris Studio 12.4 is recommended for this exercise.

  1. Build the application with debug flags

    –g (C) or -g0 (C++) options generate debug information. It enables Code Analyzer to display source code and line number information for errors and warnings.

    • Linux users: specify –xannotate option on compile/link line in addition to -g and other options
  2. Instrument the binary with discover

    % discover -a -H <filename>.%p.html -o <instrumented_binary> <original_binary>


    • -a : write the error data to binary-name.analyze/dynamic directory for use by Code Analyzer
    • -H : write the analysis report to <filename>.<pid>.html when the instrumented binary was executed. %p expands to the process id of the application. If you prefer the analysis report in a plain text file, use -w <filename>.%p.txt instead
    • -o : write the instrumented binary to <instrumented_binary>

    Check Command-Line Options page for the full list of discover supported options.

  3. Run the instrumented binary

    .. to collect the dynamic memory access data.

    % ./<instrumented_binary> <args>

  4. Finally examine the analysis report for errors and warnings


The following example demonstrates the above steps using Solaris Studio 12.4 C compiler and discover command-line tool. Same code was used to demonstrate static analysis steps as well.

Few things to be aware of:

  • If the target application preloads one or more functions using LD_PRELOAD environment variable that discover tool need to interpose on for dynamic analysis, the resulting analysis may not be accurate.
  • If the target application uses runtime auditing using LD_AUDIT environment variable, this auditing will conflict with discover tool's use of auditing and may result in undefined behavior.

Reference & Recommended Reading:

  1. Oracle Solaris Studio 12.4 : Code Analyzer User's Guide
  2. Oracle Solaris Studio 12.4 : Discover and Uncover User's Guide

Friday Nov 28, 2014

Solaris Studio 12.4 : C/C++ Static Code Analysis

First things first -- Oracle Solaris Studio 12.4 is now generally available. One of the key features of this release is the support for the latest industry standards including C++11, C11 and OpenMP 4.0. Check the Solaris Studio 12.4 Data Sheet before downloading the software from Oracle Technology Network.

Static Code Analysis

Code Analyzer tool in Oracle Solaris Studio compiler suite can analyze static data, dynamic memory access data, and code coverage data collected from binaries that were compiled with the C/C++ compilers in Solaris Studio 12.3 or later. Code Analyzer is supported on Solaris and Oracle Enterprise Linux.

Primary focus of this blog entry is the static code analysis.

Static code analysis is the process of detecting common programming errors in code during compilation. The static code checking component in Code Analyzer looks for potential errors such as accessing outside the bounds of the array, out of scope variable use, NULL pointer deferences, infinite loops, uninitialized variables, memory leaks and double frees. The following webpage in Solaris Studio 12.4: Code Analyzer User's Guide has the complete list of errors with examples.

    Static Code Issues analyzed by Code Analyzer

High-level steps in using Code Analyzer for Static Code analysis

Given the enhancements and incremental improvements in analysis tools, Solaris Studio 12.4 is recommended for this exercise.

  1. Collect static data

    Compile [all source] and link with –xprevise=yes option.

    • when using Solaris Studio 12.3 compilers, compile with -xanalyze=code option.
    • Linux users: specify –xannotate option on compile/link line in addition to -xprevise=yes|-xanalyze=code.

    During compilation, the C/C++ compiler extracts static errors automatically, and writes the error information to the sub-directory in <binary-name>.analyze directory.

  2. Analyze the static data

    Two options available to analyze and display the errors in a report format.


The following example demonstrates the above steps using Solaris Studio 12.4 C compiler and codean command-line tool.

Few things to be aware of:

  • compilers may not be able to detect all of the static errors in target code especially if the errors are complex.
  • some errors depend on data that is available only at runtime -- perform dynamic analysis as well.
  • some errors are ambiguous, and also might not be actual errors -- expect few false-positives.

Reference & Recommended Reading:
    Oracle Solaris Studio 12.4 Code Analyzer User's Guide

Monday Oct 14, 2013

[Script] Breakdown of Oracle SGA into Solaris Locality Groups

Goal: for a given process, find out how the SGA was allocated in different locality groups on a system running Solaris operating system.

Download the shell script, The script accepts any Oracle database process id as input, and prints out the memory allocated in each locality group.

Usage: ./ <pid>


# prstat -p 12820

 12820 oracle     32G   32G sleep    60  -20   0:00:16 0.0% oracle/2

# ./ 12820

Number of Locality Groups (lgrp): 4

lgroup 1 :   8.56 GB
lgroup 2 :   6.56 GB
lgroup 3 :   6.81 GB
lgroup 4 :  10.07 GB

Total allocated memory:  32.00 GB

For those who wants to have a quick look at the source code, here it is.

# cat


# check the argument count
if [ $# -lt 1 ]
        echo "usage: ./ <oracle pid>"
        exit 1

# find the number of locality groups
lgrp_count=$(kstat -l lgrp | tail -1 | awk -F':' '{ print $2 }')
echo "\nNumber of Locality Groups (lgrp): $lgrp_count"
echo "------------------------------------\n"

# save the ism output using pmap
pmap -sL $1 | grep ism | sort -k5 > /tmp/tmp_pmap_$1

# calculate the total amount of memory allocated in each lgroup
for i in `seq 1 $lgrp_count`
        echo -n "lgroup $i : "
        grep "$i   \[" /tmp/tmp_pmap_$1 | awk '{ print $2 }' | sed 's/K//g' | 
               awk '{ sum+=$1} END {printf ("%6.2f GB\n", sum/(1024*1024))}'

echo -n "Total allocated memory: "
awk '{ print $2 }' /tmp/tmp_pmap_$1 | sed 's/K//g' | awk '{ sum+=$1} END 
         {printf ("%6.2f GB\n\n", sum/(1024*1024))}'

rm /tmp/tmp_pmap_$1

Like many things in life, there will always be a better or simpler way to achieve this. If you find one, do not fret over this approach. Please share, if possible.

Tuesday Jul 30, 2013

Oracle Tips : Solaris lgroups, CT optimization, Data Pump, Recompilation of Objects, ..

1. [Re]compiling all objects in a schema
exec DBMS_UTILITY.compile_schema(schema => 'SCHEMA');

To recompile only the invalid objects in parallel:

exec UTL_RECOMP.recomp_parallel(<NUM_PARALLEL_THREADS>, 'SCHEMA');

A NULL value for SCHEMA recompiles all invalid objects in the database.

2. SGA breakdown in Solaris Locality Groups (lgroup)

To find the breakdown, execute pmap -L | grep shm. Then separate the lines that are related to each locality group and sum up the value in 2nd column to arrive at a number that shows the total SGA memory allocated in that locality group.

(I'm pretty sure there will be a much easier way that I am not currently aware of.)

3. Default values for shared pool, java pool, large pool, ..

If the *pool parameters were not set explicitly, executing the following query is one way to find out what are they currently set to.

SQL> select * from v$sgainfo;

NAME                                  BYTES RES
-------------------------------- ---------- ---
Fixed SGA Size                      2171296 No
Redo Buffers                      373620736 No
Buffer Cache Size                8.2410E+10 Yes
Shared Pool Size                 1.7180E+10 Yes
Large Pool Size                   536870912 Yes
Java Pool Size                   1879048192 Yes
Streams Pool Size                 268435456 Yes
Shared IO Pool Size                       0 Yes
Granule Size                      268435456 No
Maximum SGA Size                 1.0265E+11 No
Startup overhead in Shared Pool  2717729536 No
Free SGA Memory Available                 0
12 rows selected.

4. Fix to PLS-00201: identifier 'GV$SESSION' must be declared error

Grant select privilege on gv_$SESSION to the owner of the database object that failed to compile.

SQL> alter package OWF_MGR.FND_SVC_COMPONENT compile body;
Warning: Package Body altered with compilation errors.

SQL> show errors

-------- -----------------------------------------------------------------
390/22   PL/SQL: Item ignored
390/22   PLS-00201: identifier 'GV$SESSION' must be declared

SQL> grant select on gv_$SESSION to OWF_MGR;
Grant succeeded.

SQL> alter package OWF_MGR.FND_SVC_COMPONENT compile body;
Package body altered.

5. Solaris Critical Thread (CT) optimization for Oracle logwriter (lgrw)

Critical Thread is a new scheduler optimization available in Oracle Solaris releases Solaris 10 Update 10 and later versions. Latency sensitive single threaded components of software such as Oracle database's logwriter benefit from CT optimization.

On a high level, LWPs marked as critical will be granted more exclusive access to the hardware. For example, on SPARC T4 and T5 systems, such a thread will be assigned exclusive access to a core as much as possible. CT optimization won't delay scheduling of any runnable thread in the system.

Critical Thread optimization is enabled by default. However the users of the system have to hint the OS by marking a thread or two "critical" explicitly as shown below.

priocntl -s -c FX -m 60 -p 60 -i pid <pid_of_critical_single_threaded_process>

From database point of view, logwriter (lgwr) is one such process that can benefit from CT optimization on Solaris platform. Oracle DBA's can either make the lgwr process 'critical' once the database is up and running, or can simply patch the database software by installing RDBMS patch 12951619 to let the database take care of it automatically. I believe Oracle 12c does it by default. Future releases of 11g software may make lgwr critical out of the box.

Those who install the database patch 12951619 need to carefully follow the post installation steps documented in the patch README to avoid running into unwanted surprises.

6. ORA-14519 error while importing a table from a Data Pump export dump
ORA-14519: Conflicting tablespace blocksizes for table : Tablespace XXX block \
size 32768 [partition specification] conflicts with previously specified/implied \
tablespace YYY block size 8192
 [object-level default]
Failing sql is:

All partitions in table XYZ are using 32K blocks whereas the implicit default partition is pointing to a 8K block tablespace. Workaround is to use the REMAP_TABLESPACE option in Data Pump impdp command line to remap the implicit default tablespace of the partitioned table to the tablespace where the rest of partitions are residing.

7. Index building task in Data Pump import process

When Data Pump import process is running, by default, index building is performed with just one thread, which becomes a bottleneck and causes the data import process to take a long time especially if many large tables with millions of rows are being imported into the target database. One way to speed up the import process execution is by skipping index building as part of data import task with the help of EXCLUDE=INDEX impdp command line option. Extract the index definitions for all the skipped indexes from the Data Pump dump file as shown below.

impdp <userid>/<password> directory=<directory> dumpfile=<dump_file>.dmp \
    sqlfile=<index_def_file>.sql INCLUDE=INDEX

Edit <index_def_file>.sql to set the desired number of parallel threads to build each index. And finally execute the <index_def_file>.sql to build the indexes once the data import task is complete.

Friday Aug 03, 2012

Enabling 2 GB Large Pages on Solaris 10

Few facts:

  • - 8 KB is the default page size on Solaris 10 and 11 as of this writing
  • - both hardware and software must have support for 2 GB large pages
  • - SPARC T4 hardware is capable of supporting 2 GB pages
  • - Solaris 11 kernel has in-built support for 2 GB pages
  • - Solaris 10 has no default support for 2 GB pages
  • - Memory intensive 64-bit applications may benefit the most from using 2 GB pages


OS: Solaris 10 8/11 (Update 10) or later
Hardware: SPARC T4. eg., SPARC T4-1, T4-2 or T4-4

Steps to enable 2 GB large pages on Solaris 10:

  1. Install the latest kernel patch or ensure that 147440-04 or later was installed

  2. Add the following line to /etc/system and reboot
    • set max_uheap_lpsize=0x80000000

  3. Finally check the output of the following command when the system is back online
    • pagesize -a

    % pagesize -a
    8192		<-- 8K
    65536		<-- 64K
    4194304		<-- 4M
    268435456	<-- 256M
    2147483648	<-- 2G
    % uname -a
    SunOS jar-jar 5.10 Generic_147440-21 sun4v sparc sun4v

Also See:

Saturday Aug 27, 2011

Oracle 11g: Travel back in time with the Database Flashback

Error recovery, historical reporting, trend analysis, data forensics and fraud detection are just some of the business problems that can be solved by using the Flashback Data Archive feature in Oracle 11g database. The Flashback option can be enabled for the entire database or for a selected set of tables. It can be enabled in the database with no application changes.

At work I usually run performance tests by starting with a clean copy of the database. I analyze the test results at the end of the test, determine the next course of action (tuning), restore the clean copy of the database from a backup, apply the tuning and re-run the performance test. It goes on in a cycle until I'm happy with the overall test result. In some cases especially with large data sets, restoring the database from a backup becomes one of the time consuming tasks. In such situations, using the database flashback to go back to a previously saved restore point saves quite a bit of time. Rest of this blog post demonstrates how to enable database flashback and to go back to a specified restore point. Check Oracle Total Recall with Oracle Database 11g Release 2 white paper for more information on Flashback Data Archive (FDA).


Revert the entire database to a previously saved state at will

Steps to perform

  • Configure the following initialization parameters: db_recovery_file_dest & db_recovery_file_dest_size
  • Enable Archive Log mode
  • Enable database Flashback option
  • Create a restore point. Decide whether to create a normal or a guaranteed restore point
  • Finally flashback database to the created restore point when required

Be aware that there will be some performance and storage overhead in using the database flashback. Evaluate all your options carefully before configuring database flashback.


The following example uses guaranteed restore point to flashback the database in a two-node RAC environment. Most of the example is self-explanatory.

% srvctl status database -d DEMO
Instance DEMO1 is running on node racnode01
Instance DEMO2 is running on node racnode02

/* stop all the database instances except one (anyone) in RAC config */

% srvctl stop instance -d DEMO -i DEMO2


/* put one of the instances in non-cluster mode */

% sqlplus / as sysdba
SQL> alter system set cluster_database=false scope=spfile;

% srvctl stop instance -d DEMO -i DEMO1

% sqlplus / as sysdba
SQL> startup mount

/* enable archive log mode */

SQL> alter database archivelog;

SQL> archive log list
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     2
Next log sequence to archive   4
Current log sequence	       4

SQL> show parameter db_recovery_file

------------------------------------ ----------- ------------------------------
db_recovery_file_dest		     string	 +FRA
db_recovery_file_dest_size	     big integer 512G

/* enable flashback option */

SQL> alter database flashback on;

SQL> select flashback_on from v$database;


/* put the instance back in cluster mode and restart the database */

SQL> alter system set cluster_database=true scope=spfile;

SQL> alter database open;

% srvctl stop instance -d DEMO -i DEMO1

% srvctl start database -d DEMO

/* create a guaranteed restore point */

% sqlplus / as sysdba
SQL> create restore point demo_clean_before_test guarantee flashback database;

Restore point created.

SQL> column NAME format A25
SQL> column TIME format A40
SQL> set lines 120

------------------------- ---------- -------------------------------- --- ------------
DEMO_CLEAN_BEFORE_TEST     17460960 21-AUG-11 AM	      YES     67125248

/* flashback database to the saved restore point */

% srvctl stop database -d DEMO


% rman TARGET /


Starting flashback at 21-AUG-11
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:25

Finished flashback at 21-AUG-11


database opened


% srvctl start database -d DEMO

/* ============================================================================== */

/* alternatively run the following RMAN script as shown below */

% cat restore.rman


% rman TARGET / cmdfile=restore.rman

It is not mandatory to enable logging for flashback database in order to create and use restore points. The requirement in such a case is to put the database in ARCHIVELOG mode and creating the first guaranteed restore point when the database is in mounted state.

Friday Jul 01, 2011

Be Eco-Responsible. Use Resource Management features in Solaris, Oracle Database and Go Green

This blog post is about a white paper that introduces the resource management features in Oracle Solaris and Oracle Database.

Who needs more hardware when under-utilized systems are sitting idle in datacenters consuming valuable space, energy and polluting environment with carbon emissions. These days virtualization and con solidation are more popular than ever before. And then there is Oracle Corporation with great products consisting rich features for resource manageability in consolidated, virtualized and isolated envir onments. Put together all these pieces what do we get? - a complete green solution that helps make this planet a better place.

In an attempt to increase the awareness of the resource management features available in Oracle Solaris and Oracle Database, we just published a four-part white paper surrounding Oracle Solaris Resource Manager and Oracle Database Resource Manager. Those resource managers are not really a product, process or an add-on to Solaris and Oracle Database. "Resource Manager" is the abstract term for the set of software modules that are integrated into the Solaris operating system and Oracle database management system to facilitate resource management from the OS and RDBMS perspective.

The first part of the series introduces: the concept of resource management, Solaris Resource Manager and Oracle Database Resource Manager; compares and contrasts both of those resource managers andends the discussion with a brief overview of resource management in high-availability (HA) environments. Access this paper from the following URL:

        Part 1: Introduction to Resource Management in Oracle Solaris and Oracle Database

The second part is all about the available resource management features in Oracle Solaris. This includes resource management in virtualized environments too. The range of topics vary from the simple process binding to the dynamic reconfiguration of CPU and memory resources in a logical domain (LDom). The value of this paper lies in the volume of examples that follow each of the introduced feature. Access the second part from the following URL:

        Part 2: Effective Resource Management Using Oracle Solaris Resource Manager

Similar to the second part, the third part talks about the available resource management features in Oracle Database Management System. Majority of the discussion revolves around creating resource pl ans in a Oracle database. This part also contains plenty of examples. Access the third part from the following URL:

        Part 3: Effective Resource Management Using Oracle Database Resource Manager

The final part briefly introduces various hardware/software products that makes Oracle Corporation the ideal consolidation platform. Also an imaginary case study was presented to demonstrate how to consolidate multiple applications and databases onto a single server using Oracle virtualization technologies and the resource management features found in Oracle Solaris and Oracle Database. Description of the consolidation plan and the implementation of the plan takes up major portion of this final part. Access the fourth and final part of this series from the following URL:

        Part 4: Resource Management Case Study for Mixed Workloads and Server Sharing

Happy reading.


Benchmark announcements, HOW-TOs, Tips and Troubleshooting


« June 2016