Tuesday Sep 29, 2015

Solaris: Identifying EFI disks

The EFI label supports physical disks and logical volumes larger than 2 TB, whereas SMI label support is limited to 2 TB.

Listed below are some characteristics and patterns that can help identify an EFI labeled disk and tell it apart from an SMI labeled disk.

  • Device cxtxd0 [without any slice suffix] represents the entire disk.

  • No cylinder information is stored in the EFI label.

  • No overlapping slices / partitions.

    • e.g.,

      EFI label disk:

      Notice that there are no overlapping partitions and no references to cylinders in the following prtvtoc output.

      % prtvtoc /dev/rdsk/c0t5000CCA04E0DEDD8d0
      * /dev/rdsk/c0t5000CCA04E0DEDD8d0 partition map
      * Dimensions:
      *     512 bytes/sector
      * 390721968 sectors
      * 390721901 accessible sectors
      * Flags:
      *   1: unmountable
      *  10: read-only
      * Unallocated space:
      *       First     Sector    Last
      *       Sector     Count    Sector
      *          34         6        39
      *   390070312    635239 390705550
      *                          First     Sector    Last
      * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
             0      4    00         40   2097152   2097191
             1      4    00    2097192 384827392 386924583
             4      4    00  386924584   3145728 390070311
             8     11    00  390705551     16384 390721934

      SMI label disk:

      Notice the overlapping partitions (slice 2 overlaps slices 0, 1 and 6) and the references to cylinders in the following prtvtoc output.

      # prtvtoc /dev/rdsk/c0t5000A72030082BD5d0s2
      * /dev/rdsk/c0t5000A72030082BD5d0s2 partition map
      * Dimensions:
      *     512 bytes/sector
      *      56 sectors/track
      *     224 tracks/cylinder
      *   12544 sectors/cylinder
      *   11429 cylinders
      *   11427 accessible cylinders
      * Flags:
      *   1: unmountable
      *  10: read-only
      *                          First     Sector    Last
      * Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
             0      2    00          0    263424    263423
             1      3    01     263424    263424    526847
             2      5    01          0 143340288 143340287
             6      4    00     526848 142813440 143340287

  • The existence of /dev/[r]dsk/cxtxd0 implies an EFI label. With an SMI label, /dev/[r]dsk/cxtxd0 does not exist.

    • e.g.,

      EFI label disk:

      % ls /dev/rdsk/c0t5000CCA04E0DEDD8d0

      SMI label disk:

      # ls /dev/rdsk/c0t5000A72030082BD5d0
      /dev/rdsk/c0t5000A72030082BD5d0: No such file or directory

  • The presence of "wd" (whole disk?) in the physical device path may imply an EFI label.

    • e.g.,

      EFI label disk:

      % stat -c "%N" /dev/rdsk/c0t5000CCA04E0DEDD8d0
      '/dev/rdsk/c0t5000CCA04E0DEDD8d0' -> '../../devices/scsi_vhci/disk@g5000cca04e0dedd8:wd,raw'

      SMI label disk:

      # stat -c "%N" /dev/rdsk/c0t5000A72030082BD5d0s2
      '/dev/rdsk/c0t5000A72030082BD5d0s2' -> '../../devices/scsi_vhci/disk@g5000a72030082bd5:c,raw'

  • As of this writing, devinfo(1M) does not support EFI labeled disks.

    • e.g.,

      EFI label disk:

      % devinfo -i /dev/rdsk/c0t5000CCA04E0DEDD8d0
      devinfo: /dev/rdsk/c0t5000CCA04E0DEDD8d0: This operation is not supported on EFI labeled devices

      SMI label disk:

      # devinfo -i /dev/rdsk/c0t5000A72030082BD5d0s2
      /dev/rdsk/c0t5000A72030082BD5d0s2       0       0       12544   512     4
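The checks above can be scripted. Below is a minimal Python sketch, assuming you have already captured prtvtoc output (or the target of a /dev/rdsk link) as text. The function names are hypothetical, and the result is only as reliable as the heuristics listed above: geometry or overlapping slices suggest SMI, and a ":wd" minor node suggests EFI.

```python
import re


def label_from_prtvtoc(text):
    """Guess whether prtvtoc(1M) output describes an EFI or an SMI label.

    Heuristics (from the list above):
      * an SMI label reports disk geometry ("... cylinders") in the
        Dimensions section; an EFI label does not;
      * SMI slices normally overlap (slice 2 spans the whole disk);
        EFI partitions do not.
    """
    if re.search(r"\bcylinders\b", text):
        return "SMI"
    spans = []
    for line in text.splitlines():
        if line.lstrip().startswith("*"):
            continue  # header and comment lines
        fields = line.split()
        # Partition rows: part, tag, flags, first sector, count, last sector
        if len(fields) >= 6 and all(f.isdigit() for f in fields[:6]):
            spans.append((int(fields[3]), int(fields[5])))
    spans.sort()
    # After sorting by first sector, any overlap shows up between neighbours.
    for (_, prev_last), (next_first, _) in zip(spans, spans[1:]):
        if next_first <= prev_last:
            return "SMI"  # overlapping slices
    return "EFI"


def label_from_device_link(target):
    """Guess the label from the /devices path a whole-disk link points to.

    A ':wd' (or ':wd,raw') minor node suggests EFI. Anything else is treated
    as inconclusive, assuming that slice links (e.g. ...s0) point to
    letter-named minor nodes regardless of the label type.
    """
    minor = target.rsplit(":", 1)[-1].split(",")[0]
    return "EFI" if minor == "wd" else "undetermined"
```

Passing the EFI sample above to `label_from_prtvtoc` should yield "EFI" (no geometry, no overlaps), while the SMI sample is caught by its "cylinders" line alone.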

Credit: various internal sources

Tuesday Mar 31, 2015

Locality Group Observability on Solaris

Modern multi-socket servers exhibit NUMA characteristics that may hurt application performance if ignored. On a NUMA (Non-Uniform Memory Access) system, all memory is shared among processors. Each processor has access to its own memory -- local memory -- as well as to memory that is local to another processor -- remote memory. However, the memory access time (latency) depends on the location of the memory relative to the processor. A processor can access its local memory faster than remote memory, and these varying memory latencies play a big role in application performance.

Solaris organizes the hardware resources -- CPU, memory and I/O devices -- into one or more logical groups based on their proximity to each other, in such a way that all the hardware resources in a group are considered local to that group. These groups are referred to as locality groups or NUMA nodes. In other words, a locality group (lgroup) is an abstraction that tells which hardware resources are near each other on a NUMA system. Each locality group has at least one processor and possibly some associated memory and/or I/O devices. To minimize the impact of NUMA characteristics, Solaris considers the lgroup-based physical topology when mapping threads and data to CPUs and memory.

Note that even though Solaris attempts to provide good performance out of the box, some applications may still suffer from NUMA effects, either due to misconfiguration of the hardware/software or for some other reason. Engineered systems such as Oracle SuperCluster go to great lengths in setting up customer environments to minimize the impact of NUMA, so that applications perform as expected in a predictable manner. Still, application developers and system/application administrators need to take NUMA into account while developing for and managing applications on large systems. Solaris-provided tools and APIs can be used to observe, diagnose, control and even correct issues related to locality and latency. The rest of this post is about the tools that can be used to examine the locality of cores, memory and I/O devices.

Sample outputs were collected from a SPARC T4-4 server.

Locality Group Hierarchy

lgrpinfo prints information about the lgroup hierarchy and its contents. It is useful for understanding the context in which the OS tries to optimize applications for locality, and for figuring out which CPUs are closer together, how much memory is near them, and the relative latencies between the CPUs and different memory blocks.


# lgrpinfo -a

lgroup 0 (root):
        Children: 1-4
        CPUs: 0-255
        Memory: installed 1024G, allocated 75G, free 948G
        Lgroup resources: 1-4 (CPU); 1-4 (memory)
        Latency: 18
lgroup 1 (leaf):
        Children: none, Parent: 0
        CPUs: 0-63
        Memory: installed 256G, allocated 18G, free 238G
        Lgroup resources: 1 (CPU); 1 (memory)
        Load: 0.0227
        Latency: 12
lgroup 2 (leaf):
        Children: none, Parent: 0
        CPUs: 64-127
        Memory: installed 256G, allocated 15G, free 241G
        Lgroup resources: 2 (CPU); 2 (memory)
        Load: 0.000153
        Latency: 12
lgroup 3 (leaf):
        Children: none, Parent: 0
        CPUs: 128-191
        Memory: installed 256G, allocated 20G, free 236G
        Lgroup resources: 3 (CPU); 3 (memory)
        Load: 0.016
        Latency: 12
lgroup 4 (leaf):
        Children: none, Parent: 0
        CPUs: 192-255
        Memory: installed 256G, allocated 23G, free 233G
        Lgroup resources: 4 (CPU); 4 (memory)
        Load: 0.00824
        Latency: 12

Lgroup latencies:

  |  0  1  2  3  4
0 | 18 18 18 18 18
1 | 18 12 18 18 18
2 | 18 18 12 18 18
3 | 18 18 18 12 18
4 | 18 18 18 18 12
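To turn the matrix into a single number, the sketch below parses the "Lgroup latencies" table and divides the worst remote latency by the best local latency among the leaf lgroups. It assumes the exact layout shown above, and it treats the latencies as relative values (an assumption: the units are platform-dependent), so only the ratio is meaningful.

```python
def parse_latency_table(text):
    """Parse the 'Lgroup latencies' matrix printed by `lgrpinfo -a`.

    Returns {(from_lgroup, to_lgroup): latency}.
    """
    rows = [line for line in text.splitlines() if "|" in line]
    columns = [int(x) for x in rows[0].split("|")[1].split()]
    table = {}
    for row in rows[1:]:
        left, right = row.split("|")
        src = int(left)
        for dst, value in zip(columns, right.split()):
            table[(src, dst)] = int(value)
    return table


def remote_to_local_ratio(table, leaf_lgroups):
    """Worst remote latency divided by best local latency among leaf lgroups."""
    local = min(table[(g, g)] for g in leaf_lgroups)
    remote = max(table[(a, b)] for a in leaf_lgroups for b in leaf_lgroups if a != b)
    return remote / local
```

For the T4-4 sample, the leaf lgroups are 1-4 and the ratio is 18 / 12 = 1.5; by this measure, remote memory is about 50% "further away" than local memory.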

CPU Locality

The lgrpinfo utility shown above already presents CPU locality clearly. Here is another way to retrieve the association between CPU IDs and lgroups.

# echo ::lgrp -p | mdb -k

        1         0     17873        64      0-63
        2         0     17755        64      64-127
        3         0      2256        64      128-191
        4         0     18173        64      192-255
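For a CPU-to-lgroup lookup in a script, the ::lgrp -p output can be folded into a dictionary. This sketch assumes, based on the sample above, that the first column is the lgroup ID and the last column is the CPU list in "a-b" or comma-separated form. The helper names are hypothetical.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as '0-63' or '0-3,8' into individual CPU ids."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def cpu_to_lgroup(lgrp_p_text):
    """Build {cpu_id: lgroup_id} from `echo ::lgrp -p | mdb -k` output.

    Assumes the first column is the lgroup ID and the last column is the
    CPU list, as in the sample above; blank and header lines are ignored.
    """
    mapping = {}
    for line in lgrp_p_text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue
        for cpu in expand_cpu_list(fields[-1]):
            mapping[cpu] = int(fields[0])
    return mapping
```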

Memory Locality

The lgrpinfo utility shown above reports the total memory that belongs to each locality group. However, it doesn't show exactly which memory blocks belong to which locality group. One of mdb's debugger commands (dcmds) helps retrieve this information.

1. List memory blocks

# ldm list-devices -a memory

     PA                   SIZE            BOUND
     0xa00000             32M             _sys_
     0x2a00000            96M             _sys_
     0x8a00000            374M            _sys_
     0x20000000           1048064M        primary

2. Print the physical memory layout of the system

# echo ::syslayout | mdb -k

         STARTPA            ENDPA  SIZE  MG MN    STL    ETL
        20000000        200000000  7.5g   0  0      4     40
       200000000        400000000    8g   1  1    800    840
       400000000        600000000    8g   2  2   1000   1040
       600000000        800000000    8g   3  3   1800   1840
       800000000        a00000000    8g   0  0     40     80
       a00000000        c00000000    8g   1  1    840    880
       c00000000        e00000000    8g   2  2   1040   1080
       e00000000       1000000000    8g   3  3   1840   1880
      1000000000       1200000000    8g   0  0     80     c0
      1200000000       1400000000    8g   1  1    880    8c0
      1400000000       1600000000    8g   2  2   1080   10c0
      1600000000       1800000000    8g   3  3   1880   18c0

The values under the MN (memory node) column can be treated as lgroup numbers after adding 1. For example, a value of 0 under MN translates to lgroup 1, a value of 1 translates to lgroup 2, and so on. Better yet, the ::mnode dcmd lists the mapping of mnodes to lgroups, as shown below.

# echo ::mnode | mdb -k

     2075ad80000  0    1      -   249g   237g   114m   5.7g   714m      -
     2075ad802c0  1    2      -   240g   236g   288m    15g   4.8g      -
     2075ad80580  2    3      -   246g   234g   619m   9.6g   951m      -
     2075ad80840  3    4      -   247g   231g    24m     9g   897m      -
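Because MN + 1 gives the lgroup on this system, the ::syslayout rows can be summed per lgroup. This sketch uses the STARTPA/ENDPA columns rather than the rounded SIZE column. It assumes the column order shown above and should be fed the complete ::syslayout output; the listing above covers only the first 96 GB of the 1 TB installed.

```python
from collections import defaultdict


def memory_per_lgroup(syslayout_text):
    """Sum physical memory (bytes) per lgroup from ::syslayout output.

    Assumes columns STARTPA ENDPA SIZE MG MN STL ETL with hex addresses,
    and maps memory node MN to lgroup MN + 1 as described above.
    """
    totals = defaultdict(int)
    for line in syslayout_text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[0] == "STARTPA":
            continue  # skip header and anything that is not a layout row
        start, end = int(fields[0], 16), int(fields[1], 16)
        mnode = int(fields[4])
        totals[mnode + 1] += end - start
    return dict(totals)
```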

Unrelated notes:

  • Main memory on the T4-4 is interleaved across all memory banks with an 8 GB interleave size. That is, the first 8 GB chunk (excluding _sys_ blocks) is placed in lgroup 1, closest to processor #1; the second 8 GB chunk in lgroup 2, closest to processor #2; the third in lgroup 3, closest to processor #3; the fourth in lgroup 4, closest to processor #4; then the fifth 8 GB chunk goes to lgroup 1 again, and so on. Memory is not interleaved on T5 and M6 systems (confirm by running the ::syslayout dcmd). Conceptually, memory interleaving is similar to disk striping.

  • Keep in mind that debugger commands (dcmds) are not committed interfaces, so there is no guarantee that they will continue to work on future versions of Solaris. Some of these dcmds may not work on some existing versions of Solaris either.
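The round-robin placement described in the first note can be expressed as a one-line formula. This is a simplified model, assuming the 8 GB interleave and the four leaf lgroups of the T4-4 sample and ignoring _sys_ blocks. It reproduces the MN column of the ::syslayout rows shown earlier, but it is not a Solaris interface.

```python
GIB = 1 << 30
INTERLEAVE = 8 * GIB   # interleave size quoted for the T4-4 above
LEAF_LGROUPS = 4       # lgroups 1-4 on the sample system


def lgroup_for_pa(pa):
    """Return the leaf lgroup expected to own physical address `pa`.

    Simplified round-robin model: 8 GB chunk k (counting from PA 0) lands in
    lgroup (k mod 4) + 1. Reserved _sys_ ranges are not modelled.
    """
    return (pa // INTERLEAVE) % LEAF_LGROUPS + 1
```

For example, PA 0x800000000 (32 GB) is chunk 4, which wraps back to lgroup 1, matching the MN 0 row in the ::syslayout listing.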

I/O Device Locality

The -d option of the lgrpinfo utility accepts the path to an I/O device and returns the IDs of the lgroups closest to that device. Each I/O device on the system can be connected to one or more NUMA nodes, so it is not uncommon to see more than one lgroup ID returned by lgrpinfo.


# lgrpinfo -d /dev/dsk/c1t0d0
lgroup ID : 1

# dladm show-phys | grep 10000
net4              Ethernet             up         10000  full      ixgbe0

# lgrpinfo -d /dev/ixgbe0
lgroup ID : 1

# dladm show-phys | grep ibp0
net12             Infiniband           up         32000  unknown   ibp0

# lgrpinfo -d /dev/ibp0
lgroup IDs : 1-4
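When scripting placement decisions, it helps to normalize the two output forms above into a list. This is a minimal sketch, assuming the "lgroup ID(s) : <list>" format shown. Accepting commas or spaces between items is an assumption, since only single IDs and ranges appear above. A device that reports several IDs, like ibp0, is attached to more than one NUMA node.

```python
def parse_lgrpinfo_d(output_line):
    """Turn 'lgroup ID : 1' or 'lgroup IDs : 1-4' into a list of lgroup IDs."""
    _, sep, spec = output_line.partition(":")
    if not sep:
        raise ValueError("unexpected lgrpinfo -d output: %r" % output_line)
    ids = []
    # Ranges ("1-4") are expanded; commas and spaces are both treated as
    # separators (an assumption about formats not shown above).
    for item in spec.replace(",", " ").split():
        if "-" in item:
            first, last = item.split("-")
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(item))
    return ids
```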

NUMA IO Groups

The ::numaio_group debugger command shows information about all NUMA I/O groups.

# dladm show-phys | grep up
net0              Ethernet             up         1000   full      igb0
net12             Ethernet             up         10     full      usbecm2
net4              Ethernet             up         10000  full      ixgbe0

# echo ::numaio_group | mdb -k
            ADDR GROUP_NAME                     CONSTRAINT
    10050e1eba48 net4                  lgrp : 1
    10050e1ebbb0 net0                  lgrp : 1
    10050e1ebd18 usbecm2               lgrp : 1
    10050e1ebe80 scsi_hba_ngrp_mpt_sas1  lgrp : 4
    10050e1ebef8 scsi_hba_ngrp_mpt_sas0  lgrp : 1

prtconf provides another way to find the NUMA I/O locality of an I/O device.


# dladm show-phys | grep up | grep ixgbe
net4              Ethernet             up         10000  full      ixgbe0

== Find the device path for the network interface ==
# grep ixgbe /etc/path_to_inst | grep " 0 "
"/pci@400/pci@1/pci@0/pci@4/network@0" 0 "ixgbe"

== Find NUMA IO Lgroups ==
# prtconf -v /devices/pci@400/pci@1/pci@0/pci@4/network@0
    Hardware properties:
        name='numaio-lgrps' type=int items=1
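The grep step above matches text anywhere in the line. A slightly stricter sketch parses the /etc/path_to_inst fields (quoted physical path, instance number, quoted driver name) and matches the driver and instance exactly. The function name is hypothetical, and the returned path is relative to /devices, as in the prtconf command above.

```python
import shlex


def device_path(path_to_inst_text, driver, instance):
    """Return the physical path of `driver` instance `instance`, or None.

    Each non-comment line of /etc/path_to_inst has the form:
        "<physical path>" <instance> "<driver>"
    """
    for line in path_to_inst_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        fields = shlex.split(line)  # strips the double quotes
        if len(fields) == 3 and fields[2] == driver and int(fields[1]) == instance:
            return fields[0]
    return None


# Usage (on a live system):
#   with open("/etc/path_to_inst") as f:
#       print("/devices" + device_path(f.read(), "ixgbe", 0))
```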

Resource Groups

The list-rsrc-group subcommand of the Logical Domains Manager command-line interface (ldm) shows a consolidated list of the processor cores, memory blocks and I/O devices that belong to each resource group. This subcommand is available in ldm 3.2 and later.

In a resource group, resources are grouped based on the underlying physical relationship between cores, memory, and I/O buses. On some hardware platforms and server configurations, such as the SPARC M7-8, a resource group may map directly to a locality group.

# ldm ls-rsrc-group
NAME                                    CORE  MEMORY   IO
/SYS/CMIOU0                             32    480G     4
/SYS/CMIOU3                             32    480G     4

# ldm ls-rsrc-group -l /SYS/CMIOU0
NAME                                    CORE  MEMORY   IO
/SYS/CMIOU0                             32    480G     4

    CID                                             BOUND
    0, 1, 2, 3, 8, 9, 10, 11                        primary
    16, 17, 18, 19, 24, 25                          primary

    PA               SIZE             BOUND
    0x0              60M              _sys_
    0x3c00000        32M              _sys_
    0x5c00000        94M              _sys_
    0x4c000000       64M              _sys_
    0x50000000       15104M           primary
    0x400000000      128G             primary
    0x7400000000     16128M           primary
    0x77f0000000     64M              _sys_
    0x77f4000000     192M             _sys_

    DEVICE           PSEUDONYM        BOUND
    pci@300          pci_0            primary
    pci@301          pci_1            primary
    pci@303          pci_3            primary
    pci@304          pci_4            primary
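The memory section of the listing can be totalled per binding (domain name or _sys_). This is a sketch, assuming the PA / SIZE / BOUND layout above with K, M, G or T size suffixes in binary units. The function name is hypothetical.

```python
import re
from collections import defaultdict

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

# One memory row: hex PA, size with unit suffix, binding name.
ROW = re.compile(r"\s*0x[0-9a-fA-F]+\s+(\d+)([KMGT])\s+(\S+)\s*$")


def memory_by_binding(ldm_text):
    """Sum memory block sizes (bytes) per BOUND value from `ldm ls-rsrc-group -l`."""
    totals = defaultdict(int)
    for line in ldm_text.splitlines():
        match = ROW.match(line)
        if match:
            size, unit, bound = match.groups()
            totals[bound] += int(size) * UNITS[unit]
    return dict(totals)
```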

Process, Thread Locality

  • The -H option of the prstat command shows the home lgroup of active user processes and threads.

  • The -H option of the ps command prints the home lgroup of processes (see ps -aeH below), and the -h option lists only the processes homed to the specified locality groups.
            [Related] Solaris assigns a thread to an lgroup when the thread is created. That lgroup is called the thread's home lgroup. Solaris runs the thread on the CPUs in the thread's home lgroup and allocates memory from that lgroup whenever possible.

  • The plgrp tool shows the placement of threads among locality groups. The same tool can be used to set the home locality group and lgroup affinities for one or more processes, threads, or LWPs.

  • The -L option of the pmap command shows the lgroup that contains the physical memory backing some virtual memory.
            [Related] Breakdown of Oracle SGA into Solaris Locality Groups

  • Memory placement among lgroups can be influenced with pmadvise while the application is running, or with the madvise() call during development. Both provide advice to the kernel's virtual memory manager, which uses this hint to determine how to allocate memory for the specified range. This mechanism is beneficial when administrators and developers understand the target application's data access patterns.

    It is not possible to specify memory placement locality for OSM and ISM segments using the pmadvise command or the madvise() call (DISM is an exception).


# prstat -H

  1865 root      420M  414M sleep    59    0 447:51:13 0.1%    2 java/108
  3659 oracle   1428M 1413M sleep    38    0  68:39:28 0.0%    4 oracle/1
  1814 oracle    155M  110M sleep    59    0  70:45:17 0.0%    4 gipcd.bin/9
     8 root        0K    0K sleep    60    -  70:52:21 0.0%    0 vmtasks/257
  3765 root      447M  413M sleep    59    0  29:24:20 0.0%    3 crsd.bin/43
  3949 oracle    505M  456M sleep    59    0   0:59:42 0.0%    2 java/124
 10825 oracle   1097M 1074M sleep    59    0  18:13:27 0.0%    3 oracle/1
  3941 root      210M  184M sleep    59    0  20:03:37 0.0%    4 orarootagent.bi/14
  3743 root      119M   98M sleep   110    -  24:53:29 0.0%    1 osysmond.bin/13
  3324 oracle    266M  225M sleep   110    -  19:52:31 0.0%    4 ocssd.bin/34
  1585 oracle    122M   91M sleep    59    0  18:06:34 0.0%    3 evmd.bin/10
  3918 oracle    168M  144M sleep    58    0  14:35:31 0.0%    1 oraagent.bin/28
  3427 root      112M   80M sleep    59    0  12:34:28 0.0%    4 octssd.bin/12
  3635 oracle   1425M 1406M sleep   101    -  13:55:31 0.0%    4 oracle/1
  1951 root      183M  161M sleep    59    0   9:26:51 0.0%    4 orarootagent.bi/21
Total: 251 processes, 2414 lwps, load averages: 1.37, 1.46, 1.47

== Locality group 2 is the home lgroup of the java process with pid 1865 == 

# plgrp 1865

    1865/1        2
    1865/2        2
    1865/22       4
    1865/23       4
    1865/41       1
    1865/42       1
    1865/60       3
    1865/61       3

# plgrp 1865 | awk '{print $2}' | grep 2 | wc -l

# plgrp 1865 | awk '{print $2}' | grep 1 | wc -l

# plgrp 1865 | awk '{print $2}' | grep 3 | wc -l

# plgrp 1865 | awk '{print $2}' | grep 4 | wc -l

== Let's reset the home lgroup of the java process id 1865 to 4 ==

# plgrp -H 4 1865
    1865/1        2 => 4
    1865/2        2 => 4
    1865/3        2 => 4
    1865/4        2 => 4
    1865/184      1 => 4
    1865/188      4 => 4

# plgrp 1865 | awk '{print $2}' | egrep "1|2|3" | wc -l

# plgrp 1865 | awk '{print $2}' | grep 4 | wc -l

# prstat -H -p 1865

  1865 root      420M  414M sleep    59    0 447:57:30 0.1%    4 java/108

== List the home lgroup of all processes ==

# ps -aeH
    0    0 ?           0:11 sched
    5    0 ?           4:47 zpool-rp
    1    4 ?          21:04 init
    8    0 ?        4253:54 vmtasks
   75    4 ?           0:13 ipmgmtd
   11    3 ?           3:09 svc.star
   13    4 ?           2:45 svc.conf
 3322    1 ?         301:51 cssdagen
11155    3 ?           0:52 oracle
13091    4 ?           0:00 sshd
13124    3 pts/5       0:00 bash
24703    4 pts/8       0:00 bash
12812    2 pts/3       0:00 bash

== Find out the lgroups which shared memory segments are allocated from ==

# pmap -Ls 24513 | egrep "Lgrp|256M|2G"

         Address       Bytes Pgsz Mode   Lgrp Mapped File
0000000400000000   33554432K   2G rwxs-    1   [ osm shmid=0x78000047 ]
0000000C00000000     262144K 256M rwxs-    3   [ osm shmid=0x78000048 ]
0000000C10000000     524288K 256M rwxs-    2   [ osm shmid=0x78000048 ]
0000000C30000000     262144K 256M rwxs-    3   [ osm shmid=0x78000048 ]
0000000C40000000     524288K 256M rwxs-    1   [ osm shmid=0x78000048 ]
0000000C60000000     262144K 256M rwxs-    2   [ osm shmid=0x78000048 ]

== Apply MADV_ACCESS_LWP policy advice to a segment at a specific address ==

# pmap -Ls 1865 | grep anon

00000007DAC00000      20480K   4M rw---    4   [ anon ]
00000007DC000000       4096K    - rw---    -   [ anon ]
00000007DFC00000      90112K   4M rw---    4   [ anon ]
00000007F5400000     110592K   4M rw---    4   [ anon ]

# pmadvise -o 7F5400000=access_lwp 1865

# pmap -Ls 1865 | grep anon
00000007DAC00000      20480K   4M rw---    4   [ anon ]
00000007DC000000       4096K    - rw---    -   [ anon ]
00000007DFC00000      90112K   4M rw---    4   [ anon ]
00000007F5400000      73728K   4M rw---    4   [ anon ]
00000007F9C00000      28672K    - rw---    -   [ anon ]
00000007FB800000       8192K   4M rw---    4   [ anon ]
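The awk | grep | wc -l pipelines above run once per lgroup, and a plain grep 2 would also match a value such as 12 on a system with more lgroups. The sketch below tallies everything in one pass, assuming the plain plgrp and pmap -Ls column layouts shown above. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def threads_per_home_lgroup(plgrp_text):
    """Count LWPs per home lgroup from plain `plgrp <pid>` output.

    Expected rows look like '1865/22   4' (PID/LWPID, HOME); the header
    and any other lines are skipped.
    """
    counts = Counter()
    for line in plgrp_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and "/" in fields[0] and fields[1].isdigit():
            counts[int(fields[1])] += 1
    return counts


def bytes_per_lgroup(pmap_text):
    """Sum mapping sizes (bytes) per lgroup from `pmap -Ls <pid>` output.

    Columns: Address, Bytes (e.g. '262144K'), Pgsz, Mode, Lgrp, Mapped File.
    Mappings whose Lgrp is '-' (no physical memory behind them yet) are skipped.
    """
    totals = defaultdict(int)
    for line in pmap_text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[1].endswith("K") and fields[4].isdigit():
            totals[int(fields[4])] += int(fields[1][:-1]) * 1024
    return dict(totals)
```

Applied to the pmap -Ls excerpt for the OSM segments above, lgroup 1 ends up holding by far the largest share, mostly through the single 2G-page mapping.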


  • Man pages of lgrpinfo(1), plgrp(1), pmap(1), prstat(1M), ps(1), pmadvise(1), madvise(3C), madv.so.1(1), mdb(1)
  • Web search keywords: NUMA, cc-NUMA, locality group, lgroup, lgrp, Memory Placement Optimization, MPO

Credit: various internal and external sources

Tuesday Feb 25, 2014

AIX customers: Run for the Hills ..

.. or keep your cool and embrace Solaris.

When Oracle acquired Sun, IBM tried to capitalize on the situation, just like every other competitor Sun had: doubts were raised about Oracle's ability to turn Sun's hardware business around, and Solaris customers were advised to flee SPARC. Fast forward four years .. Oracle appears to have successfully dispelled those doubts with a proven long-term commitment to the Solaris/SPARC business, consistent investment, and delivery on established roadmaps. Besides, Oracle has been innovating in the server space with engineered systems that are pre-integrated to reduce the cost and complexity of IT infrastructures while increasing productivity and performance.

On the other hand, judging by the recent turn of events at IBM, such as selling off critical server technologies, decline in the data center business, employee furloughs, layoffs etc., it appears that Big Blue has its own struggles to deal with. In any case, irrespective of what is happening at IBM, AIX customers who are contemplating a migration to a modern operating platform that is reliable, secure, cloud-ready and offers a rich set of features to virtualize, consolidate, diagnose, debug and, most importantly, scale and perform, have an attractive alternative: Oracle Solaris. Act before it is too late.

Unfortunately, migrating larger deployments from one platform to another is not as easy as migrating desktop users from one operating system to another. So, Oracle put together a set of documents to make the AIX to Solaris transition as smooth as possible for existing AIX customers. Access the AIX-to-Solaris migration pages at:

     Modernizing IBM AIX/Power to Oracle Solaris/SPARC (Oracle Technology Network)

The above pages have pointers to white papers such as IBM AIX to Oracle Solaris Technology Mapping Guide (for system admins, power users), Simplify the Migration of Oracle Database and Oracle Applications from AIX to Oracle Solaris (for DBAs, application specific admins) and IBM AIX Technologies Compared to Oracle Solaris 11 along with hands-on labs, training, blogs and other useful resources. Check those out, and use the contact information available in those pages to speak or chat with relevant Oracle team(s) who can help get started with the migration process. Good luck.

Friday May 31, 2013

Oracle Internet Directory 11g Benchmark on SPARC T5


System Under Test (SUT)     Oracle's SPARC T5-2 server
Software     Oracle Internet Directory 11gR1-PS6
Target Load     50 million user entries
Reference URL     OID/T5 benchmark white paper

Oracle Internet Directory (OID) is an LDAP v3 Directory Server with a multi-threaded, multi-process, multi-instance architecture that uses an Oracle database as the directory store.


Five test scenarios were executed in this benchmark, each performing a different type of LDAP operation. The key metrics are throughput -- the number of operations completed per second -- and latency -- the time, in milliseconds, it took to complete an operation.


1. LDAP Search operation : search for and retrieve specific entries from the directory

In this test scenario, each LDAP Search operation matches a single unique entry. Each Search operation looks up an entry in such a way that no client looks up the same entry twice, no two clients look up the same entry, and all entries are looked up randomly.

#clients   Throughput (ops/sec)   Latency (ms)
1,000      944,624                1.05

2. LDAP Add operation : add entries, their object classes, attributes and values to the directory

In this test scenario, 16 concurrent LDAP clients added 500,000 entries of object class InetOrgPerson with 21 attributes to the directory.

#clients   Throughput (ops/sec)   Latency (ms)
16         1,000                  15.95

3. LDAP Compare operation : compare a given attribute value to the attribute value in a directory entry

In this test scenario, the userpassword attribute was compared. That is, each LDAP Compare operation matches the password of a user.

#clients   Throughput (ops/sec)   Latency (ms)
1,000      594,426                1.68

4. LDAP Modify operation : add, delete or replace attributes for entries

In this test scenario, 50 concurrent LDAP clients updated a unique entry each time, and a total of 50 million entries were updated. The attribute being modified was not indexed.

#clients   Throughput (ops/sec)   Latency (ms)
50         16,735                 2.98

5. LDAP Authentication operation : authenticates the credentials of a user

In this test scenario, 1000 concurrent LDAP clients authenticated 50 million users.

#clients   Throughput (ops/sec)   Latency (ms)
1,000      305,307                3.27
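A quick consistency check on these numbers is Little's law, N = X × R. With a closed set of clients and no think time, throughput multiplied by average latency should roughly equal the number of clients. The no-think-time setup is an assumption, since the test harness is not described. The sketch applies the check to the five single-operation scenarios above.

```python
# (clients, throughput in operations/sec, average latency in ms), from the tables above
scenarios = {
    "Search": (1000, 944_624, 1.05),
    "Add": (16, 1_000, 15.95),
    "Compare": (1000, 594_426, 1.68),
    "Modify": (50, 16_735, 2.98),
    "Authentication": (1000, 305_307, 3.27),
}


def implied_concurrency(throughput_per_sec, latency_ms):
    """Little's law: requests in flight = arrival rate x time in system."""
    return throughput_per_sec * latency_ms / 1000.0


for name, (clients, throughput, latency) in scenarios.items():
    n = implied_concurrency(throughput, latency)
    print(f"{name:15s} clients={clients:5d} implied={n:8.2f} ratio={n / clients:.3f}")
```

For all five scenarios the implied concurrency falls within about 1% of the client count (ratios between 0.99 and 1.00). This suggests each client issued its next request as soon as the previous one completed.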

BONUS: LDAP Mixed operations Test

In this test scenario, 1000 LDAP clients were used to perform LDAP Search, Bind and Modify operations concurrently.
Operation breakdown (load distribution): Search: 65%. Bind: 30%. Modify: 5%

LDAP Operation   #clients   Throughput (ops/sec)   Latency (ms)
Search           650        188,832                3.86
Bind             300        87,159                 1.08
Modify           50         14,528                 12

And finally, the hardware configuration:


1 x Oracle SPARC T5-2 Server
    » 2 x 3.6 GHz SPARC T5 sockets each with 16 Cores (Total Cores: 32) and 8 MB L3 cache
    » 512 GB physical memory
    » 2 x 10 GbE cards
    » 1 x Sun Storage F5100 Flash Array with 80 flash modules
    » Oracle Solaris 11.1 operating system


Major credit goes to our colleague, Ramaprakash Sathyanarayan

Friday Apr 12, 2013

Siebel Benchmark on SPARC T5

Hardly six months after announcing Siebel benchmark results on Oracle SPARC T4 servers, we have a brand new set of Siebel benchmark results on Oracle SPARC T5 servers. There have been no updates to the Siebel benchmark kit in the last couple of years, so we continued to use the same Siebel benchmark workload to measure the performance of Siebel Financial Services Call Center and Order Management business transactions on the recently announced SPARC T5 servers.

Benchmark Details

The latest Siebel benchmark was executed on a mix of SPARC T5-2, SPARC T4-2 and SPARC T4-1 servers. The benchmark test simulated the actions of a large corporation with 40,000 concurrent active users. To date, this is the highest user count we have achieved in a Siebel benchmark.

User Load Breakdown & Achieved Throughput

Siebel Application Module        %Total Load   #Users   Business Trx per Hour
Financial Services Call Center   70            28,000   273,786
Order Management                 30            12,000   59,553
Total                            100           40,000   333,339

Average transaction response times for both Financial Services Call Center and Order Management transactions were under one second.

Software & Hardware Specification

Test Component         Software Version                         Server Model   Qty  Chips  Cores  vCPUs  CPU Speed  CPU Type            Memory  OS
Application Server     Siebel                                   SPARC T5-2     2    2      32     256    3.6 GHz    SPARC-T5            512 GB  Solaris 10 1/13 (S10U11)
Database Server        Oracle 11g R2                            SPARC T4-2     1    2      16     128    2.85 GHz   SPARC-T4            256 GB  Solaris 10 8/11 (S10U10)
Web Server             iPlanet Web Server 7.0.9 (7 U9)          SPARC T4-1     1    1      8      64     2.85 GHz   SPARC-T4            128 GB  Solaris 10 8/11 (S10U10)
Load Generator         Oracle Application Test Suite 9.21.0043  SunFire X4200  1    2      4      4      2.6 GHz    AMD Opteron 285 SE  16 GB   Windows 2003 R2 SP2
Load Drivers (Agents)  Oracle Application Test Suite 9.21.0043  SunFire X4170  8    2      12     12     2.93 GHz   Intel Xeon X5670    48 GB   Windows 2003 R2 SP2

(Chips, Cores, vCPUs, CPU Speed, CPU Type and Memory are per-server values.)

Additional Notes:

  • Siebel Gateway Server was configured to run on one of the application server nodes
  • Four Siebel application servers were configured in the Siebel Enterprise to handle 40,000 concurrent users
    • Each SPARC T5-2 was configured to run two Siebel application server instances
    • The Siebel application server instances on each SPARC T5-2 server were isolated from each other using Solaris virtualization technology, Zones
    • 40,000 concurrent user sessions were load balanced across all four Siebel application server instances
  • The Siebel database was hosted on a Sun Storage F5100 Flash Array consisting of 80 x 24 GB flash modules (FMODs)
    • The Siebel benchmark workload is not I/O intensive and does not require flash storage for better I/O performance
  • Fourteen iPlanet Web Server virtual servers were configured with the Siebel Web Server Extension (SWSE) plug-in to handle the 40,000 concurrent user load
    • All fourteen iPlanet Web Server instances forwarded HTTP requests from Siebel clients to all four Siebel application server instances in a round-robin fashion
  • Oracle Application Test Suite (OATS) was stable and held up amazingly well over the entire duration of the test run.
  • The benchmark test results were validated and thoroughly audited by the Siebel benchmark and PSR teams
    • Nothing new here. All Sun-published Siebel benchmarks, including the SPARC T4 one, were properly audited before being released to the outside world

Resource Utilization

Component                    #Users   CPU%    Memory Footprint
Gateway/Application Server   20,000   67.03   205.54 GB
Application Server           20,000   66.09   206.24 GB
Database Server              40,000   33.43   108.72 GB
Web Server                   40,000   29.48   14.03 GB

Finally, how does this benchmark stack up against other published benchmarks? Short answer is "very well". Head over to the Oracle Siebel Benchmark White Papers webpage to do the comparison yourself.

[Credit to our hard working colleagues in SAE, Siebel PSR, benchmark and Oracle Platform Integration (OPI) teams. Special thanks to Sumti Jairath and Venkat Krishnaswamy for the last minute fire drill]

A copy of this blog post is also available at:
Siebel Benchmark on SPARC T5

Tuesday Feb 12, 2013

OBIEE 11g Benchmark on SPARC T4

Just like the Siebel 8.1.x/SPARC T4 benchmark post, this one is at least four months overdue. In any case, I hope Oracle BI customers already know about the OBIEE 11g/SPARC T4 benchmark effort. Here I will try to provide a few additional, interesting details that aren't covered in the following Oracle press release, which was posted on oracle.com on 09/30/2012.

    SPARC T4 Server Delivers Outstanding Performance on Oracle Business Intelligence Enterprise Edition 11g

Benchmark Details

System Under Test

The entire BI middleware stack, including the WebLogic 11g Server, OBI Server, OBI Presentation Server and Java Host, was installed and configured on a single SPARC T4-4 server consisting of four 8-core 3.0 GHz SPARC T4 processors (total #cores: 32) and 128 GB of physical memory. The operating system was Oracle Solaris 10 8/11.

BI users were authenticated against Oracle Internet Directory (OID) in this benchmark, hence the OID software, which is part of Oracle Identity Management, was also installed and configured on the system under test (SUT). Oracle BI Server's Query Cache was turned on; as a result, most of the query results were cached in the OBIS layer. This resulted in minimal database activity, making it ideal to run the Oracle 11g R2 database server hosting the OBIEE database on the same box as well.

Oracle BI database was hosted on a Sun ZFS Storage 7120 Appliance. The BI Web Catalog was under a ZFS/zpool on a couple of SSDs.

Test Scenario

In this benchmark, 25,000 concurrent users assumed five different business user roles -- Marketing Executive, Sales Representative, Sales Manager, Sales Vice President, and Service Manager. The load was distributed equally among those five business user roles. Each of those BI users accessed five different pre-built dashboards, with each dashboard having an average of five reports (a mix of charts, tables and pivot tables) returning 50-500 rows of aggregated data. The benchmark test scenario included drilling down into multiple levels from a table or chart within a dashboard. There was a 60-second think time between requests, per user.

BI Setup & Test Results

OBIEE 11g was deployed on the SUT in a vertical scale-out fashion. Two Oracle BI Presentation Server processes, one Oracle BI Server process, one Java Host process and two instances of WebLogic Managed Servers handled 25,000 concurrent user sessions smoothly. This configuration resulted in a sub-second overall average transaction response time (average of averages over a duration of 120 minutes, or 2 hours). On average, 450 business transactions were executed per second, which triggered 750 SQL executions per second.

The system used only 52% CPU on average (~5% system CPU, the rest in user land) to achieve the throughput outlined above. Since 25,000 unique test/BI users hammered different dashboards consistently, not surprisingly the bulk of the CPU time was spent in the Oracle BI Presentation Server layer, which took a whopping 29%. The BI Server consumed about 10-11%, and the rest was shared by Java Host, OID, the WebLogic Managed Server instances and the Oracle database.

So, what is the key takeaway from this whole exercise?

SPARC T4 rocks the Oracle BI world. OBIEE 11g on SPARC T4 is an ideal combination that may work well for the majority of OBIEE deployments on the Solaris platform. Or, in marketing jargon: the excellent vertical and horizontal scalability of the SPARC T4 server gives customers the option to scale up as well as scale out to support large BI EE installations, with minimal hardware investment.

Evaluate and decide for yourself.

[Credit to our colleagues in Oracle FMW PSR, ISVe teams and SCA lab support engineers]

Wednesday Jan 30, 2013

Siebel Benchmark on SPARC T4

Siebel is a multi-threaded native application that performs well on Oracle's T-series SPARC hardware. We have published several Siebel benchmarks on previous generations of T-series servers, ranging from the Sun Fire T2000 to the Oracle SPARC T3-4. So it is natural to see that tradition extended to the current-generation SPARC T4 as well.

Benchmark Details

A 29,000-user Siebel benchmark on a mix of SPARC T4-1 and T4-2 servers was announced during the Oracle OpenWorld 2012 event. In this benchmark, Siebel application server instances ran on three SPARC T4-2/Solaris 10 8/11 systems, whereas the Oracle Database 11g R2 server was configured on a single SPARC T4-1/Solaris 11 11/11 system. Several iPlanet Web Server 7 U9 instances with the Siebel Web Plug-in (SWE) installed ran on one SPARC T4-1/Solaris 10 8/11 system. The Siebel database was hosted on a single Sun Storage F5100 Flash Array consisting of 80 flash modules (FMODs), each with a capacity of 24 GB.

Siebel Call Center and Order Management System were the modules tested in the benchmark. The workload had 70% of the virtual users (vusers) running Siebel Call Center transactions and the remaining 30% running Siebel Order Management System transactions. On T4, this benchmark exhibited sub-second average response times for both the Siebel Call Center and Order Management System modules.

Load balancing at various layers including web and test client systems ensured near uniform load across all web and application server instances. All three Siebel application server systems consumed ~78% CPU on average. The database and web server systems consumed ~53% and ~18% CPU respectively.

All these details are supposed to be available in a standard Oracle|Siebel benchmark template, but for some reason I couldn't find it on Oracle's Siebel Benchmark White Papers web page yet. Meanwhile, check out the following press release that was posted on oracle.com on 09/28/2012.

    SPARC T4 Servers Set World Record on Siebel CRM Benchmark

Looks like the large number of vusers (29,000 to be precise) sets this benchmark apart from the other benchmarks published with the same Siebel benchmark workload.

[Credit to our colleagues in Siebel PSR, benchmark, SAE and ISVe teams]

Monday Oct 15, 2012

Consolidating Oracle E-Business Suite R12 on Oracle's SPARC SuperCluster

An Optimized Solution for Oracle E-Business Suite (EBS) R12 12.1.3 is now available on oracle.com.

    The Oracle Optimized Solution for Oracle E-Business Suite

This solution was centered around the engineered system SPARC SuperCluster T4-4. Check out the business and technical white papers, along with a bunch of other useful resources, on the optimized solution page for EBS linked above.

What is an Optimized Solution?

Oracle's Optimized Solutions are designed, tested and fully documented architectures that are tuned for optimal performance and availability. Optimized solutions are NOT pre-packaged, fully tuned, ready-to-install software bundles that can be downloaded and installed. An optimized solution is usually a well-documented architecture that was thoroughly tested on a target platform. The technical white paper details the deployed application architecture, along with various observations ranging from installing the application on the target platform to its behavior and performance in highly available and scalable configurations.

Oracle E-Business Suite R12 Use Case

Multiple E-Business Suite R12 12.1.3 application modules were tested in this optimized solution: Financials (online: Oracle Forms & web requests), Order Management (online: Oracle Forms & web requests) and HRMS (online web requests & payroll batch). The solution will be updated with additional application modules when they become available.

Oracle Solaris Cluster is responsible for the high availability portion of the solution.

Performance Data

For the sake of completeness, test results were also documented in the optimized solution white paper. Those test results are for educational purposes only; they give a good sense of application behavior under the circumstances in which the application was tested. Since the major focus of the optimized solution is on highly available and scalable configurations, the application was configured to meet those criteria. Hence the documented test results are not directly comparable to any other E-Business Suite performance test results published by any vendor, including Oracle. Such a comparison may lead to skewed, incorrect conclusions.

Questions & Requests

Feel free to direct your questions to the author of the white papers. If you are a potential customer who would like to test a specific E-Business Suite application module on a non-engineered system such as a SPARC T4-X server, or on an engineered system such as SPARC SuperCluster, contact the Oracle Solution Center.

Friday Aug 03, 2012

Enabling 2 GB Large Pages on Solaris 10

A few facts:

  • 8 KB is the default page size on SPARC systems running Solaris 10 and 11 as of this writing
  • both hardware and software must support 2 GB large pages
  • SPARC T4 hardware is capable of supporting 2 GB pages
  • the Solaris 11 kernel has built-in support for 2 GB pages
  • Solaris 10 has no default support for 2 GB pages
  • memory-intensive 64-bit applications may benefit the most from using 2 GB pages


OS: Solaris 10 8/11 (Update 10) or later
Hardware: SPARC T4, e.g., SPARC T4-1, T4-2 or T4-4

Steps to enable 2 GB large pages on Solaris 10:

  1. Install the latest kernel patch, or ensure that 147440-04 or later is installed

  2. Add the following line to /etc/system and reboot
    • set max_uheap_lpsize=0x80000000

  3. Finally, check the output of the following command once the system is back online
    • pagesize -a

    % pagesize -a
    8192		<-- 8K
    65536		<-- 64K
    4194304		<-- 4M
    268435456	<-- 256M
    2147483648	<-- 2G
    % uname -a
    SunOS jar-jar 5.10 Generic_147440-21 sun4v sparc sun4v
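The three steps above can be scripted. What follows is a minimal sketch, not an Oracle-supplied tool: the function names kernel_patch_ok, add_lpsize_setting and has_2g_pages are hypothetical, and the patch check assumes the `uname -v` string looks like the "Generic_147440-21" shown above. It only recognizes patch 147440 itself, so a later kernel patch with a different ID has to be checked by hand.

```shell
#!/bin/bash
# Hypothetical helpers (not shipped with Solaris) that script the three steps above.

# Step 1: succeed if a `uname -v` string such as "Generic_147440-21" shows kernel
# patch 147440 at revision 04 or later. A later kernel patch with a different ID
# that obsoletes 147440 is NOT recognized; check those by hand.
kernel_patch_ok() {
    local rev
    case "$1" in
        Generic_147440-[0-9]*)
            rev=${1#Generic_147440-}
            rev=${rev%%[!0-9]*}              # keep only the leading digits
            [ "$((10#$rev))" -ge 4 ] ;;      # 10# so "08" is not read as octal
        *)
            return 1 ;;
    esac
}

# Step 2: append the 2 GB setting to an /etc/system-style file (default
# /etc/system, needs root) unless a max_uheap_lpsize line is already there.
# An existing line with a different value is left alone.
add_lpsize_setting() {
    local file=${1:-/etc/system}
    # plain grep + redirect, since /usr/bin/grep on Solaris 10 lacks -q
    if grep '^set max_uheap_lpsize=' "$file" >/dev/null 2>&1; then
        return 0
    fi
    echo 'set max_uheap_lpsize=0x80000000' >> "$file"
}

# Step 3: succeed if `pagesize -a` output on stdin lists 2147483648 bytes
# (0x80000000, i.e. 2 GB) as a supported page size.
has_2g_pages() {
    awk '$1 == "2147483648" { found = 1 } END { if (found) exit 0; exit 1 }'
}
```

On the T4 box, the sequence would be `kernel_patch_ok "$(uname -v)"`, then `add_lpsize_setting` as root followed by a reboot, then `pagesize -a | has_2g_pages`. To see which page sizes a running process actually received, `pmap -xs <pid>` reports the page size of each mapping.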

Also See:

Thursday Apr 14, 2011

Oracle Solaris: Show Me the CPU, vCPU, Core Counts and the Socket-Core-vCPU Mapping

[Replaced old code with new code on 10/03/11]

It should be easy to find this information just by running an OS command. However, for some reason that isn't the case as of today. The user must know a few details about the underlying hardware and run multiple commands to figure out the exact number of physical processors, cores, etc.

For the benefit of our customers, here is a simple shell script that displays the number of physical processors, cores, virtual processors, cores per physical processor, number of hardware threads (vCPUs) per core and the virtual CPU mapping for all physical processors and cores on a Solaris system (SPARC or x86/x64). This script showed valid output on recent T-series and M-series hardware, as well as on some older hardware such as the Sun Fire 4800 and x4600. Because the output of the cpu_info kstat has changed over the years, the script may return incorrect information in some cases. Since it is just a shell script, tweak the code as you like. It relies on bash syntax and can be executed by any OS user.

Download the script: showcpucount

% cat showcpucount

--------------------------------------- CUT HERE -------------------------------------------

#!/bin/bash

# capture the "module:", chip_id and core_id lines of every vCPU (3 lines per vCPU)
/usr/bin/kstat -m cpu_info | egrep "chip_id|core_id|module: cpu_info" > /var/tmp/cpu_info.log

nproc=`(grep chip_id /var/tmp/cpu_info.log | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')`
ncore=`(grep core_id /var/tmp/cpu_info.log | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')`
vproc=`(grep 'module: cpu_info' /var/tmp/cpu_info.log | awk '{ print $4 }' | sort -u | wc -l | tr -d ' ')`

# derived counts; assumes all processors have the same number of cores
# and all cores the same number of strands
nstrandspercore=$((vproc / ncore))
ncoresperproc=$((ncore / nproc))

speedinmhz=`(/usr/bin/kstat -m cpu_info | grep clock_MHz | awk '{ print $2 }' | sort -u)`
speedinghz=`echo "scale=2; $speedinmhz/1000" | bc`

echo "Total number of physical processors: $nproc"
echo "Number of virtual processors: $vproc"
echo "Total number of cores: $ncore"
echo "Number of cores per physical processor: $ncoresperproc"
echo "Number of hardware threads (strands or vCPUs) per core: $nstrandspercore"
echo "Processor speed: $speedinmhz MHz ($speedinghz GHz)"

# now derive the vcpu-to-core mapping based on above information #

echo -e "\n** Socket-Core-vCPU mapping **"
linenum=2    # line 2 of the log is the chip_id of the first vCPU

for ((i = 1; i <= ${nproc}; ++i ))
do
        chipid=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $2 }'`
        echo -e "\nPhysical Processor $i (chip id: $chipid):"

        for ((j = 1; j <= ${ncoresperproc}; ++j ))
        do
                # next line: core_id of the first vCPU in this core
                linenum=$((linenum + 1))
                coreid=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $2 }'`
                echo -e "\tCore $j (core id: $coreid):"

                # back up to that vCPU's "module:" line; field 4 is its instance (vCPU id)
                linenum=$((linenum - 2))
                vcpustart=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $4 }'`

                # jump to the "module:" line of the last strand in this core
                linenum=$((3 * nstrandspercore + linenum - 3))
                vcpuend=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $4 }'`

                echo -e "\t\tvCPU ids: $vcpustart - $vcpuend"
                # skip ahead to the chip_id line of the next core's first vCPU
                linenum=$((linenum + 4))
        done
done

rm /var/tmp/cpu_info.log
--------------------------------------- CUT HERE -------------------------------------------

# prtdiag | head -1
System Configuration:  Sun Microsystems  sun4u SPARC Enterprise M4000 Server

# ./showcpucount
Total number of physical processors: 4
Number of virtual processors: 32
Total number of cores: 16
Number of cores per physical processor: 4
Number of hardware threads (strands or vCPUs) per core: 2
Processor speed: 2660 MHz (2.66 GHz)

** Socket-Core-vCPU mapping **

Physical Processor 1 (chip id: 1024):
        Core 1 (core id: 0):
                vCPU ids: 0 - 1
        Core 2 (core id: 2):
                vCPU ids: 2 - 3
        Core 3 (core id: 4):
                vCPU ids: 4 - 5
        Core 4 (core id: 6):
                vCPU ids: 6 - 7

Physical Processor 2 (chip id: 1032):
        Core 1 (core id: 8):
                vCPU ids: 8 - 9
        Core 2 (core id: 10):
                vCPU ids: 10 - 11
        Core 3 (core id: 12):
                vCPU ids: 12 - 13
        Core 4 (core id: 14):
                vCPU ids: 14 - 15

Physical Processor 3 (chip id: 1040):
        Core 1 (core id: 16):
                vCPU ids: 16 - 17
        Core 2 (core id: 18):
                vCPU ids: 18 - 19
        Core 3 (core id: 20):
                vCPU ids: 20 - 21
        Core 4 (core id: 22):
                vCPU ids: 22 - 23

Physical Processor 4 (chip id: 1048):
        Core 1 (core id: 24):
                vCPU ids: 24 - 25
        Core 2 (core id: 26):
                vCPU ids: 26 - 27
        Core 3 (core id: 28):
                vCPU ids: 28 - 29
        Core 4 (core id: 30):
                vCPU ids: 30 - 31
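As a cross-check on the counting half of showcpucount, here is a compact sketch that derives the same totals from parseable `kstat -p` output instead of fixed line offsets in a temp file. It is a variant, not the script above: count_cpus is a hypothetical name, and it makes the same assumptions as the script (core_id values are unique system-wide, and every processor is identically populated). If /usr/bin/awk on an older Solaris release rejects it, try nawk.

```shell
#!/bin/bash
# count_cpus (hypothetical helper): reads `kstat -p` lines of the form
#   cpu_info:<instance>:cpu_info<instance>:<statistic><TAB><value>
# on stdin and prints the same totals as showcpucount, without line offsets.
count_cpus() {
    awk '
    {
        n = split($1, f, ":")            # f[2] = instance (vCPU id), f[n] = statistic
        if (f[n] == "chip_id") { chip[$2] = 1; vcpu[f[2]] = 1 }
        else if (f[n] == "core_id") { core[$2] = 1 }
    }
    END {
        for (c in chip) np++             # distinct chip ids  = physical processors
        for (c in core) nc++             # distinct core ids  = cores
        for (v in vcpu) nv++             # distinct instances = vCPUs
        if (np == 0 || nc == 0) { print "no cpu_info data" | "cat 1>&2"; exit 1 }
        printf "physical processors: %d\n", np
        printf "cores: %d\n", nc
        printf "vCPUs: %d\n", nv
        printf "cores per processor: %d\n", nc / np
        printf "vCPUs per core: %d\n", nv / nc
    }'
}
```

On a Solaris host it would be fed with `kstat -p 'cpu_info:::chip_id' 'cpu_info:::core_id' | count_cpus`. Keying on distinct values rather than line positions means the order in which kstat prints the statistics does not matter.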
