Java EE Application Servers, SPARC T4, Solaris Containers, and Resource Pools

I've obtained a substantial performance improvement on a SPARC T4-2 Server running a Java EE Application Server Cluster by deploying the cluster members into Oracle Solaris Containers and binding those containers to cores of the SPARC T4 Processor. This is not a surprising result; in fact, it is consistent with other results that are available on the Internet. See the references below for some examples. Nonetheless, here is a summary of my configuration and results.

(1.0) Before deploying a Java EE Application Server Cluster into a virtualized environment, many decisions need to be made. I'm not claiming that all of the decisions that I have made will work well for every environment. In fact, I'm not even claiming that all of the decisions are the best possible for my environment. I'm only claiming that, of the small sample of configurations that I've tested, this is the one that is working best for me. Here are some of the decisions that needed to be made:

(1.1) Which virtualization option? There are several virtualization options and isolation levels that are available. Options include:

  • Hard partitions:  Dynamic Domains on Sun SPARC Enterprise M-Series Servers
  • Hypervisor based virtualization such as Oracle VM Server for SPARC (LDOMs) on SPARC T-Series Servers
  • OS Virtualization using Oracle Solaris Containers
  • Resource management tools in the Oracle Solaris OS to control the amount of resources an application receives, such as CPU cycles, physical memory, and network bandwidth.

Oracle Solaris Containers provide the right level of isolation and flexibility for my environment. To borrow some words from my friends in marketing, "The SPARC T4 processor leverages the unique, no-cost virtualization capabilities of Oracle Solaris Zones".

(1.2) How to associate Oracle Solaris Containers with resources? There are several options available to associate containers with resources, including (a) resource pool association, (b) dedicated-cpu resources, and (c) capped-cpu resources. I chose to create resource pools and associate them with the containers because I wanted explicit control over the cores and virtual processors.
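
For comparison, here is a minimal sketch of the dedicated-cpu alternative, configured directly in zonecfg rather than through a separately managed pool (the ncpus value of 32 is only an example; choose a value that matches your own core allocation):

test# zonecfg -z test-z2
zonecfg:test-z2> add dedicated-cpu
zonecfg:test-z2:dedicated-cpu> set ncpus=32
zonecfg:test-z2:dedicated-cpu> end
zonecfg:test-z2> commit
zonecfg:test-z2> exit

With dedicated-cpu, a temporary pool is created automatically when the zone boots, so you give up the explicit control over exactly which cores and virtual processors are used.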

(1.3) Cluster Topology? Is it best to deploy (a) multiple application servers on one node, (b) one application server on multiple nodes, or (c) multiple application servers on multiple nodes? After a few quick tests, it appears that one application server per Oracle Solaris Container is a good solution.

(1.4) Number of cluster members to deploy? I chose to deploy four big application servers. I would like to go back and test many 32-bit application servers, but that is left for another day.

(2.0) Configuration tested.

(2.1) I was using a SPARC T4-2 Server, which has 2 CPUs (sockets) and 128 virtual processors. To understand the physical layout of the hardware on Solaris 10, I used the OpenSolaris psrinfo Perl script available at http://hub.opensolaris.org/bin/download/Community+Group+performance/files/psrinfo.pl:

test# ./psrinfo.pl -pv
The physical processor has 8 cores and 64 virtual processors (0-63)
  The core has 8 virtual processors (0-7)
  The core has 8 virtual processors (8-15)
  The core has 8 virtual processors (16-23)
  The core has 8 virtual processors (24-31)
  The core has 8 virtual processors (32-39)
  The core has 8 virtual processors (40-47)
  The core has 8 virtual processors (48-55)
  The core has 8 virtual processors (56-63)
    SPARC-T4 (chipid 0, clock 2848 MHz)
The physical processor has 8 cores and 64 virtual processors (64-127)
  The core has 8 virtual processors (64-71)
  The core has 8 virtual processors (72-79)
  The core has 8 virtual processors (80-87)
  The core has 8 virtual processors (88-95)
  The core has 8 virtual processors (96-103)
  The core has 8 virtual processors (104-111)
  The core has 8 virtual processors (112-119)
  The core has 8 virtual processors (120-127)
    SPARC-T4 (chipid 1, clock 2848 MHz)

(2.2) The "before" test: without processor binding. I started with a 4-member cluster deployed into 4 Oracle Solaris Containers. Each container used a unique gigabit Ethernet port for HTTP traffic. The containers shared a 10 gigabit Ethernet port for JDBC traffic.

(2.3) The "after" test: with processor binding. I ran one application server in the Global Zone and another application server in each of the three non-global zones (NGZ): 

(3.0) Configuration steps. The following steps need to be repeated for all three Oracle Solaris Containers.

(3.1) Stop AppServers from the BUI.

(3.2) Stop the NGZ.

test# ssh test-z2 init 5
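
Alternatively, the zone can be halted from the global zone; this is a more abrupt stop than an orderly init, so it is only a reasonable shortcut if the application servers are already down:

test# zoneadm -z test-z2 halt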

(3.3) Enable resource pools:

test# svcadm enable pools
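
To confirm that the resource pools facility came online, check the SMF service state:

test# svcs pools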

(3.4) Create the resource pool:

test# poolcfg -dc 'create pool pool-test-z2'

(3.5) Create the processor set:

test# poolcfg -dc 'create pset pset-test-z2'

(3.6) Specify the maximum number of CPUs that may be added to the processor set:

test# poolcfg -dc 'modify pset pset-test-z2 (uint pset.max=32)'
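
If you also want to guarantee that the processor set never shrinks below a given size, pset.min can be set the same way (the value of 32 simply mirrors this configuration):

test# poolcfg -dc 'modify pset pset-test-z2 (uint pset.min=32)'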

(3.7) Add virtual CPUs to the processor set (bash syntax):

test# (( i = 64 )); while (( i < 96 )); do poolcfg -dc "transfer to pset pset-test-z2 (cpu $i)"; (( i = i + 1 )) ; done

(3.8) Associate the resource pool with the processor set:

test# poolcfg -dc 'associate pool pool-test-z2 (pset pset-test-z2)'
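
At this point the in-kernel configuration can be reviewed; running pooladm with no arguments prints the currently active pools and processor sets:

test# pooladm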

(3.9) Tell the zone to use the resource pool that has been created:

test# zonecfg -z test-z2 set pool=pool-test-z2
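
The zone's pool property can be verified from the zone configuration:

test# zonecfg -z test-z2 info pool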

(3.10) Boot the Oracle Solaris Container:

test# zoneadm -z test-z2 boot

(3.11) Save the configuration to /etc/pooladm.conf:

test# pooladm -s
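
The saved (static) configuration can be displayed later with poolcfg run without the '-d' option, which reads /etc/pooladm.conf by default:

test# poolcfg -c info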

(4.0) Verification

(4.1) View the processors in each processor set:

test# psrset

user processor set
5: processors 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
user processor set
6: processors 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
user processor set
7: processors 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127

(4.2) Verify that the Java processes are associated with the processor sets:

test# ps -e -o vsz,rss,pid,pset,comm | grep java | sort -n

  VSZ     RSS    PID PSET COMMAND
3715416 1543344 25143   5 <JAVA_HOME>/bin/sparcv9/java
3772120 1600088 25142   - <JAVA_HOME>/bin/sparcv9/java
3780960 1608832 25144   6 <JAVA_HOME>/bin/sparcv9/java
3792648 1620560 25145   7 <JAVA_HOME>/bin/sparcv9/java

(5.0) Results. Using the resource pools improves both throughput and response time.

(6.0) Run Time Changes

(6.1) I wanted to show an example that starts from scratch, which is why I stopped the Oracle Solaris Containers, configured the pools, and booted up fresh; that way there is no room for confusion. However, the steps should also work for running containers. One exception is "zonecfg -z test-z2 set pool=pool-test-z2", which only takes effect when the zone is booted.

(6.2) I've shown poolcfg with the '-d' option, which specifies that the command operates directly on the kernel state. For example, at runtime, you can move CPU core 12 (virtual processors 96-103) from test-z3 to test-z2 with the following command:

test# (( i = 96 )); while (( i < 104 )); do poolcfg -dc "transfer to pset pset-test-z2 (cpu $i)"; (( i = i + 1 )) ; done
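
Because the '-d' option changes only the in-kernel state, a move like this will not survive a reboot or a subsequent pooladm -c; if the new layout should be permanent, save it again:

test# pooladm -s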

(6.3) To specify a run-time change to a container's pool binding, use the following steps:

Identify the zone ID (first column):

test# zoneadm list -vi
  ID NAME        STATUS     PATH                      BRAND    IP
   0 global      running    /                         native   shared
  28 test-z3     running    /zones/test-z3            native   shared
  31 test-z1     running    /zones/test-z1            native   shared
  32 test-z2     running    /zones/test-z2            native   shared

Modify the binding if necessary:

test# poolbind -p pool-test-z2 -i zoneid 32
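
The binding can be spot-checked with poolbind's query option, using one of the Java PIDs from the earlier ps output as an example:

test# poolbind -q 25144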

(7.0) Processor sets are particularly relevant to multi-socket configurations: they reduce cross calls (xcal) and migrations (migr), as the mpstat samples below show.

Single Socket Test: 1 x SPARC T4 socket, 2 x Oracle Solaris Containers. The impact of processor sets was hardly measurable (about a 1% throughput difference). mpstat samples:

Without processor binding:
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 40    1   0  525   933   24 1793  124  363  153    1  2551   50   7   0  43
 41    2   0  486  1064   24 1873  137  388  159    2  2560   51   7   0  42
 42    1   0  472   973   23 1770  124  352  153    1  2329   49   7   0  44
 43    1   0  415   912   22 1697  115  320  153    1  2175   47   7   0  47
 44    1   0  369   884   22 1665  111  300  150    1  2008   45   6   0  49
 45    2   0  494   902   23 1730  116  324  152    1  2233   46   7   0  47
 46    3   0  918  1075   26 2087  163  470  172    1  2935   55   8   0  38
 47    2   0  672   999   25 1955  143  416  162    1  2777   53   7   0  40
 48    2   0  691   976   25 1904  136  396  159    1  2748   51   7   0  42
 49    3   0  849  1081   24 1933  145  411  163    1  2670   52   7   0  40
With each container bound to 4 cores:
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 40    1   0  347  1164   20 1810  119  311  210    1  2079   42   6   0  51
 41    1   0  406  1219   21 1898  131  344  214    1  2266   45   7   0  48
 42    1   0  412  1214   21 1902  130  342  212    1  2289   45   7   0  49
 43    2   0  410  1208   21 1905  130  343  219    1  2304   45   7   0  48
 44    1   0  411  1208   21 1906  131  343  214    1  2313   45   7   0  48
 45    1   0  433  1209   21 1917  133  344  215    1  2337   45   7   0  48
 46    2   0  500  1244   24 1989  141  368  218    1  2482   46   7   0  47
 47    1   0  377  1183   21 1871  127  331  211    1  2289   45   7   0  49
 48    1   0  358   961   23 1699   77  202  183    1  2255   41   6   0  53
 49    1   0  339  1008   21 1739   84  216  188    1  2231   41   6   0  53




Two Socket Test: 2 x SPARC T4 sockets, 4 x Oracle Solaris Containers. The impact of processor sets was substantial (~25% better throughput). mpstat samples:

Without processor binding:
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 40    1   0 1277  1553   32 2726  317  942   70    2  2620   66  11   0  24
 41    0   0 1201  1606   30 2618  309  895   71    2  2603   67  11   0  23
 42    1   0 1104  1517   30 2519  295  846   70    2  2499   65  10   0  24
 43    1   0  997  1447   28 2443  283  807   69    2  2374   64  10   0  26
 44    1   0  959  1402   28 2402  277  776   67    2  2336   64  10   0  26
 45    1   0 1057  1466   29 2538  294  841   68    2  2400   64  10   0  26
 46    3   0 2785  1776   35 3273  384 1178   74    2  2841   68  12   0  20
 47    1   0 1508  1610   33 2949  346 1039   72    2  2764   67  11   0  22
 48    2   0 1486  1594   33 2963  346 1036   72    2  2761   67  11   0  22
 49    1   0 1308  1589   32 2741  325  952   71    2  2694   67  11   0  22
With each container bound to 4 cores:
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 40    1   0  423  1223   20 1841  157  377   60    1  2185   48   7   0  45
 41    1   0  505  1279   22 1942  168  405   65    1  2396   50   7   0  43
 42    1   0  500  1278   22 1941  170  405   65    1  2413   50   7   0  42
 43    2   0  492  1277   22 1955  171  408   64    1  2422   50   8   0  42
 44    1   0  504  1269   22 1941  167  407   64    1  2430   50   7   0  42
 45    1   0  513  1284   22 1977  173  412   64    1  2475   50   8   0  42
 46    2   0  582  1302   25 2021  177  431   67    1  2612   52   8   0  41
 47    1   0  462  1247   21 1918  168  400   62    1  2392   50   7   0  43
 48    1   0  466  1055   25 1777  120  282   56    1  2424   47   7   0  47
 49    1   0  412  1080   22 1789  122  285   56    1  2354   46   7   0  47

(8.0) References:
