Monday Mar 31, 2014

[Solaris] ZFS Pool History, Writing to System Log, Persistent TCP/IP Tuning, ..

.. with plenty of examples and brief comments along the way.

[1] Check existing DNS client configuration

Solaris 11 and later:

% svccfg -s network/dns/client listprop config
config                      application        
config/value_authorization astring     solaris.smf.value.name-service.dns.client
config/options             astring     "ndots:2 timeout:3 retrans:3 retry:1"
config/search              astring     "sfbay.sun.com" "us.oracle.com" "oraclecorp.com" "oracle.com" "sun.com"
config/nameserver          net_address xxx.xx.xxx.xx xxx.xx.xxx.xx xxx.xx.xxx.xx

Solaris 10 and prior:

Check the contents of /etc/resolv.conf

% cat /etc/resolv.conf
search  sfbay.sun.com us.oracle.com oraclecorp.com oracle.com sun.com
options ndots:2 timeout:3 retrans:3 retry:1
nameserver      xxx.xx.xxx.xx
nameserver      xxx.xx.xxx.xx
nameserver      xxx.xx.xxx.xx

Note that the /etc/resolv.conf file exists on Solaris 11.x releases too, as of today.

[2] Logical domains: finding out the hostname of control domain

Use virtinfo(1M) command.

root@ppst58-cn1-app:~# virtinfo -a
Domain role: LDoms guest I/O service root
Domain name: n1d2
Domain UUID: 02ea1fbe-80f9-e0cf-ecd1-934cf9bbeffa
Control domain: ppst58-01
Chassis serial#: AK00083297

The above output shows that the n1d2 domain is a guest domain that is also an I/O domain, a service domain and a root I/O domain. The control domain is running on host ppst58-01.

Output from control domain:

root@ppst58-01:~# ldm list
NAME             STATE      FLAGS   CONS    VCPU  MEMORY   UTIL  NORM  UPTIME
primary          active     -n-cv-  UART    64    130304M  0.1%  0.1%  243d 2h 
n1d1             active     -n----  5001    448   916992M  0.2%  0.2%  3d 15h 26m
n1d2             active     -n--v-  5002    512   1T       0.0%  0.0%  3d 15h 29m

root@ppst58-01:~# virtinfo -a
Domain role: LDoms control I/O service root
Domain name: primary
Domain UUID: 19337210-285a-6ea4-df8f-9dc65714e3ea
Control domain: ppst58-01
Chassis serial#: AK00083297

[3] Administering NFS configuration

Solaris 11 and later:

Use the sharectl(1M) command. Solaris 11.x releases include the sharectl administrative tool to configure and manage file-sharing protocols such as NFS, SMB and autofs.

eg.,
Display all property values of NFS:

# sharectl get nfs
servers=1024
lockd_listen_backlog=32
lockd_servers=1024
grace_period=90
server_versmin=2
server_versmax=4
client_versmin=2
client_versmax=4
server_delegation=on
nfsmapid_domain=
max_connections=-1
listen_backlog=32
..
..

# sharectl status
autofs  online client
nfs     disabled

eg.,
Modifying the nfs v4 grace period from the default 90s to 30s:

# sharectl get -p grace_period nfs
grace_period=90
# sharectl set -p grace_period=30 nfs
# sharectl get -p grace_period nfs
grace_period=30

Solaris 10 and prior:

Edit the /etc/default/nfs file, and restart the NFS related service(s).
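For example, a rough sketch of the Solaris 10 equivalent of the grace period change shown above (the parameter name below is from a typical Solaris 10 /etc/default/nfs and may vary by update release):

# grep GRACE_PERIOD /etc/default/nfs
#GRACE_PERIOD=90
# vi /etc/default/nfs 			<== uncomment and set GRACE_PERIOD=30
# svcadm restart svc:/network/nfs/server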

[4] Examining ZFS Storage Pool command history

Solaris 10 8/07 and later releases log successful zfs and zpool commands that modify the underlying pool state. All of the executed commands can be examined by running the zpool history command. Because this command shows the actual zfs/zpool commands as they were executed, the 'history' feature is really useful in troubleshooting an error scenario that resulted from executing some zfs command.

# zpool list
NAME       SIZE  ALLOC  FREE  CAP  DEDUP   HEALTH  ALTROOT
rpool      416G   152G  264G  36%  1.00x   ONLINE  -
zs3actact  848G  17.4G  831G   2%  1.00x   ONLINE  -

# zpool history -l zs3actact
History for 'zs3actact':
2014-03-19.22:02:32 zpool create -f zs3actact c0t600144F0AC6B9D2900005328B7570001d0 [user root on etc25-appadm05:global]
2014-03-19.22:03:12 zfs create zs3actact/iscsivol1 [user root on etc25-appadm05:global]
2014-03-19.22:03:33 zfs set recordsize=128k zs3actact/iscsivol1 [user root on etc25-appadm05:global]

Note that this log is enabled by default, and cannot be disabled.
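In addition to the -l (long format) option used above, zpool history accepts a -i option that also displays internally logged ZFS events, which can be handy when the command history alone does not explain a change. A quick sketch using the same pool:

# zpool history -i zs3actact | tail -20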

[5] Modifying TCP/IP configuration parameters

Using ndd(1M) is the old way of tuning TCP/IP parameters, and it is still supported as of today (in Solaris 11.x releases). However, the ipadm(1M) command is the recommended way to modify or retrieve TCP/IP protocol properties on Solaris 11.x and later releases.

# ipadm show-prop -p max_buf tcp
PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
tcp   max_buf               rw   1048576      --           1048576      128000-1073741824

# ipadm set-prop -p max_buf=2097152 tcp

# ipadm show-prop -p max_buf tcp
PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
tcp   max_buf               rw   2097152      2097152      1048576      128000-1073741824

ndd style (still valid):

# ndd -get /dev/tcp tcp_max_buf
1048576

# ndd -set /dev/tcp tcp_max_buf 2097152

# ndd -get /dev/tcp tcp_max_buf
2097152

One of the advantages of using ipadm over ndd is that the tuned non-default values persist across reboots. In the case of ndd, we have to re-apply those values either manually or by creating a Run Control script (/etc/rc*.d/S*) to make sure that the intended values are set automatically when the system reboots.
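A minimal sketch of such a run control script, assuming the tcp_max_buf value used in the example above (the script name is arbitrary):

# cat /etc/rc2.d/S99tcptune
#!/bin/sh
# re-apply non-default TCP tunables at boot; ndd settings do not persist across reboots
/usr/sbin/ndd -set /dev/tcp tcp_max_buf 2097152

# chmod 744 /etc/rc2.d/S99tcptune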

[6] Writing to system log from a shell script

Use logger(1) command as shown in the following example.

eg.,

# logger -p local0.warning Big Brother is watching you

# dmesg | tail -1
Mar 30 18:42:14 etc27zadm01 root: [ID 702911 local0.warning] Big Brother is watching you

Check the syslog.conf(4) man page for the list of available system facilities and the severity levels of the condition being logged.
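For instance, a hypothetical backup wrapper script might log its progress and failures along these lines (the script name, tag and paths are made up for illustration):

#!/bin/sh
# nightly_backup.sh -- log start and failure to the system log
logger -t nightly_backup -p local0.info "backup started"
tar cf /bkp/home.tar /export/home 2>/dev/null
if [ $? -ne 0 ]; then
        logger -t nightly_backup -p local0.err "backup of /export/home failed"
fi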

BONUS:

[*] Forceful NFS unmount on Linux

Try the lazy unmount option (-l) on systems running Linux kernel 2.4.11 or later to forcefully unmount a filesystem that keeps failing with "Device or resource busy" and/or "device is busy" errors.

eg.,

# umount -f /bkp
umount2: Device or resource busy
umount: /bkp: device is busy
umount2: Device or resource busy
umount: /bkp: device is busy

# umount -l /bkp
#

Wednesday Mar 26, 2014

Software Availability : Solaris Studio 12.4 Beta & ORAchk

First off, these are two unrelated software products.

Solaris Studio 12.4 Beta

Nearly two and a half years after the release of Solaris Studio 12.3, Oracle is gearing up for the next major release, 12.4. In addition to compiler and library optimizations supporting the latest and greatest SPARC & Intel x64 hardware such as SPARC T5, M5, M6, Fujitsu's M10, and Intel's Ivy Bridge and Haswell lines of servers, support for the C++ 2011 language standard is one of the highlights of this forthcoming release. The complete list of features and enhancements in release 12.4 is documented in the What's New page.

Those who feel compelled to give the updated/enhanced compilers and tools a try can get started right away by downloading the beta bits from the following location. This software is available for Solaris 10 & 11 running on SPARC and x86 hardware, and for Linux 5 & 6 running on x86/x64 hardware. Anyone can download this software for free.

     Oracle Solaris Studio 12.4 Beta Download

Don't forget to check the Release Notes out for the installation instructions, known issues, limitations and workarounds, features that were removed in this release and so on.

Here's a pointer to the documentation (preview): Oracle Solaris Studio 12.4 Information Library

Finally, should you run into any issue(s) or if you have questions about anything related, feel free to use the Solaris Studio Community Forum.




ORAchk 2.2.4 (formerly known as EXAchk)

ORAchk, the Oracle Configuration Audit Tool, enhances the EXAchk tool's functionality and replaces the existing & popular RACcheck tool. In addition to the top issues reported by users/customers, ORAchk proactively scans for known problems within Oracle Database, Sun systems (especially engineered systems) and Oracle E-Business Suite Financials.

While checking, ORAchk covers a wide range of areas such as OS kernel settings, database installations (single instance and RAC), performance, backup and recovery, storage setting, and so on.

ORAchk-generated reports (mostly high level) show the system health risks with the ability to drill down into specific problems, and offer recommendations specific to the environment and product configuration. Those who do not like sending this data back to Oracle should be happy to know that there is no phone-home feature in this release.

Note that ORAchk is available only for Oracle Premier Support customers - meaning only those customers with appropriate support contracts can use this tool. So, if you are an Oracle customer with the ability to access the Oracle Support website, check the following pages out for additional information.

     ORAchk - Oracle Configuration Audit Tool
     ORAchk user's guide

Feel free to use the community forum to ask any related questions.

Friday Feb 28, 2014

[Solaris] Changing hostname, Parallel Compression, pNFS, Upgrading SRUs and Clearing Faults

[1] Solaris 11+ : changing hostname

Starting with Solaris 11, a system's identity (nodename) is configured through the config/nodename property of the svc:/system/identity:node SMF service. Solaris 10 and prior versions keep this information in the /etc/nodename configuration file.

The following example demonstrates the commands to change the hostname from "ihcm-db-01" to "ehcm-db-01".

eg.,
# hostname
ihcm-db-01

# svccfg -s system/identity:node listprop config
config                       application        
config/enable_mapping       boolean     true
config/ignore_dhcp_hostname boolean     false
config/nodename             astring     ihcm-db-01
config/loopback             astring     ihcm-db-01
#

# svccfg -s system/identity:node setprop config/nodename="ehcm-db-01"

# svccfg -s system/identity:node refresh  -OR- 
	# svcadm refresh svc:/system/identity:node
# svcadm restart system/identity:node

# svccfg -s system/identity:node listprop config
config                       application        
config/enable_mapping       boolean     true
config/ignore_dhcp_hostname boolean     false
config/nodename             astring     ehcm-db-01
config/loopback             astring     ehcm-db-01

# hostname
ehcm-db-01

[2] Parallel Compression

This topic is not Solaris specific, but it certainly helps Solaris users who are frustrated with the single-threaded implementation of all officially supported compression tools such as compress, gzip and zip.

pigz (pig-zee) is a parallel implementation of gzip that is well suited to the latest multi-processor, multi-core machines. By default, pigz breaks up the input into chunks of 128 KB and compresses each chunk in parallel with the help of lightweight threads. The number of compression threads defaults to the number of online processors. Both the chunk size and the number of threads are configurable.
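For example, to run the compression with fewer threads and bigger chunks, something like the following should do (values picked arbitrarily; -p sets the thread count and -b the chunk size in KB):

$ ./pigz -p 16 -b 512 PT8.53.04.tar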

Compressed files can be restored to their original form using the -d option of pigz or gzip. As per the man page, decompression is not parallelized out of the box, but it may still show some improvement compared to the older tools.

The following example demonstrates the advantage of using pigz over gzip in compressing and decompressing a large file.

eg.,

Original file, and the target hardware.

$ ls -lh PT8.53.04.tar 
-rw-r--r--   1 psft     dba         4.8G Feb 28 14:03 PT8.53.04.tar

$ psrinfo -pv
The physical processor has 8 cores and 64 virtual processors (0-63)
  The core has 8 virtual processors (0-7)
	...
  The core has 8 virtual processors (56-63)
    SPARC-T5 (chipid 0, clock 3600 MHz)

gzip compression.

$ time gzip --fast PT8.53.04.tar 

real    3m40.125s
user    3m27.105s
sys     0m13.008s

$ ls -lh PT8.53*
-rw-r--r--   1 psft     dba         3.1G Feb 28 14:03 PT8.53.04.tar.gz

/* the following prstat, vmstat outputs show that gzip is compressing the 
	tar file using a single thread - hence low CPU utilization. */

$ prstat -p 42510

   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP      
 42510 psft     2616K 2200K cpu16    10    0   0:01:00 1.5% gzip/1

$ prstat -m -p 42510

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
 42510 psft      95 4.6 0.0 0.0 0.0 0.0 0.0 0.0   0  35  7K   0 gzip/1

$ vmstat 2

 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 776242104 917016008 0 7 0 0 0  0  0  0  0 52 52 3286 2606 2178  2  0 98
 1 0 0 776242104 916987888 0 14 0 0 0 0  0  0  0  0  0 3851 3359 2978  2  1 97
 0 0 0 776242104 916962440 0 0 0 0 0  0  0  0  0  0  0 3184 1687 2023  1  0 98
 0 0 0 775971768 916930720 0 0 0 0 0  0  0  0  0 39 37 3392 1819 2210  2  0 98
 0 0 0 775971768 916898016 0 0 0 0 0  0  0  0  0  0  0 3452 1861 2106  2  0 98

pigz compression.

$ time ./pigz PT8.53.04.tar 

real    0m25.111s	<== wall clock time is 25s compared to gzip's 3m 27s
user    17m18.398s
sys     0m37.718s

/* the following prstat, vmstat outputs show that pigz is compressing the 
        tar file using many threads - hence busy system with high CPU utilization. */

$ prstat -p 49734

   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP      
49734 psft       59M   58M sleep    11    0   0:12:58  38% pigz/66

$ vmstat 2

 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
 0 0 0 778097840 919076008 6 113 0 0 0 0 0  0  0 40 36 39330 45797 74148 61 4 35
 0 0 0 777956280 918841720 0 1 0 0 0  0  0  0  0  0  0 38752 43292 71411 64 4 32
 0 0 0 777490336 918334176 0 3 0 0 0  0  0  0  0 17 15 46553 53350 86840 60 4 35
 1 0 0 777274072 918141936 0 1 0 0 0  0  0  0  0 39 34 16122 20202 28319 88 4 9
 1 0 0 777138800 917917376 0 0 0 0 0  0  0  0  0  3  3 46597 51005 86673 56 5 39

$ ls -lh PT8.53.04.tar.gz 
-rw-r--r--   1 psft     dba         3.0G Feb 28 14:03 PT8.53.04.tar.gz

$ gunzip PT8.53.04.tar.gz 	<== shows that the pigz compressed file is 
                                         compatible with gzip/gunzip

$ ls -lh PT8.53*
-rw-r--r--   1 psft     dba         4.8G Feb 28 14:03 PT8.53.04.tar

Decompression.

$ time ./pigz -d PT8.53.04.tar.gz 

real    0m18.068s
user    0m22.437s
sys     0m12.857s

$ time gzip -d PT8.53.04.tar.gz 

real    0m52.806s <== compare gzip's 52s decompression time with pigz's 18s
user    0m42.068s
sys     0m10.736s

$ ls -lh PT8.53.04.tar 
-rw-r--r--   1 psft     dba         4.8G Feb 28 14:03 PT8.53.04.tar

Of course, other tools such as Parallel BZIP2 (PBZIP2), a parallel implementation of the bzip2 tool, are worth a try too. The idea here is to highlight the fact that there are better tools out there to get the job done quickly compared to the existing/old tools that are bundled with the operating system distribution.


[3] Solaris 11+ : Upgrading SRU

Assuming the package repository is already set up to do network updates on a Solaris 11+ system, the following commands are helpful in upgrading an SRU.

  • List all available SRUs in the repository.

    # pkg list -af entire
  • Upgrade to the latest and greatest.

    # pkg update

    To find out what changes will be made to the system, try a dry run of the system update.

    # pkg update -nv
  • Upgrade to a specific SRU.

    # pkg update entire@<FMRI>

    Find the Fault Managed Resource Identifier (FMRI) string by running pkg list -af entire command.

Note that it is not so easy to downgrade an SRU to a lower version, as doing so may break the system. Should there be a need to downgrade or switch between different SRUs, relying on Boot Environments (BE) might be a good idea. Check the Creating and Administering Oracle Solaris 11 Boot Environments document for details.
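A rough sketch of the boot environment approach (the BE name is arbitrary): create a backup BE before the update, then activate it later should a rollback become necessary.

# beadm create pre-sru-update 		<== snapshot the current BE before "pkg update"
# beadm list
.. apply the SRU and test .. if a rollback is needed:
# beadm activate pre-sru-update
# init 6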


[4] Parallel NFS (pNFS)

Just a quick note — RFC 5661, Network File System (NFS) Version 4.1, introduced a new feature called "Parallel NFS" or pNFS, which allows NFS clients to access storage devices containing file data directly. When file data for a single NFS v4 server is stored on multiple and/or higher-throughput storage devices, using pNFS can result in a significant improvement in file access performance. However, Parallel NFS is an optional feature in NFS v4.1. Though a prototype was made available a few years ago when OpenSolaris was still alive, as of today Solaris has no support for pNFS. Stay tuned for any updates from the Oracle Solaris teams.

Here is an interesting write-up from one of our colleagues at Oracle|Sun (dated 2007) -- NFSv4.1's pNFS for Solaris.

(Credit to Rob Schneider and Tom Gould for initiating this topic)


[5] SPARC hardware : Check for and clear faults from ILOM

There are a couple of ways to check for faults using the ILOM command line interface.

By running:

  1. show faulty command from ILOM command prompt, or
  2. fmadm faulty command from within the ILOM faultmgmt shell

Once found, use the clear_fault_action property with the set command to clear the fault for a FRU.

The following example checks for the faulty FRUs from ILOM faultmgmt shell, then clears it out.

eg.,

-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y

faultmgmtsp> fmadm faulty

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2014-02-26/16:17:11 18c62051-c81d-c569-a4e6-e418db2f84b4 PCIEX-8000-SQ  Critical
        ...
        ...
Suspect 1 of 1
   Fault class  : fault.io.pciex.rc.generic-ue
   Certainty    : 100%
   Affects      : hc:///chassis=0/motherboard=0/cpuboard=1/chip=2/hostbridge=4
   Status       : faulted

   FRU
      Status            : faulty
      Location          : /SYS/PM1
      Manufacturer      : Oracle Corporation
      Name              : TLA,PM,T5-4,T5-8
        ...

Description : A fault has been diagnosed by the Host Operating System.

Response    : The service required LED on the chassis and on the affected
              FRU may be illuminated.

        ...

faultmgmtsp> exit

-> set /SYS/PM1 clear_fault_action=True
Are you sure you want to clear /SYS/PM1 (y/n)? y
Set 'clear_fault_action' to 'True'

Note that this procedure clears the fault from the SP but not from the host.

Tuesday Feb 25, 2014

AIX customers: Run for the Hills ..

.. or keep your cool and embrace Solaris.

When Oracle acquired Sun, IBM tried to capitalize on the situation just like every other competitor Sun had – doubts were raised about Oracle's ability to turn Sun's hardware business around, and Solaris customers were advised to flee SPARC. Fast forward four years .. Oracle appears to have successfully dispelled those doubts with a proven long-term commitment to the Solaris/SPARC business, consistent investment and delivery on established roadmaps. Besides, Oracle has been innovating in the server space with engineered systems that are pre-integrated to reduce the cost and complexity of IT infrastructures while increasing productivity and performance.

On the other hand, judging by the recent turn of events at IBM such as selling off critical server technologies, a decline in data center business, employee furloughs, layoffs etc., it appears that Big Blue has its own struggles to deal with. In any case, irrespective of what is happening at IBM, AIX customers who are contemplating a migration to a modern operating platform that is reliable, secure, cloud-ready and offers a rich set of features to virtualize, consolidate, diagnose, debug and, most importantly, scale and perform, have an attractive alternative — Oracle Solaris. Act before it is too late.

Unfortunately, migrating larger deployments from one platform to another is not as easy as migrating desktop users from one operating system to another. So, Oracle put together a set of documents to make the AIX to Solaris transition as smooth as possible for existing AIX customers. Access the AIX-to-Solaris migration pages at:

     http://www.oracle.com/aixtosolaris
     Modernizing IBM AIX/Power to Oracle Solaris/SPARC (Oracle Technology Network)

The above pages have pointers to white papers such as IBM AIX to Oracle Solaris Technology Mapping Guide (for system admins, power users), Simplify the Migration of Oracle Database and Oracle Applications from AIX to Oracle Solaris (for DBAs, application specific admins) and IBM AIX Technologies Compared to Oracle Solaris 11 along with hands-on labs, training, blogs and other useful resources. Check those out, and use the contact information available in those pages to speak or chat with relevant Oracle team(s) who can help get started with the migration process. Good luck.

Friday Jan 31, 2014

Solaris Tips : Automounted NFS, ZFS metaslabs, utility to manage F40 cards, powertop, ..

[1] Mounting NFS on Solaris 10 and later

With a relevant entry in /etc/vfstab, the general expectation is that Solaris automatically mounts NFS shares upon a system reboot. However, users may find that NFS shares are not being auto-mounted on some systems running the latest update of Solaris 10 or 11. One reason for this behavior could be the use of the Secure By Default network profile, which was introduced in Solaris 10 11/06. When this network profile is in use, numerous services including the NFS client service are disabled. For automounting of NFS shares, the NFS client service must be running.

The fix is to enable NFS client service along with its dependencies.

# svcs -a | grep nfs\/client
disabled       Jan_17   svc:/network/nfs/client:default

# svcadm  enable -r svc:/network/nfs/client

# svcs -a | grep nfs\/client
online         Jan_20   svc:/network/nfs/client:default

On a similar note, if you want all default services to be enabled as they were in previous Solaris releases, run the following command as a privileged user. Then use svcadm(1M) to disable unwanted services.

# netservices open

To switch back to the secure by default profile, run:

# netservices limited

[2] Utility to manage Sun Flash Accelerator F40 PCIe card(s) .. ddcli

The Sun Flash Accelerator F40 PCIe Card has two sets of firmware — NAND flash controller firmware, and SAS controller firmware (host PCIe to SAS controller). Both firmware sets are updated as a single F40 firmware package using the ddcli utility. This utility can be used to locate and display information about the cards in the system, format the cards, monitor their health, and extract SMART logs (to assist Oracle support in debugging and resolution) for a selected F40 card.

If the ddcli utility is not available on systems where F40 PCIe cards are installed, install patch "16005846: F40 (AURA 2) SW1.1 Release fw (08.05.01.00) and cli utility update" or a later version, if available. This patch can be downloaded from support.oracle.com.

Note that the ddcli utility can be used to service and monitor the health of Sun Flash Accelerator F80 PCIe cards too. Install patch "Patch 17860600: SW1.0 for Sun Flash Accelerator F80" to get access to the F80 card software package.

[3] Permission denied error when changing a password

An attempt to change the password for a local user 'XYZ' fails with Permission denied error.

# passwd XYZ
New Password: ********
Re-enter new Password: ********
Permission denied

# grep passwd /etc/nsswitch.conf
passwd: files ldap

Users have the flexibility to store and access password information in/from multiple repositories such as local files and a name service (NIS or LDAP). Per the passwd(1) man page, when a user has a password stored in one of the name services as well as in a local files entry, the passwd command tries to update both. It is possible to have different passwords in the name service and in the local files entry. Use passwd -r to update a specific password repository.

Hence the fix in this case is to use the -r option to ignore the nsswitch.conf sequence and update the password information in the local /etc files — /etc/passwd and /etc/shadow.

# passwd -r files XYZ
New Password: ********
Re-enter new Password: ********
passwd: password successfully changed for XYZ

[4] Microstate statistics for any process

ptime -m shows the full set of microstate accounting statistics for the lifetime of a given process. prstat -m also reports microstate process accounting information, but the displayed statistics are those accumulated over each sampling interval since the previous display.

# prstat -p 39235

   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP      
 39235 psft     3585M 3320M sleep    59    0   2:23:11 0.0% java/257

# prstat -mp 39235

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
 39235 psft     0.0 0.0 0.0 0.0 0.0  87  13 0.0   0   0   1   0 java/257


# ptime -mp 39235

real 428:31:25.902644700
user  2:06:32.283801209
sys     16:37.056999418
trap        2.250539737
tflt        0.000000000
dflt        2.018347218
kflt        0.000000000
lock 96013:52:37.184929717
slp  14349:50:02.286168683
lat      3:11.510473038
stop        0.002468763

In the above example, the java process with pid 39235 spent most of its time waiting to acquire locks in user space (ref: 'lock' field). It also spent a good amount of time simply sleeping, waiting for work (ref: 'slp' field). User CPU time is the next major one (ref: 'user' field). The process spent a little bit of time in system space (ref: 'sys' field) and waiting for CPU (ref: 'lat' field), and an almost negligible amount of time processing system traps (ref: 'trap' field) and servicing data page faults (ref: 'dflt' field).

[5] ZFS : metaslab utilization

ZFS divides the space on each device (virtual or physical) into a number of smaller, manageable regions called metaslabs. Each metaslab is associated with a space map that holds information about the free space in that region by keeping track of space allocations and deallocations.

The following sample outputs show that a virtual device, u01, made up of two physical disks, has 139 metaslabs. The number of segments and the free/available space in each metaslab are also shown in those outputs.

# zpool list u01
NAME   SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
u01   1.09T   133G  979G  11%  1.00x  ONLINE  -

# zpool status u01
  pool: u01
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        u01                        ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000CCA01D1DD4A4d0  ONLINE       0     0     0
            c0t5000CCA01D1DCE88d0  ONLINE       0     0     0

errors: No known data errors

# zdb -m u01

Metaslabs:
        vdev          0   ms_array         27
        metaslabs   139   offset                spacemap          free      
        ---------------   -------------------   ---------------   -------------
        metaslab      0   offset            0   spacemap     30   free    4.65M
        metaslab      1   offset    200000000   spacemap     32   free     698K
        metaslab      2   offset    400000000   spacemap     33   free    1.25M
        metaslab      3   offset    600000000   spacemap     35   free     588K
	..
	..
        metaslab     62   offset   7c00000000   spacemap      0   free       8G
        metaslab     63   offset   7e00000000   spacemap     45   free    8.00G
        metaslab     64   offset   8000000000   spacemap      0   free       8G
	...
	...
        metaslab    136   offset  11000000000   spacemap      0   free       8G
        metaslab    137   offset  11200000000   spacemap      0   free       8G
        metaslab    138   offset  11400000000   spacemap      0   free       8G

# zdb -mm u01   

Metaslabs:
        vdev          0   ms_array         27
        metaslabs   139   offset                spacemap          free      
        ---------------   -------------------   ---------------   -------------
        metaslab      0   offset            0   spacemap     30   free    4.65M
                          segments       1136   maxsize    103K   freepct    0%
        metaslab      1   offset    200000000   spacemap     32   free     698K
                          segments         64   maxsize    118K   freepct    0%
        metaslab      2   offset    400000000   spacemap     33   free    1.25M
                          segments        113   maxsize    104K   freepct    0%
        metaslab      3   offset    600000000   spacemap     35   free     588K
                          segments        109   maxsize   28.5K   freepct    0%
	...
	...

What is the purpose of this topic? Just to introduce the ZFS debugger, zdb (check the zdb(1M) man page), to power users who would like to dig a little deeper to find answers to tough questions such as whether a ZFS filesystem is fragmented.

Keywords: ZFS zdb metaslab "space map"

[6] Roles can not login directly error on Solaris 11 and later

The root account in Solaris 11 is a role. A role is just like any other user account, with the exception that roles cannot be used to log in directly. Here is an example that shows the failure when a direct login is attempted.

login: root
Password: ********
Roles can not login directly

In this example, connecting as a normal user and then using su to assume the root role would succeed. This additional step is to prevent malevolent users from getting away with no accountability. Check Bart's blog post SPOTD: The Guide Book to Solaris Role-Based Access Control for some relevant information.

If security is not a primary concern, and if connecting directly as root user is desirable, simply change the root role into a user.

# rolemod -K type=normal root

This change does not affect users who are currently in the root role — they retain the root role. Other users who have root access can su to root or log in to the system as the root user. To remove the root role assignment from other local users, set the role to an empty string using the usermod command, as shown in the following example.


/* assign root role to user 'giri' */
# usermod -R root giri

# roles giri
root

/* remove the role from user 'giri' */
# usermod -R "" giri
#

Keywords: RBAC, roles

[7] Large volume sizes (> 2 TB), and maximum size of UFS filesystem

As per the Solaris System Administration Guide, the maximum size of a UFS filesystem is ~16 TB.

To create a UFS file system greater than 2 TB, use an EFI disk label. The EFI label provides support for physical disks and virtual disk volumes that are greater than 2 TB in size. Refer to the disk management section in the Solaris System Administration Guide to find out the advantages and limitations of EFI labels.
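A rough sketch of the steps involved, assuming a sufficiently large disk at c2t1d0 (the device name and mount point are placeholders):

# format -e 			<== select the disk and label it with an EFI label when prompted
# newfs /dev/rdsk/c2t1d0s0
# mount /dev/dsk/c2t1d0s0 /bigfs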

Note that ZFS labels disks with an EFI label when creating a ZFS storage pool (zpool). And users in general need not be too concerned about the maximum size of a ZFS filesystem as it is several times larger than the maximum size supported by the UFS filesystem.

[8] powertop to observe the CPU power management

Although powertop was ported to Solaris and has been available as an add-on package from unofficial sources for the past few years, recent releases of Solaris bundle this tool with the core distribution. powertop can be used to monitor the effectiveness of CPU power management features on systems running Solaris. It also displays the clock frequency at which the CPU is operating, along with the top events that are causing the CPU to wake up and use more energy.

Be aware that when CPU power management is enabled with the elastic policy in effect (the default on Solaris 11 and later), the CPUs on the system are susceptible to throttling under certain conditions, either to conserve power or to reduce the amount of heat generated by the chip. In other words, based on the load on the system, the frequency of a microprocessor can be adjusted automatically on the fly. This is referred to as "CPU dynamic voltage and frequency scaling" (DVFS). Monitoring the output of powertop is one way to keep an eye on the frequency levels of the processor on a busy system in order to minimize any performance-related surprises. Set the power management policy to performance if letting the CPUs run at full speed all the time is desired. The performance policy effectively disables CPU power management.

Power management settings can be controlled from the Service Processor's (SP) Integrated Lights Out Manager (ILOM) command line interface or browser user interface.
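For instance, on SPARC T-series servers the policy can typically be checked and changed from the ILOM command line roughly as shown below (treat this as a sketch; the property and value names may vary across ILOM versions).

-> show /SP/powermgmt policy
-> set /SP/powermgmt policy=Performance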

The following sample is gathered from an idle SPARC T5-8 server where the CPU power management was disabled.

                                                    Solaris PowerTOP version 1.3

Idle Power States       Avg     Residency             	Frequency Levels
C0 (cpu running)                (0.1%)                	 500 Mhz        0.0%
C1                      4.7ms   (99.9%)               	 800 Mhz        0.0%
                                                      	 933 Mhz        0.0%
                                                      	1067 Mhz        0.0%
                                                      	1200 Mhz        0.0%
							  ..
							  ..
                                                	3200 Mhz        0.0%
                                                	3333 Mhz        0.0%
                                                	3467 Mhz        0.0%
                                                	3600 Mhz      100.0%

Wakeups-from-idle per second: 109818.7  interval: 5.0s
no power usage estimate available

Top causes for wakeups:
94.4% (103630.7)               sched :  <xcalls> unix`dtrace_sync_func
 3.1% (3352.8)              OPMNPing :  <xcalls> unix`setsoftint_tl1
 1.1% (1155.6)                 sched :  <xcalls> unix`setsoftint_tl1
 0.4% (401.2)               <kernel> :  genunix`pm_timer
 0.3% (317.0)                  sched :  <xcalls> 
 0.2% (251.8)               <kernel> :  genunix`lwp_timer_timeout
 0.2% (204.4)                  sched :  <xcalls> unix`null_xcall
 0.1% (100.2)               <kernel> :  genunix`clock
 0.1% ( 65.6)               <kernel> :  genunix`cv_wakeup
 0.0% ( 50.2)               <kernel> :  SDC`sysdc_update
 0.0% ( 46.8)            <interrupt> :  mcxnex#0 
 0.0% ( 39.6)                   opmn :  <xcalls> unix`setsoftint_tl1
 0.0% ( 36.6)                   opmn :  <xcalls> 
 0.0% ( 36.4)                   opmn :  <xcalls> unix`vtag_flushrange_group_tl1
 0.0% ( 21.6)            <interrupt> :  ixgbe#0
	...
	...

Suggestion: enable CPU power management using poweradm(1m)

Q - Quit R - Refresh (CPU PM is disabled)

Saturday Dec 21, 2013

Measuring Network Bandwidth Using iperf

iperf is a simple, open source tool to measure network bandwidth. It can test TCP or UDP throughput. Tools like iperf are useful for quickly checking the performance of a network by comparing the achieved bandwidth with the expectation. The example in this blog post is from a Solaris system, but the instructions and testing methodology are applicable to all supported platforms including Linux.

Download the source code from iperf's home page, and build the iperf binary. Those running Solaris 10 or later can download the pre-built binary (file size: 245K) from this location to give it a quick try (right click and "Save Link As .." or similar option).

Testing methodology:

iperf's network performance measurements are based on the client-server communication model - hence it requires establishing both a server and a client. The same iperf binary can be used to run the process in server and client modes.

  1. Start iperf in server mode
    iperf -s -i <interval>

    Option -s or --server starts the process in server mode. -i or --interval is the sampling interval in seconds.

  2. Start iperf in client mode, and test the network connection between client and the server with arbitrary data transfers.

    iperf -n <bytes> -i <interval> -c <ServerIP>
    

Option -c or --client starts the process in client mode. Option -n or --bytes specifies the amount of data to transmit in bytes, KB (use suffix K) or MB (use suffix M). -i or --interval is the sampling interval in seconds. The last option is the IP address or the hostname of the server to connect to. By default, the client connects to the server using TCP; -u or --udp switches to UDP.

  3. Check the network link speed on server and client, and compare the throughput achieved.

Check the man page out for the full list of options supported by iperf in client and server modes.

Here is a simple demonstration.

On server node:

iperfserv% dladm show-phys net0
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         1000   full      igb0

iperfserv% ifconfig net0 | grep inet
        inet 10.129.193.63 netmask ffffff00 broadcast 10.129.193.255

iperfserv% ./iperf -v
iperf version 3.0-BETA5 (28 March 2013)
SunOS iperfserv 5.11 11.1 sun4v sparc sun4v


iperfserv% ./iperf -s -i 1
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

On client node:

client% dladm show-phys net0
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         1000   full      igb0

client% ifconfig net0 | grep inet
        inet 10.129.193.151 netmask ffffff00 broadcast 10.129.193.255

client% ./iperf  -n 1024M  -i 1 -c 10.129.193.63
Connecting to host 10.129.193.63, port 5201
[  4] local 10.129.193.151 port 63507 connected to 10.129.193.63 port 5201
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.01   sec   105 MBytes   875 Mbits/sec
[  4]   1.01-2.02   sec   112 MBytes   934 Mbits/sec
[  4]   2.02-3.00   sec   110 MBytes   934 Mbits/sec
			[...]
[  4]   8.02-9.01   sec   110 MBytes   933 Mbits/sec
[  4]   9.01-9.27   sec  30.0 MBytes   934 Mbits/sec
[ ID] Interval           Transfer     Bandwidth
      Sent
[  4]   0.00-9.27   sec  1.00 GBytes   927 Mbits/sec
      Received
[  4]   0.00-9.27   sec  1.00 GBytes   927 Mbits/sec

iperf Done.

At the same time, somewhat similar messages are written to stdout on the server node.

iperfserv% ./iperf  -s -i 1
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 10.129.193.151, port 33457
[  5] local 10.129.193.63 port 5201 connected to 10.129.193.151 port 63507
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec   104 MBytes   874 Mbits/sec
[  5]   1.00-2.00   sec   111 MBytes   934 Mbits/sec
[  5]   2.00-3.00   sec   111 MBytes   934 Mbits/sec
			[...]
[ ID] Interval           Transfer     Bandwidth
      Sent
[  5]   0.00-9.28   sec  1.00 GBytes   927 Mbits/sec
      Received
[  5]   0.00-9.28   sec  1.00 GBytes   927 Mbits/sec
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

The link speed is specified in Mbps (megabits per second). In the above example, the network link is operating at 1000 Mbps, and the achieved bandwidth is 927 Mbps, which is 92.7% of the advertised bandwidth.

Notes:

  • It is not necessary to execute iperf in client and server modes as root or privileged user
  • In server mode, iperf uses port 5201 by default. It can be changed to something else using the -p or --port option (see the sketch after this list)
  • Restart the iperf server after each client test to get reliable, consistent results
  • Using iperf is just one of many ways to measure the network bandwidth. There are other tools such as uperf, ttcp, netperf, bwping, udpmon, tcpmon, .. just to name a few. Research and pick the one that best suits your requirement.
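For example, a hypothetical run against a non-default port using UDP at a 900 Mbit/s target rate might look like the following (the IP address is reused from the demonstration above; -u selects UDP and -b sets the target bandwidth):

iperfserv% ./iperf -s -p 5002 -i 1

client% ./iperf -c 10.129.193.63 -p 5002 -u -b 900M -n 1024M -i 1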

Monday Oct 14, 2013

[Script] Breakdown of Oracle SGA into Solaris Locality Groups

Goal: for a given process, find out how the SGA was allocated in different locality groups on a system running Solaris operating system.

Download the shell script, sga_in_lgrp.sh. The script accepts any Oracle database process id as input, and prints out the memory allocated in each locality group.

Usage: ./sga_in_lgrp.sh <pid>

eg.,

# prstat -p 12820

   PID USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP
 12820 oracle     32G   32G sleep    60  -20   0:00:16 0.0% oracle/2

# ./sga_in_lgrp.sh 12820

Number of Locality Groups (lgrp): 4
------------------------------------

lgroup 1 :   8.56 GB
lgroup 2 :   6.56 GB
lgroup 3 :   6.81 GB
lgroup 4 :  10.07 GB

Total allocated memory:  32.00 GB

For those who want to have a quick look at the source code, here it is.

# cat sga_in_lgrp.sh

#!/bin/bash

# check the argument count
if [ $# -lt 1 ]
then
        echo "usage: ./sga_in_lgrp.sh <oracle pid>"
        exit 1
fi

# find the number of locality groups
lgrp_count=$(kstat -l lgrp | tail -1 | awk -F':' '{ print $2 }')
echo "\nNumber of Locality Groups (lgrp): $lgrp_count"
echo "------------------------------------\n"

# save the ism output using pmap
pmap -sL $1 | grep ism | sort -k5 > /tmp/tmp_pmap_$1

# calculate the total amount of memory allocated in each lgroup
for i in `seq 1 $lgrp_count`
do
        echo -n "lgroup $i : "
        grep "$i   \[" /tmp/tmp_pmap_$1 | awk '{ print $2 }' | sed 's/K//g' | 
               awk '{ sum+=$1} END {printf ("%6.2f GB\n", sum/(1024*1024))}'
done

echo
echo -n "Total allocated memory: "
awk '{ print $2 }' /tmp/tmp_pmap_$1 | sed 's/K//g' | awk '{ sum+=$1} END 
         {printf ("%6.2f GB\n\n", sum/(1024*1024))}'

rm /tmp/tmp_pmap_$1

Like many things in life, there will always be a better or simpler way to achieve this. If you find one, do not fret over this approach. Please share, if possible.

Monday Sep 30, 2013

Miscellaneous Tips: Solaris, Oracle Database, Java, FMW

[Solaris] Cleanup all IPC resources

Run the following wrapper script with root user privileges.

for i in `ipcs -a | awk '{ print $2 }'`
do
	ipcrm -m $i 2> /dev/null
	ipcrm -q $i 2> /dev/null
	ipcrm -s $i 2> /dev/null
done

[Java, WebLogic] Find the process id (pid) of a WebLogic managed server instance

Run the following as the user who owns the process, or with root user privileges.

/usr/java/bin/jps -v | grep <WLS_server_name> | awk '{ print $1 }'

I think this tip is applicable on all supported platforms.

eg.,
Finding the pid of a managed server, bi_server1.

# /usr/java/bin/jps -v | grep bi_server1 | awk '{ print $1 }'
18659

# pargs 18659 | grep weblogic.Name
argv[7]: -Dweblogic.Name=bi_server1

[Oracle Database] Make Oracle ignore hints

Set the following hidden parameter.

_optimizer_ignore_hints=TRUE

(in general, Oracle does not recommend playing with hidden parameters. Check with Oracle support when in doubt).


[Oracle Database] Data Pump Export in a RAC environment fails with ORA-31693, ORA-31617, ORA-19505, ORA-27037 errors

eg.,

ORA-31693: Table data object "<SCHEMA>"."<TABLE>":"P_1147" failed to load/unload \
        and is being skipped due to error:
ORA-31617: unable to open dump file "<FILE>" for write
ORA-19505: failed to identify file "<FILE>"
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3

Workaround:

Add CLUSTER=N to the list of existing expdp options.
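A hypothetical export command line with the workaround applied (placeholders follow the same convention used elsewhere in this post):

$ expdp <userid>/<password> DIRECTORY=<directory> DUMPFILE=<dump_file>.dmp SCHEMAS=<schema> CLUSTER=N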


[Solaris, ZFS] Check the current ARC size and its breakdown

kstat -m zfs | grep size 		(any user)  - OR -
echo ::arc | mdb -k | grep size 	(root user)

echo ::memstat | mdb -k		(root user)


eg.,

# echo ::arc | mdb -k | grep size
size                      =       259391 MB
buf_size                  =         3218 MB
data_size                 =       249309 MB
other_size                =         6863 MB
l2_hdr_size               =            0 MB

# kstat -m zfs | grep size
        buf_size                        3375105344
        data_size                       261419672320
        l2_hdr_size                     0
        other_size                      7197048560
        size                            271991826224

# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                   14646326            114424    4%
ZFS File Data            31948806            249600   10%
Anon                     24660113            192657    7%
Exec and libs                8912                69    0%
Page cache                 126052               984    0%
Free (cachelist)            24517               191    0%
Free (freelist)         263965754           2062232   79%
Total                   335380480           2620160

[Fusion Middleware] Disable Fusion Middleware Diagnostic Framework (DFW) Dump Sampling

The Diagnostic Framework in FMW 11g environments detects, diagnoses and resolves critical errors such as uncaught exceptions, deadlocked threads and out-of-memory errors. It is enabled by default.

Though DFW is supposed to diagnose and fix some of the issues transparently, due to the inevitable bugs in [all kinds of] software and misconfigurations, sometimes DFW itself may become a major issue. For instance, there is a bug that reported very high system CPU time on a SPARC server where FMW 11g was running. Per the bug description, the system CPU utilization spikes every minute, exactly at the 00s mark; the CPU utilization goes down within a few seconds, but the pattern persists and the spiky behavior returns within a minute. Another symptom was a sudden drop in available swap space from tens of gigabytes to a few megabytes whenever the CPU spike occurred.

Upon close examination, it was found that DFW in FMW was forking tens of jstack processes to collect thread dumps from an equal number of java processes running in that FMW environment, causing the sudden spike in CPU (each process is busy gathering thread dumps at the same time) and a steep drop in swap space (each jstack process forked a jmap process, and both jstack and jmap processes consume some virtual memory just like any other process). All this happened because DFW thought it found a critical issue, and that issue was not noticed or addressed by anyone including the administrators (DFW couldn't fix this particular issue on its own) - so it kept gathering the diagnostic data continuously. In this example, DFW did the right thing, but the diagnostic data collection interval was too short - only one minute - which diminished the value of DFW and made it a liability.

In such dire situations, it is probably best to disable the dump sampling feature of the Diagnostic Framework temporarily while the underlying original issue is being fixed in that application environment. Dump sampling can be enabled again once the critical issue is fixed and no longer a concern.

Steps to disable Fusion Middleware Diagnostic Framework (DFW) Dump Sampling: (courtesy: Shashidhara Varamballi)

Method (1) Using WLST:

  1. run wlst.sh
  2. connect to the AdminServer
  3. execute command: enableDumpSampling (enable=0, server='<server_name>')

Method (2) Manual editing of config file:

  1. Edit $DOMAIN_HOME/config/fmwconfig/servers/<server_name>/dfw_config.xml
  2. Change the "enabled" attribute from "true" to "false".
    eg.,
    <dumpSampling enabled="false">
  3. Change the "useExternalCommands" attribute from "true" to "false".
    eg.,
    <threadDump useExternalCommands="false"/>
  4. Save the changes
--

SEE ALSO:

Fusion Middleware Diagnostics weblog


[Solaris 11] Virtual-to-physical link (NIC) mapping

Check the output of /sbin/dladm show-phys (any user). By default, only those physical links that are available on the running system are displayed. Option -P shows the physical device and attributes of all physical links.

eg.,

$ /sbin/dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net0              Ethernet             up         1000   full      ixgbe0
net5              Infiniband           down       0      unknown   ibp2
net1              Ethernet             up         1000   full      ixgbe1
net6              Infiniband           down       0      unknown   ibp3
net4              Ethernet             up         10     full      usbecm2

$ /sbin/dladm show-phys -P
LINK              DEVICE       MEDIA                FLAGS
net8              ibp1         Infiniband           r----
net0              ixgbe0       Ethernet             -----
net7              ibp0         Infiniband           r----
net3              ixgbe3       Ethernet             r----
net5              ibp2         Infiniband           -----
net1              ixgbe1       Ethernet             -----
net6              ibp3         Infiniband           -----
net4              usbecm2      Ethernet             -----
net2              vsw0         Ethernet             r----

Saturday Aug 31, 2013

[Oracle Database] Unreliable AWR reports on T5 & Redo logs on F40 PCIe Cards

(1) AWR report shows bogus wait events and times on SPARC T5 servers

Here is a sample from one of the Oracle 11g R2 databases running on a SPARC T5 server with Solaris 11.1 SRU 7.5

Top 5 Timed Foreground Events

Event                        Waits    Time(s)      Avg wait (ms)  % DB time    Wait Class
latch: cache buffers chains  278,727  812,447,335  2914850        13307324.15  Concurrency
library cache: mutex X       212,595  449,966,330  2116542        7370136.56   Concurrency
buffer busy waits            219,844  349,975,251  1591925        5732352.01   Concurrency
latch: In memory undo latch  25,468   37,496,800   1472310        614171.59    Concurrency
latch free                   2,602    24,998,583   9607449        409459.46    Other

Reason:
Unknown. There is a pending bug 17214885 - Implausible top foreground wait times reported in AWR report.

Tentative workaround:
Disable power management as shown below.

# poweradm set administrative-authority=none

# svcadm disable power
# svcadm enable power

Verify the setting by running poweradm list.

Also disable NUMA I/O object binding by setting the following parameter in /etc/system (requires a system reboot).

set numaio_bind_objects=0

Oracle Solaris 11 added support for the NUMA I/O architecture. Here is a brief explanation of NUMA I/O from the Solaris 11: What's New web page.

Non-Uniform Memory Access (NUMA) I/O : Many modern systems are based on a NUMA architecture, where each CPU or set of CPUs is associated with its own physical memory and I/O devices. For best I/O performance, the processing associated with a device should be performed close to that device, and the memory used by that device for DMA (Direct Memory Access) and PIO (Programmed I/O) should be allocated close to that device as well. Oracle Solaris 11 adds support for this architecture by placing operating system resources (kernel threads, interrupts, and memory) on physical resources according to criteria such as the physical topology of the machine, specific high-level affinity requirements of I/O frameworks, actual load on the machine, and currently defined resource control and power management policies.

Do not forget to rollback these changes after applying the fix for the database bug 17214885, when available.

(2) Redo logs on F40 PCIe cards (non-volatile flash storage)

Per the F40 PCIe card user's guide, the Sun Flash Accelerator F40 PCIe Card is designed to provide the best performance for data transfers that are multiples of 8K in size and that use addresses that are 8K aligned. To achieve optimal performance, the size of the read/write data should be an integer multiple of this block size and the data transferred should be block aligned. I/O operations that are not block aligned and that do not use sizes that are a multiple of the block size may suffer performance degradation, especially for write operations.

Oracle redo log files default to a block size that is equal to the physical sector size of the disk, typically 512 bytes. And in a normally functioning environment, the database mostly writes to the redo log. The Oracle database supports a maximum block size of 4K for redo logs. Hence, to achieve optimal performance for redo write operations on F40 PCIe cards, tune the environment as shown below.

  1. Configure the following init parameters
    _disk_sector_size_override=TRUE
    _simulate_disk_sectorsize=4096
    
  2. Create redo log files with 4K block size
    eg.,
    SQL> ALTER DATABASE ADD LOGFILE '/REDO/redo.dbf' size 20G blocksize 4096;
    
  3. [Solaris only] Append the following line to /kernel/drv/sd.conf (requires a reboot)
    sd-config-list="ATA     3E128-TS2-550B01","disksort:false,\
                 cache-nonvolatile:true, physical-block-size:4096";
    
  4. [Solaris only][F20] To enable maximum throughput from the MPT driver, append the following line to /kernel/drv/mpt.conf and reboot the system.
    mpt_doneq_thread_n_prop=8;
    

This tip is applicable to all kinds of flash storage that Oracle sells or sold including F20/F40 PCIe cards and F5100 storage array. sd-config-list in sd.conf may need some adjustment to reflect the correct vendor id and product id.

Tuesday Jul 30, 2013

Oracle Tips : Solaris lgroups, CT optimization, Data Pump, Recompilation of Objects, ..

1. [Re]compiling all objects in a schema
exec DBMS_UTILITY.compile_schema(schema => 'SCHEMA');

To recompile only the invalid objects in parallel:

exec UTL_RECOMP.recomp_parallel(<NUM_PARALLEL_THREADS>, 'SCHEMA');

A NULL value for SCHEMA recompiles all invalid objects in the database.


2. SGA breakdown in Solaris Locality Groups (lgroup)

To find the breakdown, execute pmap -L <pid> | grep shm. Then separate the lines that are related to each locality group and sum up the values in the 2nd column to arrive at the total SGA memory allocated in that locality group.

(I'm pretty sure there will be a much easier way that I am not currently aware of.)


3. Default values for shared pool, java pool, large pool, ..

If the *pool parameters were not set explicitly, executing the following query is one way to find out what they are currently set to.

eg.,
SQL> select * from v$sgainfo;

NAME                                  BYTES RES
-------------------------------- ---------- ---
Fixed SGA Size                      2171296 No
Redo Buffers                      373620736 No
Buffer Cache Size                8.2410E+10 Yes
Shared Pool Size                 1.7180E+10 Yes
Large Pool Size                   536870912 Yes
Java Pool Size                   1879048192 Yes
Streams Pool Size                 268435456 Yes
Shared IO Pool Size                       0 Yes
Granule Size                      268435456 No
Maximum SGA Size                 1.0265E+11 No
Startup overhead in Shared Pool  2717729536 No
Free SGA Memory Available                 0
12 rows selected.

4. Fix to PLS-00201: identifier 'GV$SESSION' must be declared error

Grant select privilege on gv_$SESSION to the owner of the database object that failed to compile.

eg.,
SQL> alter package OWF_MGR.FND_SVC_COMPONENT compile body;
Warning: Package Body altered with compilation errors.

SQL> show errors
Errors for PACKAGE BODY OWF_MGR.FND_SVC_COMPONENT:

LINE/COL ERROR
-------- -----------------------------------------------------------------
390/22   PL/SQL: Item ignored
390/22   PLS-00201: identifier 'GV$SESSION' must be declared

SQL> grant select on gv_$SESSION to OWF_MGR;
Grant succeeded.

SQL> alter package OWF_MGR.FND_SVC_COMPONENT compile body;
Package body altered.

5. Solaris Critical Thread (CT) optimization for Oracle logwriter (lgrw)

Critical Thread is a scheduler optimization available in Oracle Solaris 10 Update 10 and later releases. Latency-sensitive, single-threaded components of software, such as the Oracle database's log writer, benefit from CT optimization.

On a high level, LWPs marked as critical will be granted more exclusive access to the hardware. For example, on SPARC T4 and T5 systems, such a thread will be assigned exclusive access to a core as much as possible. CT optimization won't delay scheduling of any runnable thread in the system.

Critical Thread optimization is enabled by default. However, the users of the system have to hint the OS by explicitly marking a thread or two as "critical", as shown below.

priocntl -s -c FX -m 60 -p 60 -i pid <pid_of_critical_single_threaded_process>

From the database point of view, the log writer (lgwr) is one such process that can benefit from CT optimization on the Solaris platform. Oracle DBAs can either mark the lgwr process 'critical' once the database is up and running, or simply patch the 11.2.0.3 database software by installing RDBMS patch 12951619 to let the database take care of it automatically. I believe Oracle 12c does it by default. Future releases of 11g software may make lgwr critical out of the box.
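For instance, marking a running log writer critical by hand might look something like this (ORCL is a placeholder SID; run as a privileged user):

# priocntl -s -c FX -m 60 -p 60 -i pid `pgrep -f ora_lgwr_ORCL`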

Those who install the database patch 12951619 need to carefully follow the post installation steps documented in the patch README to avoid running into unwanted surprises.


6. ORA-14519 error while importing a table from a Data Pump export dump
ORA-14519: Conflicting tablespace blocksizes for table : Tablespace XXX block \
size 32768 [partition specification] conflicts with previously specified/implied \
tablespace YYY block size 8192
 [object-level default]
Failing sql is:
CREATE TABLE XYZ
..

All partitions in table XYZ use 32K blocks whereas the implicit default partition points to an 8K block tablespace. The workaround is to use the REMAP_TABLESPACE option on the Data Pump impdp command line to remap the implicit default tablespace of the partitioned table to the tablespace where the rest of the partitions reside.
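A hypothetical import command line with the remap applied (credentials, directory and dump file names are placeholders; YYY and XXX are the tablespace names from the error text above):

$ impdp <userid>/<password> DIRECTORY=<directory> DUMPFILE=<dump_file>.dmp REMAP_TABLESPACE=YYY:XXX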


7. Index building task in Data Pump import process

When the Data Pump import process is running, by default, index building is performed with just one thread, which becomes a bottleneck and causes the data import to take a long time, especially if many large tables with millions of rows are being imported into the target database. One way to speed up the import is to skip index building as part of the data import task with the help of the EXCLUDE=INDEX impdp command line option, and then extract the index definitions for all the skipped indexes from the Data Pump dump file as shown below.

impdp <userid>/<password> directory=<directory> dumpfile=<dump_file>.dmp \
    sqlfile=<index_def_file>.sql INCLUDE=INDEX

Edit <index_def_file>.sql to set the desired number of parallel threads for building each index. Finally, execute <index_def_file>.sql to build the indexes once the data import task is complete.
