Saturday Aug 31, 2013

[Oracle Database] Unreliable AWR reports on T5 & Redo logs on F40 PCIe Cards

(1) AWR report shows bogus wait events and times on SPARC T5 servers

Here is a sample from one of the Oracle 11g R2 databases running on a SPARC T5 server with Solaris 11.1 SRU 7.5

Top 5 Timed Foreground Events

Event                        Waits    Time(s)      Avg wait (ms)  % DB time    Wait Class
latch: cache buffers chains  278,727  812,447,335  2914850        13307324.15  Concurrency
library cache: mutex X       212,595  449,966,330  2116542        7370136.56   Concurrency
buffer busy waits            219,844  349,975,251  1591925        5732352.01   Concurrency
latch: In memory undo latch  25,468   37,496,800   1472310        614171.59    Concurrency
latch free                   2,602    24,998,583   9607449        409459.46    Other

Reason:
Unknown. There is a pending bug 17214885 - Implausible top foreground wait times reported in AWR report.
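
As a rough sanity check on the sample above (a sketch, not part of any AWR tooling), dividing the reported wait time by the number of waits reproduces the reported average wait, and converting the total to years shows how implausible the figure is. The helper name awr_check is made up for this illustration.

```shell
# awr_check is a hypothetical helper: $1 = waits, $2 = total wait time in seconds.
# Commas in the AWR figures are stripped before the arithmetic.
awr_check() {
    waits=$(printf '%s' "$1" | tr -d ',')
    secs=$(printf '%s' "$2" | tr -d ',')
    awk -v w="$waits" -v s="$secs" 'BEGIN {
        # average wait in milliseconds, and total wait expressed in 365-day years
        printf "avg_wait_ms=%.0f years=%.1f\n", s / w * 1000, s / (365 * 86400)
    }'
}

# First row of the sample: latch: cache buffers chains
awr_check 278,727 812,447,335    # prints avg_wait_ms=2914850 years=25.8
```

The computed average matches the "Avg wait (ms)" column, so the report is internally consistent; it is the underlying wait time, nearly 26 years for a single event, that cannot be right.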

Tentative workaround:
Disable power management as shown below.

# poweradm set administrative-authority=none

# svcadm disable power
# svcadm enable power

Verify the setting by running poweradm list.

Also disable NUMA I/O object binding by setting the following parameter in /etc/system (requires a system reboot).

set numaio_bind_objects=0

Oracle Solaris 11 added support for the NUMA I/O architecture. Here is a brief explanation of NUMA I/O from the "Solaris 11: What's New" web page.

Non-Uniform Memory Access (NUMA) I/O : Many modern systems are based on a NUMA architecture, where each CPU or set of CPUs is associated with its own physical memory and I/O devices. For best I/O performance, the processing associated with a device should be performed close to that device, and the memory used by that device for DMA (Direct Memory Access) and PIO (Programmed I/O) should be allocated close to that device as well. Oracle Solaris 11 adds support for this architecture by placing operating system resources (kernel threads, interrupts, and memory) on physical resources according to criteria such as the physical topology of the machine, specific high-level affinity requirements of I/O frameworks, actual load on the machine, and currently defined resource control and power management policies.

Do not forget to roll back these changes after applying the fix for database bug 17214885, when available.

(2) Redo logs on F40 PCIe cards (non-volatile flash storage)

Per the F40 PCIe card user's guide, the Sun Flash Accelerator F40 PCIe Card is designed to provide the best performance for data transfers that are multiples of 8k in size and that use 8k-aligned addresses. To achieve optimal performance, the size of the read/write data should be an integer multiple of this block size, and the data transferred should be block aligned. I/O operations that are not block aligned, or whose sizes are not a multiple of the block size, may suffer performance degradation, especially for write operations.
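
To make the alignment rule concrete, here is a minimal sketch that checks whether a request's starting offset and size are both multiples of 8k. The helper name is_aligned and the sample offsets are made up for this illustration; nothing here talks to the card itself.

```shell
# is_aligned is a hypothetical helper, not an Oracle or Solaris tool.
# $1 = starting byte offset, $2 = transfer size in bytes.
is_aligned() {
    block=8192                      # 8k block size quoted from the F40 guide
    offset=$1
    size=$2
    [ $(( offset % block )) -eq 0 ] && [ $(( size % block )) -eq 0 ]
}

is_aligned 16384 8192 && echo "aligned"        # prints "aligned"
is_aligned 512 512 || echo "not aligned"       # prints "not aligned"
```

The second call mimics a 512-byte write at a 512-byte offset, which is the kind of small, unaligned request the guide warns about.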

Oracle redo log files default to a block size that is equal to the physical sector size of the disk, typically 512 bytes. In a normally functioning environment, the database writes to the redo log far more often than it reads from it. Oracle database supports a maximum block size of 4K for redo logs. Hence, to achieve optimal performance for redo write operations on F40 PCIe cards, tune the environment as shown below.

  1. Configure the following init parameters
    _disk_sector_size_override=TRUE
    _simulate_disk_sectorsize=4096
    
  2. Create redo log files with 4K block size
    eg.,
    SQL> ALTER DATABASE ADD LOGFILE '/REDO/redo.dbf' size 20G blocksize 4096;
    
  3. [Solaris only] Append the following line to /kernel/drv/sd.conf (requires a reboot)
    sd-config-list="ATA     3E128-TS2-550B01","disksort:false,\
                 cache-nonvolatile:true, physical-block-size:4096";
    
  4. [Solaris only][F20] To enable maximum throughput from the MPT driver, append the following line to /kernel/drv/mpt.conf and reboot the system.
    mpt_doneq_thread_n_prop=8;
    

This tip applies to all kinds of flash storage that Oracle sells or has sold, including F20/F40 PCIe cards and the F5100 storage array. The sd-config-list entry in sd.conf may need some adjustment to reflect the correct vendor ID and product ID.

Tuesday Jul 30, 2013

Oracle Tips : Solaris lgroups, CT optimization, Data Pump, Recompilation of Objects, ..

1. [Re]compiling all objects in a schema
exec DBMS_UTILITY.compile_schema(schema => 'SCHEMA');

To recompile only the invalid objects in parallel:

exec UTL_RECOMP.recomp_parallel(<NUM_PARALLEL_THREADS>, 'SCHEMA');

A NULL value for SCHEMA recompiles all invalid objects in the database.


2. SGA breakdown in Solaris Locality Groups (lgroup)

To find the breakdown, execute pmap -L <pid> | grep shm against one of the database processes. Then group the lines by locality group and sum the values in the 2nd column to arrive at the total SGA memory allocated in each locality group.

(I'm pretty sure there will be a much easier way that I am not currently aware of.)
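
The grouping and summing described above can be sketched with awk. This is only an illustration under assumptions: the helper name sum_by_lgroup is made up, the sample lines are hypothetical, and the lgroup id is assumed to be in the 4th column (pass a different column number if your pmap output differs).

```shell
# sum_by_lgroup is a hypothetical helper. It reads pmap -L style lines on stdin,
# keeps the shared memory lines, and totals column 2 (size, e.g. 4194304K)
# per lgroup. $1 = column holding the lgroup id (assumed to be 4).
sum_by_lgroup() {
    awk -v col="${1:-4}" '
        /shm/ { kb[$col] += $2 + 0 }     # "+ 0" drops the trailing K
        END   { for (g in kb) printf "lgroup %s: %d KB\n", g, kb[g] }
    ' | sort
}

# Hypothetical sample lines, for illustration only
sum_by_lgroup <<'EOF'
0000000380000000   4194304K rwxsR    1   [ ism shmid=0x3 ]
0000000480000000   2097152K rwxsR    2   [ ism shmid=0x3 ]
0000000580000000   4194304K rwxsR    1   [ ism shmid=0x3 ]
EOF
# prints:
# lgroup 1: 8388608 KB
# lgroup 2: 2097152 KB
```

On a live system the same function would be fed from the real command, for example pmap -L <pid> | sum_by_lgroup.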


3. Default values for shared pool, java pool, large pool, ..

If the *pool parameters were not set explicitly, executing the following query is one way to find out what they are currently set to.

eg.,
SQL> select * from v$sgainfo;

NAME                                  BYTES RES
-------------------------------- ---------- ---
Fixed SGA Size                      2171296 No
Redo Buffers                      373620736 No
Buffer Cache Size                8.2410E+10 Yes
Shared Pool Size                 1.7180E+10 Yes
Large Pool Size                   536870912 Yes
Java Pool Size                   1879048192 Yes
Streams Pool Size                 268435456 Yes
Shared IO Pool Size                       0 Yes
Granule Size                      268435456 No
Maximum SGA Size                 1.0265E+11 No
Startup overhead in Shared Pool  2717729536 No
Free SGA Memory Available                 0
12 rows selected.

4. Fix to PLS-00201: identifier 'GV$SESSION' must be declared error

Grant select privilege on gv_$SESSION to the owner of the database object that failed to compile.

eg.,
SQL> alter package OWF_MGR.FND_SVC_COMPONENT compile body;
Warning: Package Body altered with compilation errors.

SQL> show errors
Errors for PACKAGE BODY OWF_MGR.FND_SVC_COMPONENT:

LINE/COL ERROR
-------- -----------------------------------------------------------------
390/22   PL/SQL: Item ignored
390/22   PLS-00201: identifier 'GV$SESSION' must be declared

SQL> grant select on gv_$SESSION to OWF_MGR;
Grant succeeded.

SQL> alter package OWF_MGR.FND_SVC_COMPONENT compile body;
Package body altered.

5. Solaris Critical Thread (CT) optimization for Oracle logwriter (lgwr)

Critical Thread is a new scheduler optimization available in Oracle Solaris 10 Update 10 and later releases. Latency-sensitive, single-threaded software components such as Oracle database's logwriter benefit from CT optimization.

On a high level, LWPs marked as critical will be granted more exclusive access to the hardware. For example, on SPARC T4 and T5 systems, such a thread will be assigned exclusive access to a core as much as possible. CT optimization won't delay scheduling of any runnable thread in the system.

Critical Thread optimization is enabled by default. However, users of the system have to hint the OS by explicitly marking a thread or two as "critical", as shown below.

priocntl -s -c FX -m 60 -p 60 -i pid <pid_of_critical_single_threaded_process>

From the database point of view, logwriter (lgwr) is one such process that can benefit from CT optimization on the Solaris platform. Oracle DBAs can either make the lgwr process 'critical' once the database is up and running, or simply patch the 11.2.0.3 database software by installing RDBMS patch 12951619 to let the database take care of it automatically. I believe Oracle 12c does it by default. Future releases of 11g software may make lgwr critical out of the box.

Those who install the database patch 12951619 need to carefully follow the post installation steps documented in the patch README to avoid running into unwanted surprises.
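
For those marking lgwr critical by hand rather than through the patch, the following sketch finds logwriter processes in a process listing and prints the matching priocntl command for each one instead of running it, so the commands can be reviewed first. The helper name lgwr_priocntl_cmds is made up, and the sketch relies on the background process being named ora_lgwr_<SID>.

```shell
# lgwr_priocntl_cmds is a hypothetical helper. It reads "pid args" lines on
# stdin and prints one priocntl command per logwriter process (dry run only).
lgwr_priocntl_cmds() {
    awk '$2 ~ /^ora_lgwr_/ {
        print "priocntl -s -c FX -m 60 -p 60 -i pid " $1
    }'
}

# Hypothetical process listing, for illustration only
printf '%s\n' '  PID COMMAND' ' 4321 ora_lgwr_ORCL' ' 4400 ora_dbw0_ORCL' |
    lgwr_priocntl_cmds
# prints: priocntl -s -c FX -m 60 -p 60 -i pid 4321
```

On a live system the input would come from ps -eo pid,args, and the printed commands would then be run with appropriate privileges.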


6. ORA-14519 error while importing a table from a Data Pump export dump
ORA-14519: Conflicting tablespace blocksizes for table : Tablespace XXX block \
size 32768 [partition specification] conflicts with previously specified/implied \
tablespace YYY block size 8192
 [object-level default]
Failing sql is:
CREATE TABLE XYZ
..

All partitions in table XYZ are using 32K blocks, whereas the implicit default partition points to an 8K block tablespace. The workaround is to use the REMAP_TABLESPACE option on the Data Pump impdp command line to remap the implicit default tablespace of the partitioned table to the tablespace where the rest of the partitions reside.


7. Index building task in Data Pump import process

While a Data Pump import is running, index building is performed with just one thread by default. This becomes a bottleneck and makes the import take a long time, especially when many large tables with millions of rows are being imported into the target database. One way to speed up the import is to skip index building during the data import with the EXCLUDE=INDEX impdp command line option. Then extract the index definitions for all the skipped indexes from the Data Pump dump file as shown below.

impdp <userid>/<password> directory=<directory> dumpfile=<dump_file>.dmp \
    sqlfile=<index_def_file>.sql INCLUDE=INDEX

Edit <index_def_file>.sql to set the desired number of parallel threads for building each index. Finally, execute <index_def_file>.sql to build the indexes once the data import task is complete.
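
The editing step can be scripted. The sketch below assumes the generated DDL carries explicit "PARALLEL n" clauses; statements without such a clause are left untouched and would still need a manual edit. The helper name set_index_parallel and the sample statement are made up for this illustration.

```shell
# set_index_parallel is a hypothetical helper: $1 = desired degree of
# parallelism. It reads SQL on stdin and rewrites every "PARALLEL n" clause.
set_index_parallel() {
    sed "s/PARALLEL [0-9][0-9]*/PARALLEL $1/"
}

# Hypothetical CREATE INDEX statement, for illustration only
printf '%s\n' 'CREATE INDEX "I1" ON "T" ("C") PARALLEL 1 ;' | set_index_parallel 8
# prints: CREATE INDEX "I1" ON "T" ("C") PARALLEL 8 ;
```

Against a real file this would be used as set_index_parallel 8 < <index_def_file>.sql > <new_file>.sql. Keep in mind that an index built this way keeps its parallel degree afterwards, so consider resetting it with ALTER INDEX ... NOPARALLEL once the build is done.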

Sunday Jun 30, 2013

Solaris Tips : Assembler, Format, File Descriptors, Ciphers & Mount Points

1. Most Oracle software installers need assembler

Assembler (as) is not installed by default on Solaris 11. Find and install it as shown below.

eg.,
# pkg search assembler
INDEX       ACTION VALUE                           PACKAGE        
pkg.fmri    set    solaris/developer/assembler     pkg:/developer/assembler@0.5.11-0.175.1.5.0.3.0

# pkg install pkg:/developer/assembler

The assembler binary used to be under the /usr/ccs/bin directory on Solaris 10 and prior versions. There is no /usr/ccs/bin on Solaris 11; its contents were moved to /usr/bin.



2. Non-interactive retrieval of the entire list of disks that format reports

If the format utility cannot show the entire list of disks on a single screen, it shows the first few and prompts the user to hit space for more, or s to select, before showing the next screenful. Run either of the following commands to retrieve the entire list of disks in a single shot.

format < /dev/null

	-or-

echo "\n" | format



3. Finding system wide file descriptors/handles in use

Run the following kstat command as any user (privileged or non-privileged).

kstat -n file_cache -s buf_inuse

Going through /proc (process filesystem) is less efficient and may lead to inaccurate results due to the inclusion of duplicate file handles.



4. ssh connection to a Solaris 11 host fails with error Couldn't agree a client-to-server cipher (available: aes128-ctr,aes192-ctr,aes256-ctr,arcfour128,arcfour256,arcfour)

Solution: add 3des-cbc to the list of accepted ciphers in the sshd configuration file.

Steps:

  1. Append the following line to /etc/ssh/sshd_config
    Ciphers aes128-ctr,aes192-ctr,aes256-ctr,arcfour128,arcfour256,arcfour,3des-cbc
  2. Restart ssh daemon
    svcadm -v restart ssh



5. UFS: Finding the last mount point for a device

fsck utility reports the last mountpoint on which the filesystem was mounted (it won't show the mount options though). The filesystem should be unmounted when running fsck.

eg.,
# fsck -n /dev/dsk/c0t5000CCA0162F7BC0d0s6
** /dev/rdsk/c0t5000CCA0162F7BC0d0s6 (NO WRITE)
** Last Mounted on /export/oracle
** Phase 1 - Check Blocks and Sizes
...
...

Tuesday Mar 05, 2013

SuperCluster Best Practices : Deploying Oracle 11g Database in Zones

To be clear, this post is about a white paper that's been out there for more than two months. Access it through the following url.

  Best Practices for Deploying Oracle Solaris Zones with Oracle Database 11g on SPARC SuperCluster

The focus of the paper is on databases and zones. On SuperCluster, customers have the choice of running their databases in logical domains that are dedicated to running Oracle Database 11g R2. With exclusive access to Exadata Storage Servers, those domains are aptly called "Database" domains. If the requirement mandates, it is possible to create and use all logical domains as "database domains" or "application domains" or a mix of those. Since the focus is on databases, the paper talks only about the database domains and how zones can be created, configured and used within each database domain for fine grained control over multiple databases consolidated in a SuperCluster environment.

When multiple databases (including RAC databases) are consolidated in database logical domains, zones are one option for meeting requirements such as fault, operational, network, security and resource isolation; multiple RAC instances in a single logical domain; and a separate identity and independent manageability for each database instance.

The best practices cover the following topics. Some of those are applicable to standalone, non-engineered environments as well.

Solaris Zones

  • CPU, memory and disk space allocation
  • Zone Root on Sun ZFS Storage Appliance
  • Network configuration
  • Use of DISM
  • Use of ZFS filesystem
  • SuperCluster specific zone deployment tool, ssc_exavm
  • ssctuner utility

Oracle Database

  • Exadata Storage Grid (Disk Group) Configuration
  • Disk Group Isolation
    • Shared Storage approach
    • Dedicated Storage Server approach
  • Resizing Grid Disks

Oracle RAC Configuration
Securing the Databases

Example Database Consolidation Scenarios

  • Consolidation example using Half-Rack SuperCluster
  • Consolidation example using Full-Rack SuperCluster

Acknowledgements

A large group of experts reviewed the material and provided quality feedback. Hence they deserve credit for their work and time. Listed below are some of those reviewers (sincere apologies if I missed listing any major contributors).

Kesari Mandyam, Binoy Sukumaran, Gowri Suserla, Allan Packer, Jennifer Glore, Hazel Alabado, Tom Daly, Krishnan Shankar, Gurubalan T, Rich Long, Prasad Bagal, Lawrence To, Rene Kundersma, Raymond Dutcher, David Brean, Jeremy Ward, Suzi McDougall, Ken Kutzer, Larry Mctintosh, Roger Bitar, Mikel Manitius

Tuesday Feb 12, 2013

OBIEE 11g Benchmark on SPARC T4

Just like the Siebel 8.1.x/SPARC T4 benchmark post, this one is overdue by at least four months. In any case, I hope Oracle BI customers already know about the OBIEE 11g/SPARC T4 benchmark effort. Here I will try to provide a few additional, interesting details that are not covered in the following Oracle PR, which was posted on oracle.com on 09/30/2012.

    SPARC T4 Server Delivers Outstanding Performance on Oracle Business Intelligence Enterprise Edition 11g


Benchmark Details

System Under Test

The entire BI middleware stack, including the WebLogic 11g Server, OBI Server, OBI Presentation Server and Java Host, was installed and configured on a single SPARC T4-4 server consisting of four 8-core 3.0 GHz SPARC T4 processors (32 cores in total) and 128 GB of physical memory. Oracle Solaris 10 8/11 was the operating system.

BI users were authenticated against Oracle Internet Directory (OID) in this benchmark, hence the OID software, which is part of Oracle Identity Management 11.1.1.6.0, was also installed and configured on the system under test (SUT). Oracle BI Server's Query Cache was turned on, so most of the query results were cached in the OBIS layer. The resulting database activity was minimal, which made it practical to run the Oracle 11g R2 database server with the OBIEE database on the same box as well.

Oracle BI database was hosted on a Sun ZFS Storage 7120 Appliance. The BI Web Catalog was under a ZFS/zpool on a couple of SSDs.


Test Scenario

In this benchmark, 25,000 concurrent users assumed five different business user roles: Marketing Executive, Sales Representative, Sales Manager, Sales Vice President, and Service Manager. The load was distributed equally among those five business user roles. Each of those BI users accessed five different pre-built dashboards, with each dashboard having an average of five reports (a mix of charts, tables and pivot tables) and returning 50-500 rows of aggregated data. The benchmark test scenario included drilling down into multiple levels from a table or chart within a dashboard. There was a 60-second think time between requests, per user.


BI Setup & Test Results

OBIEE 11g 11.1.1.6.0 was deployed on SUT in a vertical scale-out fashion. Two Oracle BI Presentation Server processes, one Oracle BI Server process, one Java Host process and two instances of WebLogic Managed Servers handled 25,000 concurrent user sessions smoothly. This configuration resulted in a sub-second overall average transaction response time (average of averages over a duration of 120 minutes or 2 hours). On average, 450 business transactions were executed per second, which triggered 750 SQL executions per second.

It took only 52% CPU on average (~5% system CPU and the rest in user land) to achieve the throughput outlined above. Since 25,000 unique test/BI users hammered different dashboards consistently, it is not surprising that the bulk of the CPU was spent in the Oracle BI Presentation Server layer, which took a whopping 29%. BI Server consumed about 10-11%, and the rest was shared by Java Host, OID, WebLogic Managed Server instances and the Oracle database.


So, what is the key takeaway from this whole exercise?

SPARC T4 rocks the Oracle BI world. OBIEE 11g/SPARC T4 is an ideal combination that may work well for the majority of OBIEE deployments on the Solaris platform. Or in marketing jargon: the excellent vertical and horizontal scalability of the SPARC T4 server gives customers the option to scale up as well as scale out to support large BI EE installations with minimal hardware investment.

Evaluate and decide for yourself.

[Credit to our colleagues in Oracle FMW PSR, ISVe teams and SCA lab support engineers]

Friday Dec 28, 2012

Solaris Tips : CPU Cache Sizes, Changing System Date

Tip #1: Finding the CPU cache sizes from Solaris operating environment

Use the prtpicl utility to list out system configuration, and look for the cache sizes within that output.

eg.,

$ /usr/sbin/prtpicl -v |grep cache
              :l1-icache-size    0x10000
              :l1-icache-line-size       0x40
              :l1-icache-associativity   0x2
              :l1-dcache-size    0x10000
              :l1-dcache-line-size       0x40
              :l1-dcache-associativity   0x2
              :l2-cache-size     0x500000
              :l2-cache-line-size        0x100
              :l2-cache-associativity    0xa

[Updated 01/14/13] The above output was gathered from an M4000 system that has SPARC64 VII processors.

Recent update releases of Solaris 10 and 11 show the prtpicl reported cache sizes in decimal numbers.
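
On releases that still report hexadecimal values, shell arithmetic is enough to convert them. A minimal sketch, where hex_to_kb is a made-up helper name and the inputs are taken from the sample output above:

```shell
# hex_to_kb is a hypothetical helper: $1 = size in bytes as a hex constant.
hex_to_kb() {
    echo "$(( $1 / 1024 )) KB"
}

hex_to_kb 0x10000     # l1-icache-size / l1-dcache-size, prints "64 KB"
hex_to_kb 0x500000    # l2-cache-size, prints "5120 KB"
```

So the sample system has 64 KB L1 instruction and data caches and a 5 MB L2 cache. By the same arithmetic, the line sizes 0x40 and 0x100 are 64 and 256 bytes.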

Here is a slightly improved prtpicl command that filters out unwanted output. (Courtesy: Georg)

/usr/sbin/prtpicl -v -c cpu | egrep "^ +cpu|ID|cache"

Tip #2: Changing the System Date

Use date to change the system date. For example, to set the system date to March 9, 2008 08:15 AM, run the following command. Syntax: date mmddHHMMyy

# date 0309081508

Sun Mar 9 08:15:03 PST 2008
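
The mmddHHMMyy argument is easy to get wrong by hand. Here is a small sketch that assembles it from separate fields; date_arg is a made-up helper name, and it only builds the string, it does not change the system date.

```shell
# date_arg is a hypothetical helper.
# $1=month $2=day $3=hour $4=minute $5=four-digit year
# Pass plain decimal numbers (8, not 08) so printf does not read them as octal.
date_arg() {
    printf '%02d%02d%02d%02d%02d\n' "$1" "$2" "$3" "$4" "$(( $5 % 100 ))"
}

date_arg 3 9 8 15 2008    # prints 0309081508
```

The result is the same string used in the example above and can be passed to date as root.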

Friday Aug 03, 2012

Enabling 2 GB Large Pages on Solaris 10

A few facts:

  • 8 KB is the default page size on Solaris 10 and 11 as of this writing
  • Both hardware and software must support 2 GB large pages
  • SPARC T4 hardware is capable of supporting 2 GB pages
  • The Solaris 11 kernel has built-in support for 2 GB pages
  • Solaris 10 does not support 2 GB pages by default
  • Memory-intensive 64-bit applications may benefit the most from using 2 GB pages

Prerequisites:

OS: Solaris 10 8/11 (Update 10) or later
Hardware: SPARC T4. eg., SPARC T4-1, T4-2 or T4-4

Steps to enable 2 GB large pages on Solaris 10:

  1. Install the latest kernel patch, or ensure that 147440-04 or later is installed

  2. Add the following line to /etc/system and reboot
    • set max_uheap_lpsize=0x80000000

  3. Finally check the output of the following command when the system is back online
    • pagesize -a

    eg.,
    % pagesize -a
    8192		<-- 8K
    65536		<-- 64K
    4194304		<-- 4M
    268435456	<-- 256M
    2147483648	<-- 2G
    
    % uname -a
    SunOS jar-jar 5.10 Generic_147440-21 sun4v sparc sun4v
    


Friday Apr 27, 2012

Solaris Volume Manager (SVM) on Solaris 11

SVM is not installed on Solaris 11 by default.

# metadb
-bash: metadb: command not found

# /usr/sbin/metadb
-bash: /usr/sbin/metadb: No such file or directory

Install it using pkg utility.

# pkg info svm
pkg: info: no packages matching the following patterns you specified are
installed on the system.  Try specifying -r to query remotely:

        svm

# pkg info -r svm
          Name: storage/svm
       Summary: Solaris Volume Manager
   Description: Solaris Volume Manager commands
      Category: System/Core
         State: Not installed
     Publisher: solaris
       Version: 0.5.11
 Build Release: 5.11
        Branch: 0.175.0.0.0.2.1
Packaging Date: October 19, 2011 06:42:14 AM 
          Size: 3.48 MB
          FMRI: pkg://solaris/storage/svm@0.5.11,5.11-0.175.0.0.0.2.1:20111019T064214Z

# pkg install storage/svm
           Packages to install:   1
       Create boot environment:  No
Create backup boot environment: Yes
            Services to change:   1

DOWNLOAD                                  PKGS       FILES    XFER (MB)
Completed                                  1/1     104/104      1.6/1.6

PHASE                                        ACTIONS
Install Phase                                168/168 

PHASE                                          ITEMS
Package State Update Phase                       1/1 
Image State Update Phase                         2/2 

# which metadb
/usr/sbin/metadb

This time metadb may fail with a different error.

# metadb
metadb: <HOST>: /dev/md/admin: No such file or directory

Check if md.conf exists.

# ls -l  /kernel/drv/md.conf 
-rw-r--r--   1 root     sys          295 Apr 26 15:07 /kernel/drv/md.conf

Dynamically re-scan md.conf so the device tree gets updated.

# update_drv -f md

# ls -l  /dev/md/admin
lrwxrwxrwx   1 root root 31 Apr 20 10:12 /dev/md/admin -> ../../devices/pseudo/md@0:admin

# metadb
metadb: <HOST>: there are no existing databases

Now Solaris Volume Manager is ready to use.

eg.,
#  metadb -f -a c0t5000CCA00A5A7878d0s0

# metadb
        flags           first blk       block count
     a        u         16              8192          /dev/dsk/c0t5000CCA00A5A7878d0s0

Tuesday Feb 28, 2012

Oracle RDBMS & Solaris : Few Random Tips (Feb 2012)

These tips are just some quick solutions or workarounds. Use these quickies at your own risk.

[#1] Oracle Data Pump

Q: How to exclude the table definition while importing a table using Oracle Data Pump import utility?

A: Use EXCLUDE=TABLE/TABLE option.

eg.,

impdp login/password DUMPFILE=<DUMP_FILENAME> LOGFILE=<LOGFILE_NAME> \
 DIRECTORY=<DB_DIR_NAME> TABLES=<TABLE_NAME> EXCLUDE=TABLE/TABLE



[#2] Workaround to ORA-01089: immediate shutdown in progress - no operations are permitted

If another shutdown or startup is attempted while the database is in the middle of an instance shutdown, Oracle RDBMS may throw the above ORA-01089 error. The workaround is to force Oracle to start the database instance using the startup force option. This option shuts down the database instance (if running) in abort mode and then starts it up.

eg.,

SQL> STARTUP FORCE



[#3] Quick steps to upgrade the Oracle database from version 11.2.0.[1 or 2] to 11.2.0.3

Execute the following in the same sequence as sysdba.

startup upgrade
-- pre-upgrade information tool
@?/rdbms/admin/utlu112i.sql
exec dbms_stats.gather_dictionary_stats (DEGREE => 64);
-- create/modify data dictionary tables
@?/rdbms/admin/catupgrd.sql
-- all components should be in VALID state
@?/rdbms/admin/utlu112s.sql
shutdown immediate
startup
-- upgrade actions that do not require DB in UPGRADE mode
@?/rdbms/admin/catuppst.sql
-- recompile stored PL/SQL and Java code
@?/rdbms/admin/utlrp.sql
-- verify that all packages and classes are valid
SELECT count(*) FROM dba_invalid_objects;
exit



[#4] Q: Solaris: how to get rid of zombie processes?

A: Run the following with appropriate user privileges.

ps -eaf | grep defunct | grep -v grep | preap `awk '{ print $2 }'`

Alternative way: (not as good as the previous one - still may work as expected)

prstat -n 500 1 1 | grep zombie | preap `awk '{ print $1 }'`



[Added on 03/01/2012]

[#5] Solaris: Many TCP listen drops

eg.,

# netstat -sP tcp | grep tcpListenDrop
        tcpListenDrop       =2442553     tcpListenDropQ0     =     0

To alleviate numerous TCP listen drops, bump up the value for the tunable tcp_conn_req_max_q

# ndd -set /dev/tcp tcp_conn_req_max_q <value>



[Added on 03/02/2012]

[#6] Solaris ZFS: listing all properties and values for a zpool

Run: zfs get all <zpool_name> as any OS user

eg.,

% zpool list
NAME    SIZE  ALLOC   FREE    CAP  HEALTH  ALTROOT
rpool   276G   167G   109G    60%  ONLINE  -
spec    556G   168G   388G    30%  ONLINE  -

% zfs get all rpool
NAME   PROPERTY              VALUE                  SOURCE
rpool  type                  filesystem             -
rpool  creation              Fri May 27 17:06 2011  -
...
rpool  compressratio         1.00x                  -
rpool  mounted               yes                    -
rpool  quota                 none                   default
rpool  reservation           none                   default
rpool  recordsize            128K                   default
...
rpool  checksum              on                     default
rpool  compression           off                    default
...
rpool  logbias               latency                default
rpool  sync                  standard               default
rpool  rstchown              on                     default



[#7] Solaris: listing all ZFS tunables

Run: echo "::zfs_params" | mdb -k with root/super-user privileges

eg.,

# echo "::zfs_params" | mdb -k
arc_reduce_dnlc_percent = 0x3
zfs_arc_max = 0x10000000
zfs_arc_min = 0x10000000
arc_shrink_shift = 0x5
zfs_mdcomp_disable = 0x0
zfs_prefetch_disable = 0x0
..
..
zio_injection_enabled = 0x0
zvol_immediate_write_sz = 0x8000

Tuesday Dec 13, 2011

Solaris Tip: Resolving "statd: cannot talk to statd at <target_host>, RPC: Timed out(5)"

Symptom:

System log shows a bunch of RPC timed out messages such as the following:


Dec 13 09:23:23 gil08 last message repeated 1 time
Dec 13 09:29:14 gil08 statd[19858]: [ID 766906 daemon.warning] statd: cannot talk to statd at ssc23, RPC: Timed out(5)
Dec 13 09:35:05 gil08 last message repeated 1 time
Dec 13 09:40:56 gil08 statd[19858]: [ID 766906 daemon.warning] statd: cannot talk to statd at ssc23, RPC: Timed out(5)
..

Those messages are the result of an apparent communication failure between the status daemons (statd) of both local and remote hosts using RPC calls.

Workaround/Solution:

If the target_host is reachable, do the following to stop the system from generating those warning messages: stop the network status monitor, remove the target host entry from the /var/statmon/sm.bak directory, and start the network status monitor again. Removing the target host entry from sm.bak keeps that machine from being aware that it may have to participate in locking recovery.

eg.,


# ps -eaf | fgrep statd 
  daemon 14304 19622   0 09:47:16 ?           0:00 /usr/lib/nfs/statd
    root 14314 14297   0 09:48:03 pts/15      0:00 fgrep statd

# svcs -a | grep "nfs/status"
online          9:52:41 svc:/network/nfs/status:default

# svcadm -v disable nfs/status
svc:/network/nfs/status:default disabled.

# ls /var/statmon/sm.bak
ssc23

# rm /var/statmon/sm.bak/ssc23

# svcadm -v enable nfs/status
svc:/network/nfs/status:default enabled.
