Tuesday Dec 27, 2011

Oracle Application Testing Suite (OATS): Few Tips & Tricks

OATS is a suite of applications that can be used for performance and scalability testing, functional and regression testing. It is a thin client application that runs within a web browser - so, it is easy to use the tool from anywhere as long as the web server running on the host node is accessible. Hopefully the following tips and tricks will benefit some of the users of the Oracle Application Testing Suite.

A few technical details first: OATS is a 32-bit Java application that runs in a WebLogic Server (WLS) container, with an Oracle XE database as the backend store for test session data.



[Trick] Issue : OATS software fails to install on 64-bit Windows systems

Resolution:
Download and install 64-bit .NET framework manually before installing the OATS software. Look for .NET framework on Microsoft's downloads website.




[Trick] Issue : OATS software fails to install on systems with a large number of [virtual] CPUs

Resolution:
On systems with many cores/vCPUs, Oracle database in general requires large amounts of memory to be configured for SGA - so, one solution would be to allocate as much memory as required. However Oracle XE limits the memory utilization within the database to 1 GB. Besides, XE uses only one CPU even if there are multiple CPUs available on a system. Hence one workaround is to limit the number of vCPUs that the system exposes during the installation of OATS software. The steps are shown below.

  • Start button -> Run -> type "msconfig"
  • Click on Boot tab -> Advanced Options
  • Check "Number of processors" and set appropriate value (I believe we can go up to 16)
  • Reboot Windows
  • Uninstall failed OATS installation and try installing again
  • Undo the above made changes after the successful installation of OATS
  • Reboot Windows one final time

Thanks to my colleague Bao Doan for providing this workaround.




[Trick] Issue : During runtime, OATS drives the load and executes the test as expected but fails to collect runtime statistics

Resolution:
This is another limitation of the Oracle XE database. Through 10g, XE limited the maximum amount of user data in the database to 4 GB. This limit was raised to 11 GB in Oracle 11g XE. OATS 9.x releases bundle Oracle 10g XE. To take advantage of the larger data limit, install Oracle 11g XE manually before installing the OATS software; the OATS installer gives the option to use an existing installation of Oracle XE. Note that it is not possible to have multiple Oracle XE installations on a single box anyway (that's another XE limitation).

For existing installations, one workaround is to remove old and unwanted sessions to make room for new sessions in the database. Listed below are the steps.

  • Connect to the Oracle Load Testing (OLT) tool
  • Click on "Manage" top-level menu (upper right corner) -> Sessions
  • Click on any unwanted session and press "Delete" button (I recommend deleting one session at a time)



[Trick] Issue : Under load, there are many network timeouts with a ton of sockets in TIME_WAIT state on OATS agent systems including the OATS Controller node

Resolution:
Tune TCP/IP parameters on Windows as shown below.

  • Launch the Windows Registry Editor (regedit)
  • Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TcpIP\parameters
  • Configure the following two parameters. If not found, create those parameters by selecting Edit -> New -> DWORD Value from the menu bar. Select "Decimal" under Base.
      TcpTimedWaitDelay : 30 [seconds]
      MaxUserPort : 65534
  • Reboot Windows

Thanks to my colleagues Dino and Vishnu for sharing this workaround.
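
For anyone who would rather script this change than click through the Registry Editor, here is a minimal Python sketch that produces the text of a .reg file for the two values above. The function name build_reg_file is my own choice for illustration; only the key path and the two values come from the steps above, and I have not tried this on every Windows release.

```python
# Sketch: generate .reg file text for the two TCP/IP parameters listed above.
# build_reg_file is an illustrative name, not part of OATS or Windows.

KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TcpIP\parameters"

TCP_PARAMS = {
    "TcpTimedWaitDelay": 30,   # seconds
    "MaxUserPort": 65534,
}


def build_reg_file(key_path, params):
    """Return the text of a .reg file that sets each parameter as a DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", "[%s]" % key_path]
    for name, value in params.items():
        # .reg files store DWORD values as 8 hexadecimal digits
        lines.append('"%s"=dword:%08x' % (name, value))
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(KEY_PATH, TCP_PARAMS))
```

Save the printed text in a file with a .reg extension, review it, import it with regedit, and reboot Windows as in the manual steps.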




[Trick] Issue : OATS Controller does not show any graphs or analysis reports

Resolution:
Install Adobe Flash Plugin and try again.




[Trick] Issue : Under load, OATS Controller stops collecting runtime statistics at some random point

Resolution:
Check the Oracle database alert log for clues. If there is an error message such as "ORA-12516: TNS:listener could not find available handler with matching protocol stack", connect to the database, query the v$resource_limit view, and compare the values reported under CURRENT_UTILIZATION and MAX_UTILIZATION with the configured limit (LIMIT_VALUE) for the resource "processes". If the utilization is close to the configured maximum, raise the value of the processes parameter in the [S]PFILE and restart the database instance.
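
The comparison can be reduced to a few lines of Python. This is only a sketch: the query text uses the standard v$resource_limit columns (LIMIT_VALUE holds the configured limit as a padded string), while the function name is_near_limit and the 90% threshold are my own assumptions, not anything OATS or Oracle prescribes.

```python
# Sketch: decide whether the "processes" limit is close to exhaustion,
# given one row fetched from v$resource_limit. The 0.9 threshold is an
# arbitrary assumption; pick whatever margin suits your tests.

RESOURCE_LIMIT_SQL = """
SELECT resource_name, current_utilization, max_utilization, limit_value
  FROM v$resource_limit
 WHERE resource_name = 'processes'
"""


def is_near_limit(current_utilization, max_utilization, limit_value, threshold=0.9):
    """Return True if current or peak usage has reached threshold * limit."""
    limit_text = str(limit_value).strip()   # LIMIT_VALUE comes back space-padded
    if limit_text.upper() == "UNLIMITED":
        return False
    limit = int(limit_text)
    peak = max(current_utilization, max_utilization)
    return peak >= threshold * limit
```

For example, is_near_limit(38, 40, "        40") is true, which would be the cue to raise the processes parameter.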




[Tip] Balancing the load among multiple OATS agent systems

One simple way is to create a VU Agent System Group based on the available agent systems. The steps are listed below.

  • Connect to the Oracle Load Testing (OLT) tool
  • Click on "Manage" top-level menu (upper right corner) -> Systems
  • Click on "VU Agent System Group" on the left-hand side
  • On the right hand side, click on "New" option
  • Select all the agent systems that you want to be part of the "VU Agent System Group"
  • Finally name the newly created system group and save

Note that it is not possible to attach weights to the agent systems - so, it is suggested to have agent systems with similar hardware configurations in the VU Agent System Group.



[Tip] Balancing the load among multiple web servers using OATS Controller

If there are multiple web server instances running in an enterprise application deployment, and OATS is being used to test the performance and scalability of the application, parameterizing the web server hostname and port number in the OATS test script takes care of the web server load balancing problem. Of course, there are alternatives to this approach, such as using a hardware load balancer or a web server reverse proxy.
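
To make the idea concrete, here is a small Python sketch of what the parameterization achieves: each virtual user request picks the next host/port pair from a list, so the requests rotate across the web servers. The host names, port, and path are made up for illustration, and this is not OATS code; in an OATS script the pairs would presumably come from a databank instead.

```python
# Sketch: rotating requests across web servers by parameterizing host and port.
# All host names, the port, and the path below are hypothetical.
from itertools import cycle

WEB_SERVERS = [
    ("web01.example.com", 8080),
    ("web02.example.com", 8080),
    ("web03.example.com", 8080),
]


def url_generator(servers, path):
    """Yield URLs that rotate through the servers in order, forever."""
    for host, port in cycle(servers):
        yield "http://%s:%d%s" % (host, port, path)


urls = url_generator(WEB_SERVERS, "/app/login")
first_four = [next(urls) for _ in range(4)]
# The fourth request wraps around to the first server again.
```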



[Added on 01/19/2012]

[Tip] How to check the available space in USERS tablespace?

Run the following on OATS Controller node:

Start -> All Programs -> Oracle Database XX Express Edition -> Run SQL Command Line

SQL> connect / as sysdba

SQL> SELECT /* + RULE */  df.tablespace_name "Tablespace",
       df.bytes / (1024 * 1024) "Size (MB)",
       SUM(fs.bytes) / (1024 * 1024) "Free (MB)",
       Nvl(Round(SUM(fs.bytes) * 100 / df.bytes),1) "% Free",
       Round((df.bytes - SUM(fs.bytes)) * 100 / df.bytes) "% Used"
  FROM dba_free_space fs,
       (SELECT tablespace_name,SUM(bytes) bytes
          FROM dba_data_files
         GROUP BY tablespace_name) df
 WHERE fs.tablespace_name (+)  = df.tablespace_name
 GROUP BY df.tablespace_name,df.bytes
UNION ALL
SELECT /* + RULE */ df.tablespace_name tspace,
       fs.bytes / (1024 * 1024),
       SUM(df.bytes_free) / (1024 * 1024),
       Nvl(Round((SUM(fs.bytes) - df.bytes_used) * 100 / fs.bytes), 1),
       Round((SUM(fs.bytes) - df.bytes_free) * 100 / fs.bytes)
  FROM dba_temp_files fs,
       (SELECT tablespace_name,bytes_free,bytes_used
          FROM v$temp_space_header
         GROUP BY tablespace_name,bytes_free,bytes_used) df
 WHERE fs.tablespace_name (+)  = df.tablespace_name
 GROUP BY df.tablespace_name,fs.bytes,df.bytes_free,df.bytes_used
 ORDER BY 4 DESC;

Copy/paste the above SQL code (without the leading SQL> prompt) into a text file with a .sql extension, and execute the statement by calling the script from the SQL> command prompt. For example, assuming the above code was saved in a plain text file called chktblspcusg.sql on the C:\ drive, execute the SQL script as shown below:

SQL> @C:\chktblspcusg.sql




[Added on 06/27/2012]

[Trick] Issue : An attempt to open a test script in OpenScript fails with error

'Failed to open script' has encountered a problem.
Failed to open <script_name>. See error log for details.

Clicking on "Details" button provides the following clue.

The project description file (.project) for '<script_name>' is missing

In addition the title bar shows "Relocating Eclipse Projects: The project description file (.project) for XXX is missing".

Resolution:
Navigate to C:\Documents and Settings\Administrator\osworkspace\.metadata\.plugins\org.eclipse.core.resources\.projects\

Look for the directory named "<failing_script_name>" and remove it.



[Added on 08/03/2012]

[Trick] Issue: Unexpected Agent exit. Code = 51 in the middle of an OLT load test

When running a load scenario in Oracle Load Testing (OLT) that uses a databank, the scenario runs fine for some time and then all of a sudden fails with the following error: Unexpected Agent exit. Code = 51.

Workaround:

The following settings may alleviate the issue.

  • toggle/experiment with the settings for "Clear cache between iterations" and "Clear cache before playing back"
    • those settings can be found under the test script preferences -> Playback -> Web Functional -> Miscellaneous
  • experiment with different values for the "Maximum users per process" setting
    • this setting is under OLT -> Configure all parameters -> Advanced
  • increase the Java heap size (both min & max) in file <OATS_HOME>\agentmanager\bin\AgentManagerService.conf
    • default values: min heap size: 16 MB; max heap size: 64 MB

Contributors: John Snyder, Richard Barry

[Added 02/25/13]

Another colleague Dave Suri has an alternate tip to resolve the Agent 51 issue.

Edit <OATS_HOME>\agentmanager\processDescriptors\JavaAgent.properties

Change the following lines:

#process.debug=y
#process.debug.suspend=y
#process.debug.port=8123
#process.debug.custom=

To:

process.debug=y
#process.debug.suspend=y
#process.debug.port=8123
process.debug.custom=-verbose:gc -XX:+HeapDumpOnOutOfMemoryError -Xms512M -Xmx1536M -jrockit -Xrs -XgcPrio:deterministic -XpauseTarget=50ms -XX:+UseCallProfiling -XX:+UseAdaptiveFatSpin -XX:+ExitOnOutOfMemoryError -XXnoSystemGC -XX:+UseFastTime


PS:
I'm no OATS expert - merely documented what we found along with solutions and workarounds. Feel free to post your question(s) in the comments section. Hopefully someone will respond.


Tuesday Dec 13, 2011

Solaris Tip: Resolving "statd: cannot talk to statd at <target_host>, RPC: Timed out(5)"

Symptom:

System log shows a bunch of RPC timed out messages such as the following:


Dec 13 09:23:23 gil08 last message repeated 1 time
Dec 13 09:29:14 gil08 statd[19858]: [ID 766906 daemon.warning] statd: cannot talk to statd at ssc23, RPC: Timed out(5)
Dec 13 09:35:05 gil08 last message repeated 1 time
Dec 13 09:40:56 gil08 statd[19858]: [ID 766906 daemon.warning] statd: cannot talk to statd at ssc23, RPC: Timed out(5)
..

Those messages are the result of an apparent communication failure between the status daemons (statd) of both local and remote hosts using RPC calls.

Workaround/Solution:

If the target_host is reachable, execute the following to stop the system from generating those warning messages: stop the network status monitor, remove the target host entry from the /var/statmon/sm.bak directory, and start the network status monitor process again. Removing the target host entry from sm.bak keeps that machine from being aware that it may have to participate in locking recovery.

eg.,


# ps -eaf | fgrep statd 
  daemon 14304 19622   0 09:47:16 ?           0:00 /usr/lib/nfs/statd
    root 14314 14297   0 09:48:03 pts/15      0:00 fgrep statd

# svcs -a | grep "nfs/status"
online          9:52:41 svc:/network/nfs/status:default

# svcadm -v disable nfs/status
svc:/network/nfs/status:default disabled.

# ls /var/statmon/sm.bak
ssc23

# rm /var/statmon/sm.bak/ssc23

# svcadm -v enable nfs/status
svc:/network/nfs/status:default enabled.

Friday Nov 18, 2011

Siebel Troubleshooting : An ODBC error occurred; SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl

Symptom:

A newly installed Siebel application server fails to start despite successful ODBC connectivity to the database. SRProc process logs ODBC error messages similar to the following:


Message: GEN-13,
 Additional Message: dict-ERR-1109: 
       Unable to read value from export file (Data length (32) > Column definition (3)).

Message: GEN-13,
 Additional Message: dict-ERR-1107: Unable to read row 0 from export file (UTLDataValRead pBuf, col 4 ).

GenericLog  GenericError  1     0002157..  11-11-18 13:28  Message: Generated SQL statement:,
 Additional Message: SQLFetch:
   SELECT RDOBJ.DOCK_ID, RDOBJ.RELATED_DOCK_ID, RDOBJ.SQL_STATEMENT, RDOBJ.CHECK_VISIBILITY,
          'N', RDOBJ.COMMENTS, RDOBJ.ACTIVE, RDOBJ.SEQUENCE, RDOBJ.VIS_STRENGTH,
          RDOBJ.REL_VIS_STRENGTH, RDOBJ.VIS_EVT_COLS
     FROM ORAPERF.S_DOCK_REL_DOBJ RDOBJ, ORAPERF.S_DOCK_OBJECT DOBJ
    WHERE RDOBJ.REPOSITORY_ID = (SELECT ROW_ID FROM ORAPERF.S_REPOSITORY WHERE NAME = ?)
      AND DOBJ.ROW_ID = RDOBJ.DOCK_ID
      AND (DOBJ.INACTIVE_FLG = 'N' OR DOBJ.INACTIVE_FLG IS NULL)
      AND (RDOBJ.INACTIVE_FLG = 'N' OR RDOBJ.INACTIVE_FLG IS NULL)

Message: Error: An ODBC error occurred,
 Additional Message: Function: DICGetRDObjects; ODBC operation: SQLFetch

Message: GEN-13,
 Additional Message: dict-ERR-1109: Unable to read value from export file (UTLCompressFRead (fseek)).

Message: GEN-13,
 Additional Message: dict-ERR-1107: Unable to read row 0 from export file (UTLDataValRead pBuf, col 0 ).

Message: GEN-10,
 Additional Message: Calling Function: DICLoadDObjectInfo; Called Function: Calling DICGetRDObjects

Message: GEN-10,
 Additional Message: Calling Function: DICLoadDict; Called Function: DICLoadDObjectInfo

GenericError
(srpdb.cpp (860) err=3006 sys=2) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
(srpsmech.cpp (74) err=3006 sys=0) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
(srpmtsrv.cpp (107) err=3006 sys=0) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
(smimtsrv.cpp (1203) err=3006 sys=0) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
SmiLayerLog Error       Terminate process due to unrecoverable error: 3006. (Main Thread)

An inconsistent or corrupted dictionary file "diccache.dat" is likely the cause.

Solution:

  • Stop the application server and manually kill the remaining Siebel application specific processes

    eg.,

    stop_server all
    
    pkill siebmtsh
    pkill siebproc
    ..
    
  • Remove $SIEBEL_HOME/bin/diccache.dat file. It will be re-generated during the application server startup

  • Start the application server
    start_server all
    

Monday Oct 10, 2011

Oracle Database on NFS : Resolving "ORA-27086: unable to lock file - already in use" Error

Some Context

Oracle database was hosted on ZFS Storage Appliance (NAS). The database files are accessible from the database server node via NFS mounted filesystems. Solaris 10 is the operating system on DB node.

Someone forgot to shut down the database instance and unmount the remote filesystems before rebooting the database server node. After the system booted up, Oracle RDBMS failed to bring up the database due to locked-out data files.

eg.,

SQL> startup
ORACLE instance started.

Total System Global Area 1.7108E+10 bytes
Fixed Size		    2165208 bytes
Variable Size		 9965671976 bytes
Database Buffers	 6845104128 bytes
Redo Buffers		  295329792 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/orclvol4/entDB/system01.dbf'

======================
Extract from alert log:
======================

...
ALTER DATABASE OPEN
Fri Aug 05 21:30:54 2011
Errors in file /oracle112/diag/rdbms/entdb/entDB/trace/entDB_dbw0_7235.trc:
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/orclvol4/entDB/system01.dbf'
ORA-27086: unable to lock file - already in use
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 8
Additional information: 21364
Errors in file /oracle112/diag/rdbms/entdb/entDB/trace/entDB_dbw0_7235.trc:
ORA-01157: cannot identify/lock data file 2 - see DBWR trace file
ORA-01110: data file 2: '/orclvol4/entDB/sysaux01.dbf'
ORA-27086: unable to lock file - already in use
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 8
Additional information: 21364
...

Reason for the lock failure:

Because of the sudden ungraceful shutdown of the database, file locks on data files were not released by the NFS server (ZFS SA in this case). The NFS server held on to the file locks even after the NFS client (DB server node in this example) was restarted. Due to this, Oracle RDBMS was not able to lock those data files residing on the NFS server (ZFS SA). As a result, the database instance failed to start up in exclusive mode.

Workaround

Manually clear the NFS locks as outlined below.

On NFS Client (database server node):

  1. Shut down the mounted database
  2. Unmount remote (NFS) filesystems
  3. Execute: clear_locks -s <nfs_server_host>

    eg.,

    # clear_locks -s sup16
    Clearing locks held for NFS client ipsedb1 on server sup16
    clear of locks held for ipsedb1 on sup16 returned success
    

On NFS Server (ZFS SA):
    (this step may not be necessary but wouldn't hurt to perform)

  1. Execute: clear_locks <nfs_client_host>

    eg.,

    sup16# clear_locks 10.129.207.93
    Clearing locks held for NFS client 10.129.207.93 on server sup16
    clear of locks held for 10.129.207.93 on sup16 returned success
    

Again back on NFS Client (database server node):

  1. Restart NFS client
        (this step may not be necessary but wouldn't hurt to perform)
    # svcadm -v disable nfs/client
    # svcadm -v enable nfs/client
    
  2. Mount remote/NFS filesystems
  3. Finally start the database

Also see:
Listing file locks on Solaris 10

Thursday Oct 06, 2011

Siebel Connection Broker Load Balancing Algorithm

Siebel server architecture supports spawning multiple application object manager processes. The Siebel Connection Broker, SCBroker, tries to balance the load (incoming requests) across different object manager processes running in a single Siebel server.

Least Loaded or Round Robin?

By default, SCBroker forwards the incoming request to the object manager process that is least loaded - meaning the process with the least number of running tasks. In Siebel terminology, this behavior is referred to as the "least-loaded" or "LL" connection forwarding algorithm. While the default LL algorithm provides the optimal behavior in the best case scenarios, it may lead to serious availability problems if one of several object manager processes running in a Siebel server stops responding in a timely fashion [for some reason]. Such an object manager may still accept requests even though they may time out. At some point, the unresponsive/hung or erroneous object manager will have the least number of tasks, which may prompt the SCBroker component to forward new incoming requests to that object manager process - which in turn leads to a stalemate. To avoid such situations, it is recommended to configure the "round-robin" or "RR" algorithm in the SCBroker component. When the round-robin algorithm is configured, SCBroker ignores the number of running tasks per object manager process and distributes requests across all object managers in a round-robin fashion.

While both algorithms have their strengths and weaknesses, customers must weigh both options and choose the one that fits best in their deployment.
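
The difference is easy to see in a toy simulation. The Python sketch below is not Siebel code; the process names, task counts, and the rule that a request sent to the hung process simply times out are all assumptions made for illustration.

```python
# Sketch: least-loaded (LL) vs round-robin (RR) forwarding with one hung process.
# Process names and task counts are hypothetical.
from itertools import cycle


def least_loaded(tasks):
    """Pick the process with the fewest running tasks (ties: first by name)."""
    return min(sorted(tasks), key=lambda name: tasks[name])


def simulate(algorithm, requests=6):
    """Forward requests and return how many landed on the hung process."""
    tasks = {"om1": 12, "om2": 0, "om3": 11}   # om2 is hung; its tasks timed out
    hung = "om2"
    rr = cycle(sorted(tasks))
    failed = 0
    for _ in range(requests):
        target = least_loaded(tasks) if algorithm == "LL" else next(rr)
        if target == hung:
            failed += 1          # request times out, never becomes a running task
        else:
            tasks[target] += 1   # healthy process picks up the task
    return failed


ll_failures = simulate("LL")   # every request goes to the hung process
rr_failures = simulate("RR")   # only every third request does
```

With LL all six requests are lost to the hung object manager; with RR only two of the six are, at the cost of ignoring how busy the healthy processes are.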

eg.,

Find the current load balancing algorithm:

srvrmgr>  list advanced param ConnForwardAlgorithm for comp SCBroker \
             show PA_ALIAS, PA_VALUE, PA_NAME

PA_ALIAS              PA_VALUE  PA_NAME                                    
--------------------  --------  -----------------------------------------  
ConnForwardAlgorithm  LL        Connection Forward algorithm for SCBroker

Configure SCBroker to use round-robin algorithm:

srvrmgr> change param ConnForwardAlgorithm=RR for comp SCBroker server SERVER_NAME
Command completed successfully.

srvrmgr> list advanced param ConnForwardAlgorithm for comp SCBroker \
            show PA_ALIAS, PA_VALUE, PA_NAME

PA_ALIAS              PA_VALUE  PA_NAME                                    
--------------------  --------  -----------------------------------------  
ConnForwardAlgorithm  RR        Connection Forward algorithm for SCBroker

Other SCBroker parameters of interest: ConnForwardTimeout and ConnRequestTimeout

Saturday Sep 10, 2011

Oracle RDBMS : Generic Large Object (LOB) Performance Guidelines

This blog post is generic in nature and based on my recent experience with a content management system where securefile BLOBs are critical in storing and retrieving the checked-in content. It is strongly suggested to check the official documentation in addition to these brief guidelines. In general, Oracle Database SecureFiles and Large Objects Developer's Guide 11g Release 2 (11.2) is a good starting point when creating tables involving SecureFiles and LOBs.

Guidelines

  • Tablespace: create the LOB in a different tablespace isolated from the rest of the database
  • Block size: consider larger block size (default 8 KB) if the expected size of the LOB is big
  • Chunk size: consider larger chunk size (default 8 KB) if larger LOBs are expected to be stored and retrieved
  • Inline or Out-of-line: choose "DISABLE STORAGE IN ROW" (out-of-line) if the average LOB size is expected to be > 4 KB. The default inlining is fine for smaller LOBs
  • CACHE or NOCACHE: consider bypassing the database buffer cache (NOCACHE) if large number of LOBs are stored and not expected to be retrieved frequently
  • COMPRESS or NOCOMPRESS: choose COMPRESS option if storage capacity is a concern and a constraint. It saves disk space at the expense of some performance overhead. In a RAC database environment, it is recommended to compress the LOBs to reduce the interconnect traffic
  • De-duplication: by default, duplicate LOBs are each stored as a separate copy in the database. Choosing the DEDUPLICATE option enables sharing the same data blocks for identical LOBs, thus reducing storage overhead and simplifying storage management
  • Partitioning: consider partitioning the parent table to maximize application performance. Hash partitioning is one of the options if there is no potential partition key in the table
  • Zero-Copy I/O protocol: turned on by default. Turning it off in a RAC database environment could be beneficial. Set the initialization parameter _use_zero_copy_io=FALSE to turn off the Zero-Copy I/O protocol
  • Shared I/O pool: database uses the shared I/O pool to perform large I/O operations on securefile LOBs. The shared I/O pool uses shared memory segments. If this pool is not large enough or if there is not enough memory available in this pool for a securefile LOB I/O operation, Oracle uses a portion of PGA until there is sufficient memory available in the shared I/O pool. Hence it is recommended to size the shared I/O pool appropriately by monitoring the database during the peak activity. Relevant initialization parameters: _shared_io_pool_size and _shared_iop_max_size

Also see:
Oracle Database Documentation : LOB Performance Guidelines

Saturday Aug 27, 2011

Oracle 11g: Travel back in time with the Database Flashback

Error recovery, historical reporting, trend analysis, data forensics and fraud detection are just some of the business problems that can be solved by using the Flashback Data Archive feature in Oracle 11g database. The Flashback option can be enabled for the entire database or for a selected set of tables. It can be enabled in the database with no application changes.

At work I usually run performance tests by starting with a clean copy of the database. I analyze the test results at the end of the test, determine the next course of action (tuning), restore the clean copy of the database from a backup, apply the tuning and re-run the performance test. It goes on in a cycle until I'm happy with the overall test result. In some cases especially with large data sets, restoring the database from a backup becomes one of the time consuming tasks. In such situations, using the database flashback to go back to a previously saved restore point saves quite a bit of time. Rest of this blog post demonstrates how to enable database flashback and to go back to a specified restore point. Check Oracle Total Recall with Oracle Database 11g Release 2 white paper for more information on Flashback Data Archive (FDA).

Objective

Revert the entire database to a previously saved state at will

Steps to perform

  • Configure the following initialization parameters: db_recovery_file_dest & db_recovery_file_dest_size
  • Enable Archive Log mode
  • Enable database Flashback option
  • Create a restore point. Decide whether to create a normal or a guaranteed restore point
  • Finally flashback database to the created restore point when required

Be aware that there will be some performance and storage overhead in using the database flashback. Evaluate all your options carefully before configuring database flashback.

Example

The following example uses guaranteed restore point to flashback the database in a two-node RAC environment. Most of the example is self-explanatory.

% srvctl status database -d DEMO
Instance DEMO1 is running on node racnode01
Instance DEMO2 is running on node racnode02

/* stop all the database instances except one (any one) in RAC config */

% srvctl stop instance -d DEMO -i DEMO2

% export ORACLE_SID=DEMO1

/* put one of the instances in non-cluster mode */

% sqlplus / as sysdba
SQL> alter system set cluster_database=false scope=spfile;

% srvctl stop instance -d DEMO -i DEMO1

% sqlplus / as sysdba
SQL> startup mount

/* enable archive log mode */

SQL> alter database archivelog;

SQL> archive log list
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     2
Next log sequence to archive   4
Current log sequence	       4

SQL> show parameter db_recovery_file

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest		     string	 +FRA
db_recovery_file_dest_size	     big integer 512G

/* enable flashback option */

SQL> alter database flashback on;

SQL> select flashback_on from v$database;

FLASHBACK_ON
------------------
YES

/* put the instance back in cluster mode and restart the database */

SQL> alter system set cluster_database=true scope=spfile;

SQL> alter database open;

% srvctl stop instance -d DEMO -i DEMO1

% srvctl start database -d DEMO

/* create a guaranteed restore point */

% sqlplus / as sysdba
SQL> create restore point demo_clean_before_test guarantee flashback database;

Restore point created.

SQL> column NAME format A25
SQL> column TIME format A40
SQL> set lines 120
SQL> select NAME, SCN, TIME, GUARANTEE_FLASHBACK_DATABASE, STORAGE_SIZE 
  2  from V$RESTORE_POINT where GUARANTEE_FLASHBACK_DATABASE='YES';

NAME				 SCN TIME		              GUA STORAGE_SIZE
------------------------- ---------- -------------------------------- --- ------------
DEMO_CLEAN_BEFORE_TEST     17460960 21-AUG-11 01.01.20.000 AM	      YES     67125248

/* flashback database to the saved restore point */

% srvctl stop database -d DEMO

% export ORACLE_SID=DEMO1

% rman TARGET /

RMAN> STARTUP MOUNT;
RMAN> FLASHBACK DATABASE TO RESTORE POINT 'DEMO_CLEAN_BEFORE_TEST';

Starting flashback at 21-AUG-11
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:25

Finished flashback at 21-AUG-11

RMAN> ALTER DATABASE OPEN RESETLOGS;

database opened

RMAN> SHUTDOWN IMMEDIATE;

% srvctl start database -d DEMO

/* ============================================================================== */

/* alternatively run the following RMAN script as shown below */

% cat restore.rman
RUN {
        STARTUP MOUNT;
        FLASHBACK DATABASE TO RESTORE POINT 'DEMO_CLEAN_BEFORE_TEST';
        ALTER DATABASE OPEN RESETLOGS;
        SHUTDOWN IMMEDIATE;
}

EXIT

% rman TARGET / cmdfile=restore.rman

Note:
It is not mandatory to enable logging for flashback database in order to create and use restore points. The requirement in such a case is to put the database in ARCHIVELOG mode and to create the first guaranteed restore point while the database is in the mounted state.

Friday May 27, 2011

PeopleSoft Application Server : Binding JSL Port to Multiple IP Addresses

(Pardon the formatting. Legible copy of this blog post is available at:
http://technopark02.blogspot.com/2011/05/peoplesoft-application-server-binding.html)

For the impatient:

On any multi-homed[1] host, replace the %PS_MACH% variable in the "Jolt Listener" section of the application server domains' psappsrv.cfg file with the special IP address "0.0.0.0" to get the desired effect. It makes the Jolt listener listen on all available network interfaces on the system. In other words, if JSL is listening on 0.0.0.0 on a multi-homed system, PIA traffic can flow using any of the IP addresses assigned to that system.

For the rest:

A little background first.

PeopleSoft application server relies on Jolt, a companion product that co-exists with Tuxedo, to handle all web requests. That is, Jolt is the bridge between PeopleSoft application server and the web server (any supported one) that facilitates web communication. Tuxedo helps schedule PeopleSoft application server processes to perform the actual transactions. When the application server is booted up, Jolt server listener (JSL) is bound to a pre-configured port number and is actively monitored for incoming web requests. On the other hand, web server instance(s) are made aware of the existence of all Jolt listeners in a PeopleSoft Enterprise by configuring the hostname:port# pairs in each of the web domain's configuration.properties file.

By default, the variable %PS_MACH% in each application server domain's configuration file, psappsrv.cfg, gets resolved to the hostname of the system during application server boot-up time. The following example demonstrates that.

eg.,

/* Application server configuration file */
% cat psappsrv.cfg
..
[JOLT Listener]
Address=%PS_MACH%
Port=9000
..

/* Boot up the application server domain */
% psadmin -c boot -d HRHX
..
exec JSL -A -- -d /dev/tcp -n //ben01:9000 -m 100 -M 100 -I 5 -j ANY -x 20 -S 10 -c 1000000 -w JSH :
        process id=20077 ... Started.
..

% hostname
ben01

% netstat -a | grep 9000
ben01.9000                 *.*                0      0 49152      0 LISTEN

% netstat -an | grep 9000
17.16.221.106.9000          *.*                0      0 49152      0 LISTEN

% ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
 inet 127.0.0.1 netmask ff000000 
bge0: flags=1000843 mtu 1500 index 2
 inet 17.16.221.106 netmask ffffff00 broadcast 17.16.221.255
bge1: flags=1000843 mtu 1500 index 3
 inet 18.1.1.1 netmask ffffff00 broadcast 18.1.1.255
e1000g0: flags=1000843 mtu 1500 index 4
 inet 18.1.1.201 netmask ffffff00 broadcast 18.1.1.255

% telnet 17.16.221.106 9000
Trying 17.16.221.106...
Connected to 17.16.221.106.
Escape character is '^]'.

% telnet 18.1.1.1 9000
Trying 18.1.1.1...
telnet: Unable to connect to remote host: Connection refused

% telnet 18.1.1.201 9000
Trying 18.1.1.201...
telnet: Unable to connect to remote host: Connection refused

Notice that %PS_MACH% was replaced by the actual hostname and the Jolt listener created the server socket using the IP address 17.16.221.106 and port number 9000. From the outputs of netstat, ifconfig and telnet, it is apparent that "bge0" is the only network interface that is being used by the Jolt listener. It means the web server can communicate with JSL using the IP address 17.16.221.106 over port 9000, but not using either of the other two IP addresses, 18.1.1.1 or 18.1.1.201. This is the default behavior.

However, some customers may wish to have the ability to connect to the application services from different/multiple networks. This is possible in the case of multi-homed systems -- servers with multiple network interfaces that are connected to a single or multiple networks. For example, such a host could be part of a public network, a private network where only those clients that can communicate over private links can connect, or an InfiniBand network (a low latency, high throughput network). The default behavior of JSL can be changed by using the special IP address "0.0.0.0" in place of the variable %PS_MACH% in the application server domain's configuration file. The IP address 0.0.0.0 tells the Jolt listener (JSL) to listen on all available IPv4 network interfaces on the system. (I read somewhere that "::0" is the equivalent for IPv6 interfaces. Didn't get a chance to test it out yet). The following example demonstrates how the default behavior changes with the IP address 0.0.0.0.

% cat psappsrv.cfg
..
[JOLT Listener]
Address=0.0.0.0
Port=9000
..

/* Update the binary configuration by reloading the config file */
% psadmin -c configure -d HRHX

% psadmin -c boot -d HRHX
..
exec JSL -A -- -d /dev/tcp -n //0.0.0.0:9000 -m 100 -M 100 -I 5 -j ANY -x 20 -S 10 -c 1000000 -w JSH :
 process id=20874 ... Started.

% netstat -a | grep 9000
      *.9000               *.*                0      0 49152      0 LISTEN

% telnet 17.16.221.106 9000
Trying 17.16.221.106...
Connected to 17.16.221.106.
..

% telnet 18.1.1.1 9000
Trying 18.1.1.1...
Connected to 18.1.1.1.
..

% telnet 18.1.1.201 9000
Trying 18.1.1.201...
Connected to 18.1.1.201.
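
The telnet checks above can also be scripted. The following is a minimal sketch, not part of the PeopleSoft or Tuxedo tooling: it pulls the IPv4 addresses out of Solaris-style "ifconfig -a" output and tries a TCP connection to a given port on each address. The function names (list_inet_addrs, probe_port, probe_all) are made up for this illustration, and probe_port assumes a bash binary built with /dev/tcp support.

```shell
#!/bin/bash
# Sketch only: the function names below are hypothetical helpers.

# Print every IPv4 address that follows the keyword "inet" in
# Solaris-style ifconfig output, skipping loopback addresses.
list_inet_addrs() {
    awk '$1 == "inet" && $2 !~ /^127\./ { print $2 }'
}

# Return 0 if a TCP connection to host:port succeeds, non-zero otherwise.
# Uses bash's /dev/tcp redirection inside a subshell so the descriptor
# is closed again as soon as the check finishes.
probe_port() {
    local host=$1 port=$2
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Read addresses from stdin and report whether the port accepts connections.
probe_all() {
    local port=$1 addr
    while read -r addr; do
        if probe_port "$addr" "$port"; then
            echo "$addr:$port connected"
        else
            echo "$addr:$port refused or unreachable"
        fi
    done
}

# Example invocation (run manually):
#   ifconfig -a | list_inet_addrs | probe_all 9000
```

With Address=%PS_MACH% only the bge0 address should report a connection; with Address=0.0.0.0 all three addresses should. Note that a probe against an address that silently drops packets can block until the TCP connect timeout expires.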

Footnote:
[1] Any system with more than one interface is considered a multi-homed host

Wednesday May 25, 2011

Oracle Database: How to Figure Out if a Tablespace is Empty

It is not uncommon for Oracle DBAs to create and drop tablespaces for various reasons throughout the life of a database. It is good practice to double-check that a tablespace is really empty before dropping it. One way is to visually check the "Tablespaces" section in the Oracle Enterprise Manager (OEM) database console. However, the graphical interface may sometimes return false positives. Another way is to query the data dictionary views for accurate information. In the latter case, the DBA_SEGMENTS / USER_SEGMENTS views are useful for mapping segments such as tables, indexes and table/index partitions to a tablespace.

e.g.,

The following example queries USER_SEGMENTS view to list the segments and their types stored in a tablespace called "TS_SALES_DATA32K".

SQL> COLUMN SEGMENT_NAME FORMAT A30
SQL> COLUMN SEGMENT_TYPE FORMAT A30
SQL> 
SQL> SELECT SEGMENT_NAME, SEGMENT_TYPE
  2  FROM USER_SEGMENTS
  3  WHERE TABLESPACE_NAME = 'TS_SALES_DATA32K';

SEGMENT_NAME                   SEGMENT_TYPE
------------------------------ ------------------------------
SALES_DATA                     TABLE

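The tablespace "TS_SALES_DATA32K" holds one table called "SALES_DATA". The following query returns no rows, meaning the tablespace "TS_SALES_DATA" holds no segments owned by the current user. Provided no other schema stores segments there either (USER_SEGMENTS covers only the current user's segments, while DBA_SEGMENTS covers all of them), it can be dropped without hesitation.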

SQL> SELECT SEGMENT_NAME, SEGMENT_TYPE         
  2  FROM USER_SEGMENTS       
  3  WHERE TABLESPACE_NAME = 'TS_SALES_DATA';

no rows selected

Here is another query that lists all the tablespaces accessible to the current user along with the number of segments stored in each of them. Note that similar information can be extracted in other ways using more efficient queries.

SQL> COLUMN TABLESPACE FORMAT A40
SQL> 
SQL> SELECT UT.TABLESPACE_NAME "TABLESPACE", COUNT (US.SEGMENT_NAME) "NUM SEGMENTS"
  2  FROM USER_TABLESPACES UT, USER_SEGMENTS US
  3  WHERE UT.TABLESPACE_NAME = US.TABLESPACE_NAME (+)
  4  GROUP BY (UT.TABLESPACE_NAME)
  5  ORDER BY COUNT (US.SEGMENT_NAME) DESC;

TABLESPACE                               NUM SEGMENTS
---------------------------------------- -----------
TS_DP                                         114989
TS_DP_X                                          306
..
TS_SALES_DATA32K                                   1
TS_SALES_DATA                                      0

13 rows selected.
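
The queries above use USER_SEGMENTS, which covers only the segments owned by the connected user. Before dropping a tablespace, a count against DBA_SEGMENTS across all schemas is a safer check. The following is a minimal sketch that only generates such a query for a given tablespace name. The function name segment_count_sql is made up for this illustration, and the name check is deliberately stricter than Oracle's identifier rules (letters, digits and underscores only).

```shell
#!/bin/bash
# Sketch only: segment_count_sql is a hypothetical helper.

# Print a query that counts the segments stored in one tablespace
# across all schemas (DBA_SEGMENTS), not just the current user's.
segment_count_sql() {
    local ts
    # Dictionary views store unquoted names in upper case.
    ts=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    # Accept only plain identifiers so the name can be embedded safely.
    case $ts in
        ''|[!A-Z]*|*[!A-Z0-9_]*)
            echo "invalid tablespace name: $1" >&2
            return 1
            ;;
    esac
    cat <<EOF
SELECT COUNT(*) AS NUM_SEGMENTS
FROM DBA_SEGMENTS
WHERE TABLESPACE_NAME = '${ts}';
EOF
}

# Example invocation (run manually, assumes OS authentication and a
# user allowed to read DBA_SEGMENTS):
#   segment_count_sql TS_SALES_DATA | sqlplus -s "/ as sysdba"
```

A count of zero means no schema has a segment in that tablespace.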

SEE ALSO:
Oracle Database Concepts : Data Blocks, Extents, and Segments



(Copy of this blog post is also available at:
http://technopark02.blogspot.com/2011/05/oracle-database-how-to-figure-out-if.html)

Thursday Apr 14, 2011

Oracle Solaris: Show Me the CPU, vCPU, Core Counts and the Socket-Core-vCPU Mapping

[Replaced old code with new code on 10/03/11]

It should be easy to find this information just by running an OS command. However, for some reason that isn't the case as of today. The user must know a few details about the underlying hardware and run multiple commands to figure out the exact number of physical processors, cores and so on.

For the benefit of our customers, here is a simple shell script that displays the number of physical processors, cores, virtual processors, cores per physical processor, number of hardware threads (vCPUs) per core and the virtual CPU mapping for all physical processors and cores on a Solaris system (SPARC or x86/x64). This script showed valid output on recent T-series, M-series hardware as well as on some older hardware - Sun Fire 4800, x4600. Due to the changes in the output of cpu_info over the years, it is possible that the script may return incorrect information in some cases. Since it is just a shell script, tweak the code as you like. The script can be executed by any OS user.

Download the script : showcpucount


% cat showcpucount

--------------------------------------- CUT HERE -------------------------------------------
#!/bin/bash

/usr/bin/kstat -m cpu_info | egrep "chip_id|core_id|module: cpu_info" > /var/tmp/cpu_info.log

nproc=`(grep chip_id /var/tmp/cpu_info.log | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')`
ncore=`(grep core_id /var/tmp/cpu_info.log | awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')`
vproc=`(grep 'module: cpu_info' /var/tmp/cpu_info.log | awk '{ print $4 }' | sort -u | wc -l | tr -d ' ')`

nstrandspercore=$(($vproc/$ncore))
ncoresperproc=$(($ncore/$nproc))

speedinmhz=`(/usr/bin/kstat -m cpu_info | grep clock_MHz | awk '{ print $2 }' | sort -u)`
speedinghz=`echo "scale=2; $speedinmhz/1000" | bc`

echo "Total number of physical processors: $nproc"
echo "Number of virtual processors: $vproc"
echo "Total number of cores: $ncore"
echo "Number of cores per physical processor: $ncoresperproc"
echo "Number of hardware threads (strands or vCPUs) per core: $nstrandspercore"
echo "Processor speed: $speedinmhz MHz ($speedinghz GHz)"

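# Now derive the vCPU-to-core mapping from the information above.
# The log holds three lines per vCPU, in this order: the "module: cpu_info"
# line (instance number in field 4), then chip_id, then core_id.
# linenum starts at line 2, the chip_id line of the first vCPU.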

echo -e "\n** Socket-Core-vCPU mapping **"
let linenum=2

for ((i = 1; i <= ${nproc}; ++i ))
do
        chipid=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $2 }'`
        echo -e "\nPhysical Processor $i (chip id: $chipid):"

        for ((j = 1; j <= ${ncoresperproc}; ++j ))
        do
                let linenum=($linenum + 1)
                coreid=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $2 }'`
                echo -e "\tCore $j (core id: $coreid):"

                let linenum=($linenum - 2)
                vcpustart=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $4 }'`

                let linenum=(3 * $nstrandspercore + $linenum - 3)
                vcpuend=`sed -n ${linenum}p /var/tmp/cpu_info.log | awk '{ print $4 }'`

                echo -e "\t\tvCPU ids: $vcpustart - $vcpuend"
                let linenum=($linenum + 4)
        done
done

rm /var/tmp/cpu_info.log
--------------------------------------- CUT HERE -------------------------------------------

# prtdiag | head -1
System Configuration:  Sun Microsystems  sun4u SPARC Enterprise M4000 Server

# ./showcpucount
Total number of physical processors: 4
Number of virtual processors: 32
Total number of cores: 16
Number of cores per physical processor: 4
Number of hardware threads (strands or vCPUs) per core: 2
Processor speed: 2660 MHz (2.66 GHz)

** Socket-Core-vCPU mapping **

Physical Processor 1 (chip id: 1024):
        Core 1 (core id: 0):
                vCPU ids: 0 - 1
        Core 2 (core id: 2):
                vCPU ids: 2 - 3
        Core 3 (core id: 4):
                vCPU ids: 4 - 5
        Core 4 (core id: 6):
                vCPU ids: 6 - 7

Physical Processor 2 (chip id: 1032):
        Core 1 (core id: 8):
                vCPU ids: 8 - 9
        Core 2 (core id: 10):
                vCPU ids: 10 - 11
        Core 3 (core id: 12):
                vCPU ids: 12 - 13
        Core 4 (core id: 14):
                vCPU ids: 14 - 15

Physical Processor 3 (chip id: 1040):
        Core 1 (core id: 16):
                vCPU ids: 16 - 17
        Core 2 (core id: 18):
                vCPU ids: 18 - 19
        Core 3 (core id: 20):
                vCPU ids: 20 - 21
        Core 4 (core id: 22):
                vCPU ids: 22 - 23

Physical Processor 4 (chip id: 1048):
        Core 1 (core id: 24):
                vCPU ids: 24 - 25
        Core 2 (core id: 26):
                vCPU ids: 26 - 27
        Core 3 (core id: 28):
                vCPU ids: 28 - 29
        Core 4 (core id: 30):
                vCPU ids: 30 - 31
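
The script above depends on the order of lines in the kstat output. As an alternative, here is a minimal sketch that groups vCPUs by chip id and core id using the parseable "kstat -p" format, where each line has the form module:instance:name:statistic, a tab, and the value. The function name map_vcpus is made up for this illustration, and the sketch has not been checked against every hardware generation mentioned above.

```shell
#!/bin/bash
# Sketch only: map_vcpus is a hypothetical helper.

# Read "kstat -p" lines for chip_id and core_id from stdin and print
# one line per core listing the vCPU (instance) ids that belong to it.
map_vcpus() {
    awk -F'\t' '
        {
            # Key looks like cpu_info:<instance>:cpu_info<instance>:<stat>
            split($1, k, ":")
            if (k[4] == "chip_id")      chip[k[2]] = $2
            else if (k[4] == "core_id") core[k[2]] = $2
        }
        END {
            # Emit "chip core vcpu" triples; ordering is done by sort.
            for (v in core) print chip[v], core[v], v
        }
    ' | sort -k1,1n -k2,2n -k3,3n | awk '
        { key = $1 " " $2 }
        key != last {
            # Start a new output line whenever the chip/core pair changes.
            if (NR > 1) printf "\n"
            printf "chip %s core %s: vCPUs", $1, $2
            last = key
        }
        { printf " %s", $3 }
        END { if (NR > 0) printf "\n" }
    '
}

# Example invocation on Solaris (run manually):
#   kstat -p cpu_info:::chip_id cpu_info:::core_id | map_vcpus
```

Because the grouping is keyed on the instance number rather than on line positions, it does not assume that every core has the same number of hardware threads.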