Wednesday Jan 04, 2012

Unwanted Software Installers

After all these years of software evolution, it is odd to see not much improvement in the area of software installation. Customers do not seem to mind dealing with different, complex installers. Nevertheless this whole process can be simplified to save time, effort and energy.

In an ideal world, a software installer is supposed to have just one function - copying the software bits to a designated location and nothing else. However today we interact with different installers that does variety of things -- some install the pre-compiled binaries, few come in ready-to-extract zip archives, few others compile the binary on-the-fly and install the binary. Most of the enterprise software installers configure the software as part of the installation process where as few installers install the software and simply quit leaving the configuration step for the experts. Some of the installers hard-code the hostname, IP address, absolute paths of certain files etc., into some of the files on target system, which makes it hard to re-use the software home directory on a different server. Few installers do sensible job by not tying anything to the host system where the software is being installed.

Here is my partial wish list of features for a software installer. I think it is enough to make a point.

Idempotent installations : install the software once, run anywhere. Customers should be able to move the resulting home directories from one host to any location on another host without worrying about the underlying changes to the location of the home directory, hostname, IP address etc., One example is the Oracle RDBMS installation. Once installed, the ORACLE_HOME can be zipped up, moved to another host, extracted and used right away. ORACLE_HOME usually contains the binaries. Installation specific configuration is stored outside of ORACLE_HOME. Optional Oracle Grid Control configuration appears to be saved under ORACLE_HOME, which is an aberration though it can be easily reconfigured once moved to another host.

Simplicity : providing the entire directory structure in an extractable compressed archive file will remove one or more layers of dependency that the software installer has. For example, some of the installers require Java run-time to show the graphical interface for the installer. I recently encountered an installer executable that has private/unsupported symbols statically linked to it. When those private interfaces were removed in a later version of the operating environment, installer crashed and failed to make any progress. Had the software been provided in an extractable archive, software would have been readily available in the latter case. It appears that Oracle Corporation is moving in the right direction by releasing WebLogic 12c software as a zip file.

De-couple software installation from configuration : there should be clear separation between the installation and configuration. Once the software is in place, relevant folks can always configure the software as directed and needed. The customer just needs a simple tool or script to configure the software.

        => Off-topic: providing a web interface is even better. It gives the flexibility to configure the software from anywhere in the same network.

Contain everything in a single top-level directory : it makes patching easier even if the top-level directory was moved to a different location or host. No point in spreading the pieces of software into multiple locations anyway. Going back to the example of ORACLE_HOME, one shortcoming in Oracle RDBMS installation is that few directories/files such as oraInventory reside outside of ORACLE_HOME - so, when moving ORACLE_HOME to another host, it is necessary to move all relevant files that are outside of ORACLE_HOME as well for successful database software patching.

With careful planning/design, Release Engineering can be as creative and innovative as the rest of the teams in delivering a software product out of the door --- but I guess it is up to the customers to demand that attitude.

PS:
This blog post can be improved a lot. However since it is mostly about an opinion and a wish list, there is not much motivation to put more effort into it. And of course it is a generic discussion - nothing specific to a particular software or corporation.

Tuesday Dec 27, 2011

Oracle Application Testing Suite (OATS): Few Tips & Tricks

OATS is a suite of applications that can be used for performance and scalability testing, functional and regression testing. It is a thin client application that runs within a web browser - so, it is easy to use the tool from anywhere as long as the web server running on the host node is accessible. Hopefully the following tips and tricks will benefit some of the users of the Oracle Application Testing Suite.

Few technical details first - OATS is a 32-bit Java application that runs in a WebLogic container (WLS) with Oracle XE database being the backend store for test session data.



[Trick] Issue : OATS software fails to install on 64-bit Windows systems

Resolution:
Download and install 64-bit .NET framework manually before installing the OATS software. Look for .NET framework on Microsoft's downloads website.




[Trick] Issue : OATS software fails to install on systems with large number of [virtual] CPUs

Resolution:
On systems with many cores/vCPUs, Oracle database in general requires large amounts of memory to be configured for SGA - so, one solution would be to allocate as much memory as required. However Oracle XE limits the memory utilization within the database to 1 GB. Besides, XE uses only one CPU even if there are multiple CPUs available on a system. Hence one workaround is to limit the number of vCPUs that the system exposes during the installation of OATS software. The steps are shown below.

  • Start button -> Run -> type "msconfig"
  • Click on Boot tab -> Advanced Options
  • Check "Number of processors" and set appropriate value (I believe we can go up to 16)
  • Reboot Windows
  • Uninstall failed OATS installation and try installing again
  • Undo the above made changes after the successful installation of OATS
  • Reboot Windows one final time

Thanks to my colleague Bao Doan for providing this workaround.




[Trick] Issue : During runtime, OATS drive the load and executes the test as expected but fails to collect runtime statistics

Resolution:
This is another limitation of Oracle XE database. Until 10g, XE limits the maximum amount of user data in the database to 4 GB. This limit was raised to 11 GB in release Oracle 11g XE. OATS 9.x releases bundle Oracle 10g XE. To take advantage of the larger limit for data, install Oracle 11g XE manually before installing OATS software. OATS installer gives the option to use an existing installation of Oracle XE. Besides, it is not possible to have multiple Oracle XE installations on a single box anyway (that's another XE limitation).

For existing installations, one workaround is to remove old and unwanted sessions to make room for new sessions in the database. Listed below are the steps.

  • Connect to the Oracle Load Testing (OLT) tool
  • Click on "Manage" top-level menu (upper right corner) -> Sessions
  • Click on any unwanted session and press "Delete" button (I recommend deleting one session at a time)



[Trick] Issue : Under load, there are many network timeouts with ton of sockets in TIME_WAIT state on OATS agent systems including the OATS Controller node

Resolution:
Tune TCP/IP parameters on Windows as shown below.

  • Launch Windows registry
  • Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\TcpIP\parameters
  • Configure the following two parameters. If not found, create those parameters by selecting Edit -> New -> DWORD Value from the menu bar. Select "Decimal" under Base.
      TcpTimedWaitDelay : 30 [seconds]
      MaxUserPort : 65534
  • Reboot Windows

Thanks to my colleagues Dino and Vishnu for sharing this workaround.




[Trick] Issue : OATS Controller does not show any graphs or analysis reports

Resolution:
Install Adobe Flash Plugin and try again.




[Trick] Issue : Under load, OATS Controller stops collecting runtime statistics at some random point

Resolution:
Check Oracle database alert log for some clue(s). If there is an error message such as "ORA-12516: TNS:listener could not find available handler with matching protocol stack", connect to the database, query v$resource_limit view and compare the values reported under CURRENT_UTILIZATION and MAX_UTILIZATION for the resource "processes". If the current utilization is pretty close to the configured maximum value, raise the value for processes parameter in [S]PFILE.




[Tip] Balancing the load among multiple OATS agent systems

One simple way is to create a VU Agent System Group based on the available agent systems. Steps listed below.

  • Connect to the Oracle Load Testing (OLT) tool
  • Click on "Manage" top-level menu (upper right corner) -> Systems
  • Click on "VU Agent System Group" in the left hand side
  • On the right hand side, click on "New" option
  • Select all the agent systems that you want to be part of the "VU Agent System Group"
  • Finally name the newly created system group and save

Note that it is not possible to attach weights to the agent systems - so, it is suggested to have agent systems with similar hardware configurations in the VU Agent System Group.



[Tip] Balancing the load among multiple web servers using OATS Controller

If there are multiple web server instances running in a enterprise application deployment; and OATS software is being used to test the performance and scalability of the application, parameterizing the web server hostname and port number in OATS test script will take care of the web server load balancing problem. Of course there are many alternatives to this approach such as using a hardware load balancer, using web server Reverse Proxy etc.,



[Added on 01/19/2012]

[Tip] How-To check the available space in USERS tablespace?

Run the following on OATS Controller node:

Start -> All Programs -> Oracle Database XX Express Edition -> Run SQL Command Line

SQL> connect / as sysdba

SQL> SELECT /* + RULE */  df.tablespace_name "Tablespace",
       df.bytes / (1024 * 1024) "Size (MB)",
       SUM(fs.bytes) / (1024 * 1024) "Free (MB)",
       Nvl(Round(SUM(fs.bytes) * 100 / df.bytes),1) "% Free",
       Round((df.bytes - SUM(fs.bytes)) * 100 / df.bytes) "% Used"
  FROM dba_free_space fs,
       (SELECT tablespace_name,SUM(bytes) bytes
          FROM dba_data_files
         GROUP BY tablespace_name) df
 WHERE fs.tablespace_name (+)  = df.tablespace_name
 GROUP BY df.tablespace_name,df.bytes
UNION ALL
SELECT /* + RULE */ df.tablespace_name tspace,
       fs.bytes / (1024 * 1024),
       SUM(df.bytes_free) / (1024 * 1024),
       Nvl(Round((SUM(fs.bytes) - df.bytes_used) * 100 / fs.bytes), 1),
       Round((SUM(fs.bytes) - df.bytes_free) * 100 / fs.bytes)
  FROM dba_temp_files fs,
       (SELECT tablespace_name,bytes_free,bytes_used
          FROM v$temp_space_header
         GROUP BY tablespace_name,bytes_free,bytes_used) df
 WHERE fs.tablespace_name (+)  = df.tablespace_name
 GROUP BY df.tablespace_name,fs.bytes,df.bytes_free,df.bytes_used
 ORDER BY 4 DESC;

Copy/paste the above SQL code in a text file with sql extension and execute that SQL statement by calling the SQL script from SQL> command prompt. eg., assuming the above code was saved in a plain text file called chktblspcusg.sql under C:\ drive, execute the SQL script as shown below:

SQL> @C:\chktblspcusg.sql




[Added on 06/27/2012]

[Trick] Issue : An attempt to open a test script in OpenScript fails with error

'Failed to open script' has encountered a problem.
Failed to open <script_name>. See error log for details.

Clicking on "Details" button provides the following clue.

The project description file (.project) for '<script_name>' is missing"

In addition the title bar shows "Relocating Eclipse Projects: The project description file (.project) for XXX is missing".

Resolution:
Navigate to C:\Documents and Settings\Administrator\osworkspace\.metadata\.plugins\org.eclipse.core.resources\.projects\

Look for the directory by name "<failing_script_name>" and remove it



[Added on 08/03/2012]

[Trick] Issue: Unexpected Agent exit. Code = 51 in the middle of an OLT load test

When running a load scenario in Oracle Load Testing (OLT) that uses a databank, the scenario runs fine for some time and then all of a sudden fails with the following error: Unexpected Agent exit. Code = 51.

Workaround:

The following settings may alleviate the issue.

  • - toggle/experiment with the settings for "Clear cache between iterations" and "Clear cache before playing back"
    • those settings can be found under the test script preferences -> Playback -> Web Functional -> Miscellaneous
  • - experiment with different values for "Maximum users per process" setting
    • this setting is under OLT -> Configure all parameters -> Advanced
  • - increase the Java heap size (both min & max) in file <OATS_HOME>\agentmanager\bin\AgentManagerService.conf
    • default values: min heap size: 16 MB; max heap size: 64 MB

Contributors: John Snyder, Richard Barry

[Added 02/25/13]

Another colleague Dave Suri has an alternate tip to resolve the Agent 51 issue.

Edit <OATS_HOME>\agentmanager\processDescriptors\JavaAgent.properties

Change the following lines:

#process.debug=y
#process.debug.suspend=y
#process.debug.port=8123
#process.debug.custom=

To:

process.debug=y
#process.debug.suspend=y
#process.debug.port=8123
process.debug.custom=-verbose:gc -XX:+HeapDumpOnOutOfMemoryError -Xms512M 
-Xmx1536M -jrockit -Xrs -XgcPrio:deterministic -XpauseTarget=50ms 
-XX:+UseCallProfiling -XX:+UseAdaptiveFatSpin -XX:+ExitOnOutOfMemoryError 
-XXnoSystemGC -XX:+UseFastTime


See Also:

Tuesday Dec 13, 2011

Solaris Tip: Resolving "statd: cannot talk to statd at <target_host>, RPC: Timed out(5)"

Symptom:

System log shows a bunch of RPC timed out messages such as the following:


Dec 13 09:23:23 gil08 last message repeated 1 time
Dec 13 09:29:14 gil08 statd[19858]: [ID 766906 daemon.warning] statd: cannot talk to statd at ssc23, RPC: Timed out(5)
Dec 13 09:35:05 gil08 last message repeated 1 time
Dec 13 09:40:56 gil08 statd[19858]: [ID 766906 daemon.warning] statd: cannot talk to statd at ssc23, RPC: Timed out(5)
..

Those messages are the result of an apparent communication failure between the status daemons (statd) of both local and remote hosts using RPC calls.

Workaround/Solution:

If the target_host is reachable, execute the following to stop the system from generating those warning messages --- stop the network status monitor, remove the target host entry from /var/statmon/sm.bak file and start the network status monitor process. Removing the target host entry from sm.bak file keeps that machine from being aware that it may have to participate in locking recovery.

eg.,


# ps -eaf | fgrep statd 
  daemon 14304 19622   0 09:47:16 ?           0:00 /usr/lib/nfs/statd
    root 14314 14297   0 09:48:03 pts/15      0:00 fgrep statd

# svcs -a | grep "nfs/status"
online          9:52:41 svc:/network/nfs/status:default

# svcadm -v disable nfs/status
svc:/network/nfs/status:default disabled.

# ls /var/statmon/sm.bak
ssc23

# rm /var/statmon/sm.bak/ssc23

# svcadm -v enable nfs/status
svc:/network/nfs/status:default enabled.

Friday Nov 18, 2011

Siebel Troubleshooting : An ODBC error occurred; SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl

Symptom:

A newly installed Siebel application server fails to start despite successful ODBC connectivity to the database. SRProc process logs ODBC error messages similar to the following:


Message: GEN-13,
 Additional Message: dict-ERR-1109: 
       Unable to read value from export file (Data length (32) > Column definition (3)).

Message: GEN-13,
 Additional Message: dict-ERR-1107: Unable to read row 0 from export file (UTLDataValRead pBuf, col 4 ).

GenericLog  GenericError  1     0002157..  11-11-18 13:28  Message: Generated SQL statement:,
 Additional Message: SQLFetch:
   SELECT RDOBJ.DOCK_ID, RDOBJ.RELATED_DOCK_ID, RDOBJ.SQL_STATEMENT, RDOBJ.CHECK_VISIBILITY,
          'N', RDOBJ.COMMENTS, RDOBJ.ACTIVE, RDOBJ.SEQUENCE, RDOBJ.VIS_STRENGTH,
          RDOBJ.REL_VIS_STRENGTH, RDOBJ.VIS_EVT_COLS
     FROM ORAPERF.S_DOCK_REL_DOBJ RDOBJ, ORAPERF.S_DOCK_OBJECT DOBJ
    WHERE RDOBJ.REPOSITORY_ID = (SELECT ROW_ID FROM ORAPERF.S_REPOSITORY WHERE NAME = ?)
      AND DOBJ.ROW_ID = RDOBJ.DOCK_ID
      AND (DOBJ.INACTIVE_FLG = 'N' OR DOBJ.INACTIVE_FLG IS NULL)
      AND (RDOBJ.INACTIVE_FLG = 'N' OR RDOBJ.INACTIVE_FLG IS NULL)

Message: Error: An ODBC error occurred,
 Additional Message: Function: DICGetRDObjects; ODBC operation: SQLFetch

Message: GEN-13,
 Additional Message: dict-ERR-1109: Unable to read value from export file (UTLCompressFRead (fseek)).

Message: GEN-13,
 Additional Message: dict-ERR-1107: Unable to read row 0 from export file (UTLDataValRead pBuf, col 0 ).

Message: GEN-10,
 Additional Message: Calling Function: DICLoadDObjectInfo; Called Function: Calling DICGetRDObjects

Message: GEN-10,
 Additional Message: Calling Function: DICLoadDict; Called Function: DICLoadDObjectInfo

GenericError
(srpdb.cpp (860) err=3006 sys=2) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
(srpsmech.cpp (74) err=3006 sys=0) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
(srpmtsrv.cpp (107) err=3006 sys=0) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
(smimtsrv.cpp (1203) err=3006 sys=0) SBL-GEN-03006: Error calling function: DICFindTable m_pReqTbl
SmiLayerLog Error       Terminate process due to unrecoverable error: 3006. (Main Thread)

An inconsistent or corrupted dictionary file "diccache.dat" is likely the cause.

Solution:

  • Stop the application server and manually kill the remaining Siebel application specific processes

    eg.,

    stop_server all
    
    pkill siebmtsh
    pkill siebproc
    ..
    
  • Remove $SIEBEL_HOME/bin/diccache.dat file. It will be re-generated during the application server startup

  • Start the application server
    start_server all
    

Monday Oct 10, 2011

Oracle Database on NFS : Resolving "ORA-27086: unable to lock file - already in use" Error

Some Context

Oracle database was hosted on ZFS Storage Appliance (NAS). The database files are accessible from the database server node via NFS mounted filesystems. Solaris 10 is the operating system on DB node.

Someone forgets to shutdown the database instance and unmount the remote filesystems before rebooting the database server node. After the system boots up, Oracle RDBMS fails to bring up the database due to locked-out data files.

eg.,

SQL> startup
ORACLE instance started.

Total System Global Area 1.7108E+10 bytes
Fixed Size		    2165208 bytes
Variable Size		 9965671976 bytes
Database Buffers	 6845104128 bytes
Redo Buffers		  295329792 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/orclvol4/entDB/system01.dbf'

======================
Extract from alert log:
======================

...
ALTER DATABASE OPEN
Fri Aug 05 21:30:54 2011
Errors in file /oracle112/diag/rdbms/entdb/entDB/trace/entDB_dbw0_7235.trc:
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/orclvol4/entDB/system01.dbf'
ORA-27086: unable to lock file - already in use
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 8
Additional information: 21364
Errors in file /oracle112/diag/rdbms/entdb/entDB/trace/entDB_dbw0_7235.trc:
ORA-01157: cannot identify/lock data file 2 - see DBWR trace file
ORA-01110: data file 2: '/orclvol4/entDB/sysaux01.dbf'
ORA-27086: unable to lock file - already in use
SVR4 Error: 11: Resource temporarily unavailable
Additional information: 8
Additional information: 21364
...

Reason for the lock failure:

Because of the sudden ungraceful shutdown of the database, file locks on data files were not released by the NFS server (ZFS SA in this case). NFS server held on to the file locks even after the NFS client (DB server node in this example) was restarted. Due to this, Oracle RDBMS is not able to lock those data files residing on NFS server (ZFS SA). As a result, database instance was failed to start up in exclusive mode.

Workaround

Manually clear the NFS locks as outlined below.

On NFS Client (database server node):

  1. Shutdown the mounted database
  2. Unmount remote (NFS) filesystems
  3. Execute: clear_locks -s <nfs_server_host>

    eg.,

    # clear_locks -s sup16
    Clearing locks held for NFS client ipsedb1 on server sup16
    clear of locks held for ipsedb1 on sup16 returned success
    

On NFS Server (ZFS SA):
    (this step may not be necessary but wouldn't hurt to perform)

  1. Execute: clear_locks <nfs_client_host>

    eg.,

    sup16# clear_locks 10.129.207.93
    Clearing locks held for NFS client 10.129.207.93 on server sup16
    clear of locks held for 10.129.207.93 on sup16 returned success
    

Again back on NFS Client (database server node):

  1. Restart NFS client
        (this step may not be necessary but wouldn't hurt to perform)
    # svcadm -v disable nfs/client
    # svcadm -v enable nfs/client
    
  2. Mount remote/NFS filesystems
  3. Finally start the database

Also see:
Listing file locks on Solaris 10

Thursday Oct 06, 2011

Siebel Connection Broker Load Balancing Algorithm

Siebel server architecture supports spawning multiple application object manager processes. The Siebel Connection Broker, SCBroker, tries to balance the load (incoming requests) across different object manager processes running in a single Siebel server.

Least Loaded or Round Robin?

By default, SCBroker forwards the incoming request to any object manager process that is least loaded - meaning the process with the least number of running tasks. In Siebel terminology, this behavior is referred as "least-loaded" or "LL" connection forwarding algorithm. While the default LL algorithm provides the optimal behavior in the best case scenarios, it may lead to serious availability problems if one of several object manager prcesses running in a Siebel server stops responding in a timely fashion [for some reason]. Such an object manager may still accept requests though it may timeout. At some point, the unresponsive/hung or erroneous object manager will have the least number of tasks that may prompt SCBroker component to forward new incoming requests to that object manager process - which in turn leads to a stalemate. To avoid such situations, it is recommended to configure "round-robin" or "RR" algorithm in SCBroker component. When round-robin algorithm is configured, SCBroker ignores the number of running tasks per object manager process and routes all requests to all object managers in a round robin fashion.

While both algorithms have their strengths and weaknesses, customers must weigh both options and choose the one that fits best in their deployment.

eg.,

Find the current load balancing algorithm:

srvrmgr>  list advanced param ConnForwardAlgorithm for comp SCBroker \
             show PA_ALIAS, PA_VALUE, PA_NAME

PA_ALIAS              PA_VALUE  PA_NAME                                    
--------------------  --------  -----------------------------------------  
ConnForwardAlgorithm  LL        Connection Forward algorithm for SCBroker

Configure SCBroker to use round-robin algorithm:

srvrmgr> change param ConnForwardAlgorithm=RR for comp SCBroker server SERVER_NAME
Command completed successfully.

srvrmgr> list advanced param ConnForwardAlgorithm for comp SCBroker \
            show PA_ALIAS, PA_VALUE, PA_NAME

PA_ALIAS              PA_VALUE  PA_NAME                                    
--------------------  --------  -----------------------------------------  
ConnForwardAlgorithm  RR        Connection Forward algorithm for SCBroker

Other SCBroker parameters of interest: ConnForwardTimeout and ConnRequestTimeout

Saturday Sep 10, 2011

Oracle RDBMS : Generic Large Object (LOB) Performance Guidelines

This blog post is generic in nature and based on my recent experience with a content management system where securefile BLOBs are critical in storing and retrieving the checked in content. It is stro ngly suggested to check the official documentation in addition to these brief guidelines. In general, Oracle Database SecureFiles and Large Objects Developer's Guide 11g Release 2 (11.2) is a good starting point when creating tables involving SecureFiles and LOBs.

Guidelines

  • Tablespace: create the LOB in a different tablespace isolated from the rest of the database
  • Block size: consider larger block size (default 8 KB) if the expected size of the LOB is big
  • Chunk size: consider larger chunk size (default 8 KB) if larger LOBs are expected to be stored and retrieved
  • Inline or Out-of-line: choose "DISABLE STORAGE IN ROW" (out-of-line) if the average LOB size is expected to be > 4 KB. The default inlining is fine for smaller LOBs
  • CACHE or NOCACHE: consider bypassing the database buffer cache (NOCACHE) if large number of LOBs are stored and not expected to be retrieved frequently
  • COMPRESS or NOCOMPRESS: choose COMPRESS option if storage capacity is a concern and a constraint. It saves disk space at the expense of some performance overhead. In a RAC database environment, it is recommended to compress the LOBs to reduce the interconnect traffic
  • De-duplication: by default, duplicate LOBs are stored as a separate copy in the database. Choosing DEDUPLICATE option enables sharing the same data blocks for similar files thus reducing storage overhead and simplifying storage management
  • Partitioning: consider partitioning the parent table to maximize application performance. Hash partitioning is one of the options if there is no potential partition key in the table
  • Zero-Copy I/O protocol: turned on by default. Turning it off in a RAC database environment could be beneficial. Set the initialization parameter _use_zero_copy_io=FALSE to turn o ff the Zero-Copy I/O protocol
  • Shared I/O pool: database uses the shared I/O pool to perform large I/O operations on securefile LOBs. The shared I/O pool uses shared memory segments. If this pool is not large enough or if there is not enough memory available in this pool for a securefile LOB I/O operation, Oracle uses a portion of PGA until there is sufficient memory available in the shared I/O pool. Hence it is recommen ded to size the shared I/O pool appropriately by monitoring the database during the peak activity. Relevant initialization parameters: _shared_io_pool_size and _shared_iop_max_size

Also see:
Oracle Database Documentation : LOB Performance Guidelines

Saturday Aug 27, 2011

Oracle 11g: Travel back in time with the Database Flashback

Error recovery, historical reporting, trend analysis, data forensics and fraud detection are just some of the business problems that can be solved by using the Flashback Data Archive feature in Oracle 11g database. The Flashback option can be enabled for the entire database or for a selected set of tables. It can be enabled in the database with no application changes.

At work I usually run performance tests by starting with a clean copy of the database. I analyze the test results at the end of the test, determine the next course of action (tuning), restore the clean copy of the database from a backup, apply the tuning and re-run the performance test. It goes on in a cycle until I'm happy with the overall test result. In some cases especially with large data sets, restoring the database from a backup becomes one of the time consuming tasks. In such situations, using the database flashback to go back to a previously saved restore point saves quite a bit of time. Rest of this blog post demonstrates how to enable database flashback and to go back to a specified restore point. Check Oracle Total Recall with Oracle Database 11g Release 2 white paper for more information on Flashback Data Archive (FDA).

Objective

Revert the entire database to a previously saved state at will

Steps to perform

  • Configure the following initialization parameters: db_recovery_file_dest & db_recovery_file_dest_size
  • Enable Archive Log mode
  • Enable database Flashback option
  • Create a restore point. Decide whether to create a normal or a guaranteed restore point
    --------------------------------------------------------------------------------------------------------
  • Finally flashback database to the created restore point when required

Be aware that there will be some performance and storage overhead in using the database flashback. Evaluate all your options carefully before configuring database flashback.

Example

The following example uses guaranteed restore point to flashback the database in a two-node RAC environment. Most of the example is self-explanatory.

% srvctl status database -d DEMO
Instance DEMO1 is running on node racnode01
Instance DEMO2 is running on node racnode02

/* stop all the database instances except one (anyone) in RAC config */

% srvctl stop instance -d DEMO -i DEMO2

% export ORACLE_SID=DEMO1

/* put one of the instances in non-cluster mode */

% sqlplus / as sysdba
SQL> alter system set cluster_database=false scope=spfile;

% srvctl stop instance -d DEMO -i DEMO1

% sqlplus / as sysdba
SQL> startup mount

/* enable archive log mode */

SQL> alter database archivelog;

SQL> archive log list
Database log mode	       Archive Mode
Automatic archival	       Enabled
Archive destination	       USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     2
Next log sequence to archive   4
Current log sequence	       4

SQL> show parameter db_recovery_file

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest		     string	 +FRA
db_recovery_file_dest_size	     big integer 512G

/* enable flashback option */

SQL> alter database flashback on;

SQL> select flashback_on from v$database;

FLASHBACK_ON
------------------
YES

/* put the instance back in cluster mode and restart the database */

SQL> alter system set cluster_database=true scope=spfile;

SQL> alter database open;

% srvctl stop instance -d DEMO -i DEMO1

% srvctl start database -d DEMO

/* create a guaranteed restore point */

% sqlplus / as sysdba
SQL> create restore point demo_clean_before_test guarantee flashback database;

Restore point created.

SQL> column NAME format A25
SQL> column TIME format A40
SQL> set lines 120
SQL> select NAME, SCN, TIME, GUARANTEE_FLASHBACK_DATABASE, STORAGE_SIZE 
  2  from V$RESTORE_POINT where GUARANTEE_FLASHBACK_DATABASE='YES';

NAME				 SCN TIME		              GUA STORAGE_SIZE
------------------------- ---------- -------------------------------- --- ------------
DEMO_CLEAN_BEFORE_TEST     17460960 21-AUG-11 01.01.20.000 AM	      YES     67125248

/* flashback database to the saved restore point */

% srvctl stop database -d DEMO

% export ORACLE_SID=DEMO1

% rman TARGET /

RMAN> STARTUP MOUNT;
RMAN> FLASHBACK DATABASE TO RESTORE POINT 'DEMO_CLEAN_BEFORE_TEST';

Starting flashback at 21-AUG-11
using channel ORA_DISK_1

starting media recovery
media recovery complete, elapsed time: 00:00:25

Finished flashback at 21-AUG-11

RMAN> ALTER DATABASE OPEN RESETLOGS;

database opened

RMAN> SHUTDOWN IMMEDIATE;

% srvctl start database -d DEMO

/* ============================================================================== */

/* alternatively run the following RMAN script as shown below */

% cat restore.rman
RUN {
        STARTUP MOUNT;
        FLASHBACK DATABASE TO RESTORE POINT 'DEMO_CLEAN_BEFORE_TEST';
        ALTER DATABASE OPEN RESETLOGS;
        SHUTDOWN IMMEDIATE;
}

EXIT

% rman TARGET / cmdfile=restore.rman

Note:
It is not mandatory to enable logging for flashback database in order to create and use restore points. The requirement in such a case is to put the database in ARCHIVELOG mode and creating the first guaranteed restore point when the database is in mounted state.

Friday Jul 01, 2011

Be Eco-Responsible. Use Resource Management features in Solaris, Oracle Database and Go Green

This blog post is about a white paper that introduces the resource management features in Oracle Solaris and Oracle Database.

Who needs more hardware when under-utilized systems are sitting idle in datacenters consuming valuable space, energy and polluting environment with carbon emissions. These days virtualization and con solidation are more popular than ever before. And then there is Oracle Corporation with great products consisting rich features for resource manageability in consolidated, virtualized and isolated envir onments. Put together all these pieces what do we get? - a complete green solution that helps make this planet a better place.

In an attempt to increase the awareness of the resource management features available in Oracle Solaris and Oracle Database, we just published a four-part white paper surrounding Oracle Solaris Resource Manager and Oracle Database Resource Manager. Those resource managers are not really a product, process or an add-on to Solaris and Oracle Database. "Resource Manager" is the abstract term for the set of software modules that are integrated into the Solaris operating system and Oracle database management system to facilitate resource management from the OS and RDBMS perspective.

The first part of the series introduces: the concept of resource management, Solaris Resource Manager and Oracle Database Resource Manager; compares and contrasts both of those resource managers andends the discussion with a brief overview of resource management in high-availability (HA) environments. Access this paper from the following URL:

        Part 1: Introduction to Resource Management in Oracle Solaris and Oracle Database

The second part is all about the available resource management features in Oracle Solaris. This includes resource management in virtualized environments too. The range of topics vary from the simple process binding to the dynamic reconfiguration of CPU and memory resources in a logical domain (LDom). The value of this paper lies in the volume of examples that follow each of the introduced feature. Access the second part from the following URL:

        Part 2: Effective Resource Management Using Oracle Solaris Resource Manager

Similar to the second part, the third part talks about the available resource management features in Oracle Database Management System. Majority of the discussion revolves around creating resource pl ans in a Oracle database. This part also contains plenty of examples. Access the third part from the following URL:

        Part 3: Effective Resource Management Using Oracle Database Resource Manager

The final part briefly introduces various hardware/software products that makes Oracle Corporation the ideal consolidation platform. Also an imaginary case study was presented to demonstrate how to consolidate multiple applications and databases onto a single server using Oracle virtualization technologies and the resource management features found in Oracle Solaris and Oracle Database. Description of the consolidation plan and the implementation of the plan takes up major portion of this final part. Access the fourth and final part of this series from the following URL:

        Part 4: Resource Management Case Study for Mixed Workloads and Server Sharing

Happy reading.

Friday May 27, 2011

PeopleSoft Application Server : Binding JSL Port to Multiple IP Addresses

(Pardon the formatting. Legible copy of this blog post is available at:
http://technopark02.blogspot.com/2011/05/peoplesoft-application-server-binding.html)

For the impatient:

On any multi-homed1 host, replace %PS_MACH% variable in "Jolt Listener" section of the application server domains' psappsrv.cfg file wih the special IP address "0.0.0.0" to get the desired effect. It enables TCP/IP stack to listen on all available network interfaces on the system. In other words, if JSL is listening on 0.0.0.0 on a multi-homed system, PIA traffic can flow using any of the IP addresses assigned to that system.

For the rest:

A little background first.

PeopleSoft application server relies on Jolt, a companion product that co-exists with Tuxedo, to handle all web requests. That is, Jolt is the bridge between PeopleSoft application server and the web server (any supported one) that facilitates web communication. Tuxedo helps schedule PeopleSoft application server processes to perform the actual transactions. When the application server is booted up, Jolt server listener (JSL) is bound to a pre-configured port number and is actively monitored for incoming web requests. On the other hand, web server instance(s) are made aware of the existence of all Jolt listeners in a PeopleSoft Enterprise by configuring the hostname:port# pairs in each of the web domain's configuration.properties file.

By default the variable %PS_MACH% in each of the application server domain configuration file, psappsrv.cfg, gets resolved to the hostname of the system during application server boot-up time. The following example demonstrates that.

eg.,

/* Application server configuration file */
% cat psappsrv.cfg
..
[JOLT Listener]
Address=%PS_MACH%
Port=9000
..

/* Boot up the application server domain */
% psadmin -c boot -d HRHX
..
exec JSL -A -- -d /dev/tcp -n //ben01:9000 -m 100 -M 100 -I 5 -j ANY -x 20 -S 10 -c 1000000 -w JSH :
        process id=20077 ... Started.
..

% hostname
ben01

% netstat -a | grep 9000
ben01.9000                 *.*                0      0 49152      0 LISTEN

% netstat -an | grep 9000
17.16.221.106.9000          *.*                0      0 49152      0 LISTEN

% ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
 inet 127.0.0.1 netmask ff000000 
bge0: flags=1000843 mtu 1500 index 2
 inet 17.16.221.106 netmask ffffff00 broadcast 17.16.221.255
bge1: flags=1000843 mtu 1500 index 3
 inet 18.1.1.1 netmask ffffff00 broadcast 18.1.1.255
e1000g0: flags=1000843 mtu 1500 index 4
 inet 18.1.1.201 netmask ffffff00 broadcast 18.1.1.255

% telnet 17.16.221.106 9000
Trying 17.16.221.106...
Connected to 17.16.221.106.
Escape character is '^]'.

% telnet 18.1.1.1 9000
Trying 18.1.1.1...
telnet: Unable to connect to remote host: Connection refused

% telnet 18.1.1.201 9000
Trying 18.1.1.201...
telnet: Unable to connect to remote host: Connection refused

Notice that %PS_MACH% was replaced by the actual hostname and the Jolt listener created the server socket using the IP address 17.16.221.106 and port number 9000. From the outputs of netstat, ifconfig and telnet, it is apparent that "bge0" is the only network interface that is being used by the Jolt listener. It means web server can communicate to JSL using the IP address 17.16.221.106 over port 9000 but not using any of the other two IP addresses 18.1.1.1 or 18.1.1.201. This is the default behavior.

However some customers may wish to have the ability to connect to the application services from different/multiple networks. This is possible in case of multi-homed systems -- servers with multiple network interfaces that are connected to a single or multiple networks. For example, such a host could be part of a public network, a private network where only those clients that can communicate over private links can connect or an InfiniBand network, a low latency high throughput network. The default behavior of JSL can be changed by using a special IP address "0.0.0.0" in place of the variable %PS_MACH% in application server domains' configuration file. The IP address 0.0.0.0 hints the Jolt listener (JSL) to listen on all available IPv4 network interfaces on the system. (I read somewhere that "::0" is the equivalent for IPv6 interfaces. Didn't get a chance to test it out yet). The following example demonstrates how the default behavior changes with the IP address 0.0.0.0.

% cat psappsrv.cfg
..
[JOLT Listener]
Address=0.0.0.0
Port=9000
..

/* Update the binary configuration by reloading the config file */
% psadmin -c configure -d HRHX

% psadmin -c boot -d HRHX
..
exec JSL -A -- -d /dev/tcp -n //0.0.0.0:9000 -m 100 -M 100 -I 5 -j ANY -x 20 -S 10 -c 1000000 -w JSH :
 process id=20874 ... Started.

% netstat -a | grep 9000
      *.9000               *.*                0      0 49152      0 LISTEN

% telnet 17.16.221.106 9000
Trying 17.16.221.106...
Connected to 17.16.221.106.
..

% telnet 18.1.1.1 9000
Trying 18.1.1.1...
Connected to 18.1.1.1.
..

% telnet 18.1.1.201 9000
Trying 18.1.1.201...
Connected to 18.1.1.201.

Footnote:
[1] Any system with more than one interface is considered a multi-homed host

About

Benchmark announcements, HOW-TOs, Tips and Troubleshooting

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today