
May 2007 Archives

May 2, 2007

RAC with ASM on Linux, Crash Scenario: Data Disk Group Loss

In this 2nd scenario we experience a total database loss.

The test database is built on 2 disk groups: datadg, which contains all tablespaces, redo logs and control files, and Archdg, which contains all archived logs.

To simulate the disk group loss we first need to shut down the database; otherwise deleting the files inside ASM is not possible. Once the database is down, all files on the disk group are deleted.

The recovery steps include:

  • Invoke RMAN, start up the instance and restore the spfile
  • Shut down and restart with the restored spfile
  • Restore the control file and mount the database
  • Restore and recover the database
  • Open the database on the first instance
  • Open the second instance and start all RAC services
Crash scripts and full details of the test: Crash Scenario 2 - Data Disk Group Loss
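
The recovery steps above map roughly onto the RMAN command sequence below. This is only a sketch, not the script from the linked document. It assumes control file autobackup was enabled before the crash and that RMAN can locate the autobackup; the function name rman_recovery_script and the DBID are placeholders. Because the online redo logs were lost together with datadg, recovery can only go as far as the last archived log, and restoring a backup control file means the database has to be opened with resetlogs.

```shell
#!/bin/sh
# Print an RMAN command file for a full restore after losing the data disk group.
# Assumption: control file autobackup was enabled before the crash.
rman_recovery_script() {
    dbid="$1"   # DBID of the lost database (placeholder supplied by the caller)
    cat <<EOF
set dbid ${dbid};
startup force nomount;
restore spfile from autobackup;
shutdown immediate;
startup nomount;
restore controlfile from autobackup;
alter database mount;
restore database;
recover database;
alter database open resetlogs;
EOF
}

# Review the generated file before feeding it to RMAN, for example:
#   rman_recovery_script 1234567890 > /tmp/restore_datadg.rman
#   rman target / cmdfile=/tmp/restore_datadg.rman
```

The function only prints text, so the sequence can be reviewed and adjusted (for instance, to restore from a named backup piece instead of an autobackup) before anything touches the database.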

Important! Before running a crash scenario be sure to back up the database: RMAN RAC on ASM Backup Script


May 6, 2007

RAC with ASM on Linux, Crash Scenario: OCR Loss

In this 3rd scenario we lose the OCR.

The OCR in this case is located on a raw device, so backup and recovery from and to a raw device are explained.

To simulate the OCR loss we write zeros to the raw device; this causes CRS to fail.

The recovery steps include:

  • Stop all cluster components that remain up after the failure.
  • Restore the OCR from backup
  • Restart CRS
  • Restart all cluster components
  • Check
Crash scripts and full details of the test: Crash Scenario 3 - OCR Loss
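
As a rough illustration of the recovery steps above, the sketch below prints the Clusterware commands in order without executing them. It assumes the 10.2 command names (ocrconfig, crsctl, ocrcheck, crs_stat) run as root; the function name ocr_restore_plan, the node names and the backup path are hypothetical and are not taken from the linked document.

```shell
#!/bin/sh
# Print the commands for an OCR restore; nothing is executed.
# Assumptions: Clusterware 10.2 command names, run as root; node names and
# the backup path are hypothetical and passed in by the caller.
ocr_restore_plan() {
    backup_file="$1"; shift
    echo "ocrconfig -showbackup"                 # list the automatic OCR backups
    for node in "$@"; do
        echo "[${node}] crsctl stop crs"         # CRS must be down on every node
    done
    echo "ocrconfig -restore ${backup_file}"     # restore from an automatic backup
    for node in "$@"; do
        echo "[${node}] crsctl start crs"
    done
    echo "ocrcheck"                              # verify OCR integrity
    echo "crs_stat -t"                           # confirm resources are back online
}

# Example: ocr_restore_plan /path/to/backup00.ocr node1 node2
```

If the backup was taken with ocrconfig -export rather than by the automatic backup mechanism, the restore line would use ocrconfig -import instead.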

Important! Before running a crash scenario be sure to back up the OCR. Details on how to take a backup, or how to use the automatically generated backups, are included in the same attached file.



May 7, 2007

RAC with ASM on Linux, Crash Scenario: Voting Disk Loss

In this 4th scenario we lose the Voting Disk.

The Voting Disk in this case is located on a raw device, so backup and recovery from and to a raw device are explained.

To simulate the Voting Disk loss we write zeros to the raw device; this causes CRS to fail, the database to crash and the server to hang.

The Voting Disk and the OCR can and should be multiplexed.

To multiplex a voting disk execute as root:

crsctl add css votedisk <path>

The recovery steps include:

  • Reboot both servers
  • Restore the Voting Disk from backup
  • Restart CRS
  • Restart all cluster components
  • Check
Crash script and full details of the test: Crash Scenario 4 - Voting Disk Loss
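
The backup, zeroing and restore cycle described above can be rehearsed safely with dd on an ordinary file before trying it on a real device. The sketch below is such a rehearsal; the file names are hypothetical stand-ins, and no raw device or Clusterware command is involved.

```shell
#!/bin/sh
# Demonstrate dd backup, simulated loss, and restore on a stand-in file.
# Assumption: "$device" is an ordinary temp file, NOT a real raw device.
workdir="$(mktemp -d)"
device="${workdir}/votedisk.img"      # hypothetical stand-in for the raw device
backup="${workdir}/votedisk.bak"

# Create a 1 MB "device" with non-zero content.
dd if=/dev/urandom of="$device" bs=1024 count=1024 2>/dev/null

# 1. Back up the device before the test.
dd if="$device" of="$backup" bs=1024 2>/dev/null

# 2. Simulate the loss by overwriting the device with zeros.
dd if=/dev/zero of="$device" bs=1024 count=1024 2>/dev/null
if cmp -s "$device" "$backup"; then
    status_after_crash="intact"
else
    status_after_crash="damaged"
fi

# 3. Restore the device from the backup and verify it byte for byte.
dd if="$backup" of="$device" bs=1024 2>/dev/null
if cmp -s "$device" "$backup"; then
    status_after_restore="intact"
else
    status_after_restore="damaged"
fi

echo "after crash: ${status_after_crash}, after restore: ${status_after_restore}"
rm -rf "$workdir"
```

On a real cluster the if= and of= arguments would be the raw device path, the restore would be done with CRS stopped, and crsctl query css votedisk can be used afterwards to list the configured voting disks.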

Important! Before running a crash scenario be sure to back up the Voting Disk. Details on how to back it up are included in the attached file.

RAC with ASM on Linux, Crash Scenario: ASM Spfile Loss

In this 5th scenario we lose the ASM spfile.

The ASM spfile in this case is located on a raw device, so backup and recovery from and to a raw device are explained.

Backup and restore using create pfile from spfile and create spfile from pfile can also be used. In that case it is necessary to shut down the database and ASM instances, then start the ASM instance with the pfile and recreate the spfile.

To simulate the ASM spfile loss we write zeros to its raw device. Because the ASM spfile is accessed only at startup, this loss does not affect online work with the database or ASM instances.

The recovery steps include:
  • Restore the ASM spfile from backup using dd
  • Check
Crash script and full details of the test: Crash Scenario 4 - ASM Spfile Loss
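
For the pfile-based alternative mentioned above, the two SQL statements involved can be generated as in the sketch below. The function name asm_spfile_sql and both paths are hypothetical; the statements are meant to be run in SQL*Plus connected to the ASM instance, and the restore statement only after that instance has been started with the pfile.

```shell
#!/bin/sh
# Print SQL*Plus statements for the pfile-based alternative.
# Assumption: both paths are hypothetical and supplied by the caller.
asm_spfile_sql() {
    action="$1"; pfile="$2"; spfile="$3"
    case "$action" in
        backup)   # run while the spfile is still healthy
            echo "create pfile='${pfile}' from spfile='${spfile}';" ;;
        restore)  # run after starting the ASM instance with the pfile
            echo "create spfile='${spfile}' from pfile='${pfile}';" ;;
        *)
            echo "usage: asm_spfile_sql backup|restore <pfile> <spfile>" >&2
            return 1 ;;
    esac
}

# Example: asm_spfile_sql backup /tmp/init_asm.ora /dev/raw/raw9
```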

Important! Before running a crash scenario be sure to back up the ASM spfile. Details on how to back it up are included in the attached file.

May 16, 2007

RAC with ASM on Linux, Crash Scenario: All Oracle Homes Loss

In this 6th scenario we lose all Oracle Homes on one node.

In this environment the Oracle Homes are installed under this path: /oradisk/app01/oracle/product

These are the installed Oracle Homes:
  • ASM Home
  • RDBMS Home
  • CRS Home
The scenario is triggered using the following script:

cd /oradisk/app01 && rm -rf *


Recovery from such a loss is greatly simplified by having a good backup of the Oracle Homes.
At many sites you will not find such a backup, and even if reinstalling the software is not an issue, restoring all the parameter files can sometimes be difficult.

The recovery steps include:
  • Check available backups
  • Restore ASM and RDBMS homes from a good backup
  • Reinstall Oracle Clusterware
  • Restore OCR from a backup from before the crash
  • Restart all RAC components
  • Check
Crash script and full details of the test: Crash Scenario 5 - Oracle Homes Loss
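
A backup of the kind the recovery steps rely on can be as simple as a tar archive of the directory tree. The sketch below shows the idea on a throwaway directory; the function names and file names are hypothetical, and it is not the procedure from the linked document. On a real system it would be run as root so that file ownership and setuid bits are preserved.

```shell
#!/bin/sh
# Back up and restore a directory tree with tar, preserving permissions.
# Assumption: run as root on a real system so ownership and setuid bits survive.
backup_homes() {   # backup_homes <parent_dir> <subdir> <archive>
    tar -czpf "$3" -C "$1" "$2"
}
restore_homes() {  # restore_homes <archive> <parent_dir>
    mkdir -p "$2" && tar -xzpf "$1" -C "$2"
}

# Self-contained demo on a throwaway tree (hypothetical file names).
demo="$(mktemp -d)"
mkdir -p "${demo}/app01/oracle/product/db/dbs"
echo "db_name=test" > "${demo}/app01/oracle/product/db/dbs/init.ora"

backup_homes "${demo}/app01" oracle "${demo}/homes.tar.gz"
rm -rf "${demo}/app01/oracle"                      # simulate the loss
restore_homes "${demo}/homes.tar.gz" "${demo}/app01"
restored="$(cat "${demo}/app01/oracle/product/db/dbs/init.ora")"
echo "restored content: ${restored}"
rm -rf "$demo"
```

Keeping the archive outside the tree being backed up matters here, since the crash script removes everything under the parent directory.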

Important! Before running this crash scenario be sure to back up the Oracle Homes directories and the OCR. Details on how to back up and recover them are included in the attached file.

May 17, 2007

Enterprise Manager Database Control Reconfiguration

Enterprise Manager Database Control is a great help when administering RAC environments.

For production environments it is advisable to set up a dedicated Grid Control server and host the EM repository on it.

For test and training purposes Database Control is excellent.

This link has a guide to reconfiguring EM with the Enterprise Manager Configuration Assistant (emca). When a misconfiguration leaves Database Control incomplete or unavailable, removing it and reconfiguring it is a quick way to get it working again.
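
As a rough idea of what the remove-and-reconfigure cycle looks like, the sketch below prints the two emca calls without running them. It assumes the 10.2 emca syntax; the function name emca_reconfig_plan is hypothetical, and the linked guide remains the reference for the exact procedure.

```shell
#!/bin/sh
# Print the emca calls that drop and recreate Database Control; nothing runs.
# Assumption: 10.2 emca syntax; pass "rac" to add the -cluster flag.
emca_reconfig_plan() {
    cluster_flag=""
    if [ "$1" = "rac" ]; then
        cluster_flag=" -cluster"
    fi
    echo "emca -deconfig dbcontrol db -repos drop${cluster_flag}"
    echo "emca -config dbcontrol db -repos create${cluster_flag}"
}

# Example: emca_reconfig_plan rac
```

When run for real, emca prompts interactively for details such as the database name, listener port and account passwords, so those need to be at hand.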

 

May 21, 2007

Clusterware Install hang when running root.sh on second node

Lately I've seen this behavior twice, once on HP-UX and once on Linux:

The Oracle Clusterware 10.2.0.1 install went smoothly and then hung when running root.sh on the 2nd node, while displaying this message:

Startup will be queued to init within 90 seconds.

There were no errors logged on the second server, truss or strace showed the processes waiting...

On the first server we saw errors logged in $ORA_CRS_HOME/log/<servername>/alert<servername>.log pointing to a problem accessing the OCR raw device.

On HP-UX the problem was solved by configuring the raw devices using a whole LUN, as described in the install guide.
On Linux the problem was solved by pointing directly to the raw devices instead of to soft links to them.
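
For the Linux case, a quick way to spot the situation is to check whether the device paths given to the installer are soft links. The sketch below does that check and demonstrates it on temporary files; the function name check_device_path and the demo file names are hypothetical.

```shell
#!/bin/sh
# Report whether a configured device path is a symbolic link.
# Assumption: the paths checked are whatever was entered in the installer.
check_device_path() {
    path="$1"
    if [ -L "$path" ]; then
        echo "${path}: soft link -> $(readlink "$path")"
    elif [ -e "$path" ]; then
        echo "${path}: direct path"
    else
        echo "${path}: missing"
    fi
}

# Demo on temp files standing in for a raw device and a link to it.
tmp="$(mktemp -d)"
touch "${tmp}/raw1"
ln -s "${tmp}/raw1" "${tmp}/ocr_link"
link_report="$(check_device_path "${tmp}/ocr_link")"
direct_report="$(check_device_path "${tmp}/raw1")"
echo "$link_report"
echo "$direct_report"
rm -rf "$tmp"
```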


About May 2007

This page contains all entries posted to Alejandro Vargas' Blog in May 2007. They are listed from oldest to newest.
