And I had no answer for him. I couldn't blame him. CIOs want to know this kind of information. Better still, he would have liked a pie chart depicting it like this:
I wish..
Well, for once, it would have been nice if Oracle 9i or 10g kept the historical startup or shutdown information in the v$ or some dba_* tables. A simple query off the table would have got me the answer.Anyways, it set me thinking. There are always other ways to get the same information. Statspack or AWR was one possibility, but I was not sure if it really gathers the details information about when the instance was shutdown or started up (historically) -- they sure are aware if there was an instance restart between two snaps.
An Aha moment..
But wait, the database alert log has information about each startup and shutdown! So if we could mine the alert log for the right regular expressions, and then find the time difference between time stamps, it could be done.This method would not give you the overall downtime for the production instance, including the downtime for the middle tiers or Apache web server, but the same idea could probably be extended for the other services, but in this article, the scope is just the database. There is an auxiliary script (get_epoch.sh) supplied here that would be useful in this quest.
Auxiliary script: get_epoch.sh
Also available for download here.#!/usr/bin/perl
#
# The format of input date is: Thu Jun 5 21:15:48 2008
#
# NOTE: The format of `date` command is: Thu Jun 5 21:15:48 EDT 2008
# -- it has the timezone in the output as well
# BUT this script does not assume that since the timestamps in
# alert log dont have the timezone in it
#
# This script heavily uses this function to convert a UTC timestamp
# into seconds after 1 jan 1970:
# timelocal($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst);
use Time::Local;
my $wday = $ARGV[0];
my $month = $ARGV[1];
# convert the month shortname into 0-11 number
if ( $month eq "Jan" ) { $mon = 0 }
elsif ( $month eq "Feb" ) { $mon = 1 }
elsif ( $month eq "Mar" ) { $mon = 2 }
elsif ( $month eq "Apr" ) { $mon = 3 }
elsif ( $month eq "May" ) { $mon = 4 }
elsif ( $month eq "Jun" ) { $mon = 5 }
elsif ( $month eq "Jul" ) { $mon = 6 }
elsif ( $month eq "Aug" ) { $mon = 7 }
elsif ( $month eq "Sep" ) { $mon = 8 }
elsif ( $month eq "Oct" ) { $mon = 9 }
elsif ( $month eq "Nov" ) { $mon = 10 }
elsif ( $month eq "Dec" ) { $mon = 11 };
my $mday = $ARGV[2];
# initialize time varialble and split hours (24 hr format), minutes, seconds into an array
my $time = $ARGV[3];
@time = split /:/, $time;
# if the timezone is left out of the input, the position of year becomes 5th in ARGV
my $year = $ARGV[4];
#######################################################################################
# I found that by excluding $wday, the seconds results (EPOCH) is more
# accurate, so $wday parameter has been omitted from the call to
# timelocal() function.
#######################################################################################
$epoch= timelocal($time[2], $time[1], $time[0], $mday, $mon, $year);
print "$epoch\n";
The main script..
Due to formatting issues, the main script is available for download here.Sample Usage and output..
I realized that it would probably make more sense to have an optional cutoff date to calculate the downtime from, so that was added to the version 2 of the script. The version 1 which calculates the downtime from the first database startup time is uploaded here.sandbox:sandbox> ./calculate_downtime.sh
Correct Usage: calculate_downtime.sh alertlogfilepath [cutoff_date in format Sat Jun 7 08:49:34 2008]
sandbox:sandbox> ./calculate_downtime.sh $DATA_DIR/admin/bdump/alert*.log Fri Mar 28 15:20:59 2008
Cutoff date is : Fri Mar 28 15:20:59 2008
Shutdown times:
Timestamp -- epoch (seconds)
Wed Jan 9 17:53:08 2008 - 1199919188
Wed Jan 16 12:05:09 2008 - 1200503109
Fri Jan 18 11:19:42 2008 - 1200673182
Thu Jan 24 17:34:15 2008 - 1201214055
Fri Feb 15 09:00:44 2008 - 1203084044
Wed Feb 20 16:50:14 2008 - 1203544214
Wed Mar 12 12:43:26 2008 - 1205340206
Fri Mar 28 15:21:59 2008 - 1206732119
Thu Apr 3 11:03:52 2008 - 1207235032
Thu Apr 3 11:10:20 2008 - 1207235420
Thu Apr 3 11:15:44 2008 - 1207235744
Thu Apr 3 11:22:38 2008 - 1207236158
Thu Apr 3 11:27:36 2008 - 1207236456
Thu Apr 3 11:34:35 2008 - 1207236875
Thu Apr 3 11:41:36 2008 - 1207237296
Mon May 12 14:17:13 2008 - 1210616233
Thu Jun 5 10:36:58 2008 - 1212676618
shutdown_counter=17
Startup times:
Timestamp -- epoch (seconds)
Wed Jan 9 17:50:42 2008 -- 1199919042
Thu Jan 10 09:43:18 2008 -- 1199976198
Thu Jan 17 12:00:03 2008 -- 1200589203
Fri Jan 18 11:26:13 2008 -- 1200673573
Wed Jan 30 12:19:21 2008 -- 1201713561
Tue Feb 19 22:57:38 2008 -- 1203479858
Wed Mar 12 12:39:03 2008 -- 1205339943
Mon Mar 24 13:44:20 2008 -- 1206380660
Thu Apr 3 11:00:33 2008 -- 1207234833
Thu Apr 3 11:07:12 2008 -- 1207235232
Thu Apr 3 11:14:01 2008 -- 1207235641
Thu Apr 3 11:20:54 2008 -- 1207236054
Thu Apr 3 11:25:25 2008 -- 1207236325
Thu Apr 3 11:31:53 2008 -- 1207236713
Thu Apr 3 11:40:18 2008 -- 1207237218
Tue Apr 29 16:50:49 2008 -- 1209502249
Mon Jun 2 14:20:38 2008 -- 1212430838
Thu Jun 5 10:38:39 2008 -- 1212676719
startup_counter=18
As per the alert log, The instance is currently up
Here are the downtime windows ...
Wed Jan 9 17:50:42 2008 -- Wed Jan 9 17:53:08 2008
Thu Jan 10 09:43:18 2008 -- Wed Jan 16 12:05:09 2008
Thu Jan 17 12:00:03 2008 -- Fri Jan 18 11:19:42 2008
Fri Jan 18 11:26:13 2008 -- Thu Jan 24 17:34:15 2008
Wed Jan 30 12:19:21 2008 -- Fri Feb 15 09:00:44 2008
Tue Feb 19 22:57:38 2008 -- Wed Feb 20 16:50:14 2008
Wed Mar 12 12:39:03 2008 -- Wed Mar 12 12:43:26 2008
Mon Mar 24 13:44:20 2008 -- Fri Mar 28 15:21:59 2008
Thu Apr 3 11:00:33 2008 -- Thu Apr 3 11:03:52 2008
Thu Apr 3 11:07:12 2008 -- Thu Apr 3 11:10:20 2008
Thu Apr 3 11:14:01 2008 -- Thu Apr 3 11:15:44 2008
Thu Apr 3 11:20:54 2008 -- Thu Apr 3 11:22:38 2008
Thu Apr 3 11:25:25 2008 -- Thu Apr 3 11:27:36 2008
Thu Apr 3 11:31:53 2008 -- Thu Apr 3 11:34:35 2008
Thu Apr 3 11:40:18 2008 -- Thu Apr 3 11:41:36 2008
Tue Apr 29 16:50:49 2008 -- Mon May 12 14:17:13 2008
Mon Jun 2 14:20:38 2008 -- Thu Jun 5 10:36:58 2008
Thu Jun 5 10:38:39 2008 --
Downtime 1 : Wed Jan 9 17:53:08 2008 (1199919188) --> Thu Jan 10 09:43:18 2008 (1199976198) = 57010 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is < than shutdown time Wed Jan 9 17:53:08 2008 - so not accruing
Running Cumulative downtime = 0 seconds
Downtime 2 : Wed Jan 16 12:05:09 2008 (1200503109) --> Thu Jan 17 12:00:03 2008 (1200589203) = 86094 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is < than shutdown time Wed Jan 16 12:05:09 2008 - so not accruing
Running Cumulative downtime = 0 seconds
Downtime 3 : Fri Jan 18 11:19:42 2008 (1200673182) --> Fri Jan 18 11:26:13 2008 (1200673573) = 391 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is < than shutdown time Fri Jan 18 11:19:42 2008 - so not accruing
Running Cumulative downtime = 0 seconds
Downtime 4 : Thu Jan 24 17:34:15 2008 (1201214055) --> Wed Jan 30 12:19:21 2008 (1201713561) = 499506 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is < than shutdown time Thu Jan 24 17:34:15 2008 - so not accruing
Running Cumulative downtime = 0 seconds
Downtime 5 : Fri Feb 15 09:00:44 2008 (1203084044) --> Tue Feb 19 22:57:38 2008 (1203479858) = 395814 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is < than shutdown time Fri Feb 15 09:00:44 2008 - so not accruing
Running Cumulative downtime = 0 seconds
Downtime 6 : Wed Feb 20 16:50:14 2008 (1203544214) --> Wed Mar 12 12:39:03 2008 (1205339943) = 1795729 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is < than shutdown time Wed Feb 20 16:50:14 2008 - so not accruing
Running Cumulative downtime = 0 seconds
Downtime 7 : Wed Mar 12 12:43:26 2008 (1205340206) --> Mon Mar 24 13:44:20 2008 (1206380660) = 1040454 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is < than shutdown time Wed Mar 12 12:43:26 2008 - so not accruing
Running Cumulative downtime = 0 seconds
Downtime 8 : Fri Mar 28 15:21:59 2008 (1206732119) --> Thu Apr 3 11:00:33 2008 (1207234833) = 502714 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Fri Mar 28 15:21:59 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 502714 seconds
Downtime 9 : Thu Apr 3 11:03:52 2008 (1207235032) --> Thu Apr 3 11:07:12 2008 (1207235232) = 200 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Apr 3 11:03:52 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 502914 seconds
Downtime 10 : Thu Apr 3 11:10:20 2008 (1207235420) --> Thu Apr 3 11:14:01 2008 (1207235641) = 221 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Apr 3 11:10:20 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 503135 seconds
Downtime 11 : Thu Apr 3 11:15:44 2008 (1207235744) --> Thu Apr 3 11:20:54 2008 (1207236054) = 310 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Apr 3 11:15:44 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 503445 seconds
Downtime 12 : Thu Apr 3 11:22:38 2008 (1207236158) --> Thu Apr 3 11:25:25 2008 (1207236325) = 167 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Apr 3 11:22:38 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 503612 seconds
Downtime 13 : Thu Apr 3 11:27:36 2008 (1207236456) --> Thu Apr 3 11:31:53 2008 (1207236713) = 257 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Apr 3 11:27:36 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 503869 seconds
Downtime 14 : Thu Apr 3 11:34:35 2008 (1207236875) --> Thu Apr 3 11:40:18 2008 (1207237218) = 343 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Apr 3 11:34:35 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 504212 seconds
Downtime 15 : Thu Apr 3 11:41:36 2008 (1207237296) --> Tue Apr 29 16:50:49 2008 (1209502249) = 2264953 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Apr 3 11:41:36 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 2769165 seconds
Downtime 16 : Mon May 12 14:17:13 2008 (1210616233) --> Mon Jun 2 14:20:38 2008 (1212430838) = 1814605 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Mon May 12 14:17:13 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 4583770 seconds
Downtime 17 : Thu Jun 5 10:36:58 2008 (1212676618) --> Thu Jun 5 10:38:39 2008 (1212676719) = 101 seconds
the cutoff date Fri Mar 28 15:20:59 2008 is > shutdown time Thu Jun 5 10:36:58 2008 - greater than cutoff, so accruing
Running Cumulative downtime = 4583871 seconds
Calculating lifetime of instance as per criteria specified..
Starting time being used = Fri Mar 28 15:20:59 2008 -- 1206732059
Ending time epoch = Sat Jun 7 11:49:49 2008 -- 1212853789
Total lifetime in seconds = 6121730
Beginning Fri Mar 28 15:20:59 2008, The instance was down 74 % of the time

Comments (3)
And why not using grid control? You can even manage SLAs there.
Posted by Fernando Jose | June 7, 2008 9:40 AM
Posted on June 7, 2008 09:40
Grid Control 10g has a seeded report "Availability History (Target)" which shows the availability summary, chart and details for the target.
Posted by Binoy Chakraborty | June 11, 2008 10:51 AM
Posted on June 11, 2008 10:51
Good start Gaurav. Based on previous comments add Grid Monitoring for Applications (11i/R12) and show downtime for Apache, Forms or even CM (apart from DB)
Atul
http://onlineappsdba.com
Posted by Atul | November 24, 2008 7:40 AM
Posted on November 24, 2008 07:40