By Rene Kundersma on Jan 15, 2014
- The Exadata Database Machine Owners Guide (Chapter 7) has instructions to create backups stored outside of the dbserver, for example on an NFS mount (see 'Creating a Snapshot-Based Backup of Oracle Linux Database Server')
- dbserver_backup.sh - which creates a local copy of your active lvm.
In this post I will explain background and usage for both backups and how they integrate with dbnodeupdate.sh
dbnodeupdate.sh uses the dbserver_backup.sh script to back up and roll back Exadata dbserver OS updates. By default the dbserver_backup.sh script is executed for each upgrade. When executed (either manually or via dbnodeupdate.sh), the script creates a small snapshot of the 'active' sys lvm. The active sys lvm is the primary lvm that your current OS image is running on. For example:
[root@mynode ~]# imageinfo
Kernel version: 2.6.39-400.126.1.el5uek #1 SMP Fri Sep 20 10:54:38 PDT 2013 x86_64
Image version: 22.214.171.124.0.131014.1
Image activated: 2014-01-13 13:20:52 -0700
Image status: success
System partition on device: /dev/mapper/VGExaDb-LVDbSys2
In the above example the active lvm is /dev/mapper/VGExaDb-LVDbSys2. The snapshot is created to have a 'consistent' 'view' of the root filesystem while the backup is made. After the snapshot is created, it is mounted by the same script and its contents are copied over to the inactive lvm. On lvm enabled systems there are always 2 'sys' lvm's: "VGExaDb-LVDbSys1" and "VGExaDb-LVDbSys2". VGExaDb-LVDbSys2 will automatically be created (on lvm enabled systems) if it does not exist yet. For the example above, the 'inactive' lvm is VGExaDb-LVDbSys1.
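As an illustration, the active and inactive sys lvm can be derived from the imageinfo output with a few lines of shell. This is only a sketch: the function names are made up for this post, and it assumes the 'System partition on device:' line looks like the example above and that the two sys lvm's are named VGExaDb-LVDbSys1 and VGExaDb-LVDbSys2.

```shell
# Hypothetical helpers, not part of dbserver_backup.sh or dbnodeupdate.sh.

# Read imageinfo-style output on stdin and print the active sys lvm device.
active_sys_lv() {
    awk -F': ' '/System partition on device:/ { print $2; exit }'
}

# Given the active sys lvm device, print the other (inactive) one.
inactive_sys_lv() {
    case "$1" in
        */VGExaDb-LVDbSys1) echo "${1%1}2" ;;
        */VGExaDb-LVDbSys2) echo "${1%2}1" ;;
        *) echo "unrecognized sys lvm: $1" >&2; return 1 ;;
    esac
}
```

On a db server this could be used as: active=$(imageinfo | active_sys_lv); inactive_sys_lv "$active"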
Backup times vary with the number of files in the root (/) filesystem (based on your active sys lvm). Previous Grid and Database home installation zip files in /opt/oracle.SupportTools/onecommand will make the backup take longer (but not the restore, for reasons explained later). The same applies if you have many small files (like mail messages in /var/spool): the backup may take longer.
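To get a rough idea of what will drive the backup time, you could count the files on the root filesystem and check the size of the directories mentioned above. The helper names below are hypothetical and the numbers are only an indication; the sketch uses nothing beyond standard find, du and wc.

```shell
# Hypothetical helpers for estimating what the backup has to copy.

# Count regular files under a directory (default /) without crossing
# into other mounted filesystems such as /u01.
count_files() {
    find "${1:-/}" -xdev -type f 2>/dev/null | wc -l
}

# Print the total size of a directory, staying on one filesystem.
# Prints nothing when the directory does not exist.
dir_size() {
    if [ -d "$1" ]; then
        du -sxh "$1" 2>/dev/null
    fi
}
```

For example: count_files / ; dir_size /opt/oracle.SupportTools/onecommand ; dir_size /var/spool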
One of the first steps the dbnodeupdate.sh script takes when executed is making a backup with this script. If you want to shorten your downtime and make this backup before the start of your 'planned maintenance window' you have 2 options: either execute the dbserver_backup.sh script yourself, or use dbnodeupdate.sh with the "-b" flag to make a backup only beforehand.
Example making a backup with dbnodeupdate.sh here (see 'Backup only' for 'Action')
When you then have the downtime for planned maintenance and already have the backup you can then let dbnodeupdate skip the backup using the "-n" flag.
Example skipping a backup with dbnodeupdate.sh here (See 'Create a backup: No')
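The two-phase approach can be written down as a small dry-run helper. This is a sketch only: the helper name is hypothetical, it prints the command instead of running it, and the only flags taken from this post are "-b" and "-n". Any other options you normally pass to dbnodeupdate.sh are assumed to be supplied by you and are passed through unchanged.

```shell
# Hypothetical dry-run helper: prints, but does not execute, the
# dbnodeupdate.sh command for each phase.
#   pre-window : backup only, before the maintenance window ("-b")
#   window     : the update itself, skipping the backup ("-n")
dbnodeupdate_plan() {
    phase="${1:-}"
    [ "$#" -gt 0 ] && shift
    case "$phase" in
        pre-window) echo "./dbnodeupdate.sh${*:+ $*} -b" ;;
        window)     echo "./dbnodeupdate.sh${*:+ $*} -n" ;;
        *) echo "usage: dbnodeupdate_plan pre-window|window [options]" >&2
           return 1 ;;
    esac
}
```

Running dbnodeupdate_plan pre-window before the window, and dbnodeupdate_plan window followed by your usual update options during it, shows the commands to review before executing them by hand.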
Both sys lvm's are 30GB each. The snapshot that will be created is ~1GB. Keep this in mind when claiming the free space in the volume group to make your /u01 filesystem as big as possible: the script checks for 2 GB of free space in the volume group.
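Before growing /u01 you could check yourself whether the volume group still has that much room. The sketch below assumes the volume group is called VGExaDb (inferred from the device names above) and uses a hypothetical helper name. The 2 GB default mirrors the check mentioned above, but this is not how dbserver_backup.sh itself implements it.

```shell
# Hypothetical helper: succeeds when the given free space (in GB,
# decimals and leading spaces allowed) is at least the required
# amount (default 2).
vg_has_room() {
    awk -v free="$1" -v need="${2:-2}" 'BEGIN { exit !(free + 0 >= need + 0) }'
}

# On a db server the free space could be obtained with:
#   free=$(vgs --noheadings --nosuffix --units g -o vg_free VGExaDb)
#   vg_has_room "$free" || echo "less than 2 GB free in VGExaDb"
```

Passing the value in as an argument keeps the comparison separate from the vgs call, so it can be tried out on any machine.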
When the update proceeds, the current active lvm remains the active lvm. This is different from what happens on the cells, where the active lvm becomes inactive with an update. Typically you will only switch active sys lvm's when a rollback needs to be done on a db server, for example when an upgrade from 126.96.36.199.0 to 188.8.131.52.0 needs to be rolled back. What happens then is nothing more than 'switching' the filesystem label of the sys lvm's, updating grub (the bootloader) and restoring the /boot directory (backed up earlier, also by dbnodeupdate.sh). On the next boot, the previously inactive lvm becomes the active lvm.
Rolling back with dbnodeupdate.sh as in the example here (a rollback from 184.108.40.206.1 to 220.127.116.11.2)
After booting the node, it's recommended to run dbnodeupdate.sh again with the "-c" flag to relink the Oracle homes.
- It's important to make a new backup before attempting a new update.
- The example above covers only the sys lvm's. This means custom partitions, including /u01, are not backed up. For regular node updates this is enough to roll back to a previous release, but it's recommended to also have a backup of other filesystems inline to your
- Nodes deployed without lvm will not have this option available.
- Rolling back db servers to previous Exadata releases with this procedure does not roll back the firmware.
The backup made with the procedure in chapter 7 of the Exadata Database Machine Owners Guide covers total node recovery. As with the dbserver_backup.sh procedure, a snapshot is used for a consistent view, but in this scenario a copy is placed outside of the db server (via NFS in this example). This procedure allows you to back up every filesystem you require. In case of emergency, such as a non-bootable system, the node can be booted with the diagnostic iso. For non-customized partitions an interactive script will then prompt you for the backup details and recover the node completely. For customized partitions the steps (which are almost the same) can also be found in the owners guide.
Advantages / Disadvantages
The two types of backup serve different goals. Also, these are just examples - of course customized backup and restore scenarios are also possible. The procedure as described in the owners guide requires external storage, while the dbserver_backup.sh script uses space on the node - but that is also where the risk is. The backup made with dbserver_backup.sh works well for the purpose of rolling back upgrades. With the automation of dbnodeupdate.sh, rollbacks can be done simply and quickly.
However, loss of critical partitions and/or filesystems is not covered by this type of backup, so you may want to combine both types of OS backup. The general recommendation is to use the default built-in backup procedure when running dbnodeupdate.sh to make an easy rollback possible, and also to back up the entire OS and customized filesystems outside of the database server at an interval based on your own requirements.