As well as providing a platform that delivers outstanding performance and efficient consolidation for both databases and applications, Oracle SuperCluster M8 offers a solid foundation on which highly available services can be deployed. How can such services best be architected to ensure continuous service delivery with a minimum of disruption from both planned and unplanned maintenance events?
The first step in architecting highly available services on Oracle SuperCluster M8 is to understand the building blocks of the platform and the ways in which they support redundancy and high availability (HA).
Oracle SuperCluster M8 is built around best-of-breed components. The mean time between failures of these components is typically extremely long. Nevertheless, even well-designed and well-manufactured hardware can fail. With that in mind, Oracle SuperCluster M8 is architected to avoid single points of failure, thereby reducing the likelihood of an outage due to hardware failure. The redundancy characteristics of some of the key components of Oracle SuperCluster M8 are described below.
- Compute servers. The compute servers used in Oracle SuperCluster M8 are robust SPARC M8-8 servers that boast many features designed to maximize reliability and availability.
Each SPARC M8-8 server in Oracle SuperCluster M8 also includes Physical Domains (PDoms) that are electrically isolated and function as independent servers. Either one or two SPARC M8-8 servers can be configured in an Oracle SuperCluster M8 rack, and each SPARC M8-8 server includes two PDoms. With multiple PDoms always present, it is possible to avoid single points of failure in compute resources.
- Exadata storage. Three or more Exadata X7 Storage Servers are configured in every Oracle SuperCluster M8 rack. A minimum of three Exadata Storage Servers allows a choice of normal redundancy (double mirroring) and high redundancy (triple mirroring). It is possible to achieve high redundancy with as few as three Exadata Storage Servers thanks to the included Exadata Quorum Disk Manager software.
Up to eleven Exadata Storage Servers can be accommodated in a rack that hosts a single SPARC M8-8 server, and up to six in a rack that hosts two SPARC M8-8 servers.
- Shared storage. A ZFS Storage Appliance (ZFSSA) that delivers 160TB of raw storage capacity is included in every Oracle SuperCluster M8 to provide shared storage, satisfying infrastructure storage needs and also providing limited capacity and throughput for user files such as application binaries and log files. Appliance controllers are delivered in a cluster configuration, with a pair of controllers set up in an active-active configuration. Two equally sized disk pools (zpools) are set up, with one associated with each controller.
Should a controller fail for any reason, the surviving controller takes over both of the disk pools and all services until the failed controller becomes available again. The result is that a controller failure need not lead to a shared storage outage.
Disks in the shared storage tray of the ZFS Storage Appliance are mirrored to provide redundancy in the event of disk failure, with hot spares that are automatically swapped into the configuration in the event of disk failure.
It’s worth noting that hardware failures typically result in a service request being raised automatically if Oracle Auto Service Request (ASR) is configured.
On Oracle SuperCluster M8, iSCSI devices are assigned for all types of system disks and for zone root file systems. All iSCSI devices for any specific PDom are stored in the same ZFS Storage Appliance zpool (as already noted, a single zpool is associated with each of the two ZFSSA controllers). The intent is that any ZFSSA controller failure will affect only half of the PDoms (although any effect is of very brief duration thanks to automated failover). All iSCSI devices associated with the other PDoms will be unaffected.
- InfiniBand Switches. All Oracle SuperCluster M8 configurations include two InfiniBand Leaf Switches for redundancy. Each dual-port InfiniBand HCA is connected to both leaf switches, allowing packet traffic to continue even if a switch outage occurs.
The entry Oracle SuperCluster M8 configuration consists of one CMIOU in each PDom of a single M8-8 server, plus three Exadata Storage Servers. All larger Oracle SuperCluster M8 configurations include a third InfiniBand Spine Switch as well. The spine switch, which is connected to each leaf switch, provides an alternative path for InfiniBand packets as well as additional redundancy.
- Ethernet Networking. Although Oracle SuperCluster M8 does not include 10GbE switches (the customer supplies these switches), 10GbE NICs in SPARC M8-8 servers and on the ZFS Storage Appliance are typically connected to two different 10GbE switches to ensure redundancy in the event of switch or cable failure. The operating system automatically detects any loss of connection, for example due to cable or switch failure, and routes traffic accordingly. Each quad-port 10GbE NIC used in Oracle SuperCluster M8 is configured as two dual-port NICs, allowing redundant connections to be established for each NIC.
- Other components. A number of other components, including the SPARC M8-8 Service Processor, power supply units, and fans are also designed and configured to provide redundancy in the event of component failure.
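The relationship between ZFSSA controllers, zpools, and PDoms described above can be sketched as a small model. This is an illustration only: the controller, pool, and PDom names and the particular PDom-to-pool assignment are hypothetical, chosen to match a rack with two SPARC M8-8 servers (four PDoms).

```python
# Hypothetical names: ctrl-1/ctrl-2, pool-a/pool-b, pdom-0..pdom-3.
# Each zpool has one home controller (active-active pair).
HOME_CONTROLLER = {"pool-a": "ctrl-1", "pool-b": "ctrl-2"}

# All iSCSI devices of a given PDom live in exactly one zpool.
# The specific assignment below is an assumption for illustration.
PDOM_POOL = {
    "pdom-0": "pool-a",
    "pdom-1": "pool-b",
    "pdom-2": "pool-a",
    "pdom-3": "pool-b",
}


def pdoms_affected_by(failed_controller):
    """PDoms whose iSCSI devices see a brief failover if this controller fails."""
    if failed_controller not in HOME_CONTROLLER.values():
        raise ValueError(f"unknown controller: {failed_controller}")
    return sorted(
        pdom
        for pdom, pool in PDOM_POOL.items()
        if HOME_CONTROLLER[pool] == failed_controller
    )


def owners_after_failure(failed_controller):
    """Pool ownership once the surviving controller has taken over."""
    if failed_controller not in HOME_CONTROLLER.values():
        raise ValueError(f"unknown controller: {failed_controller}")
    survivor = next(c for c in HOME_CONTROLLER.values() if c != failed_controller)
    return {
        pool: (survivor if ctrl == failed_controller else ctrl)
        for pool, ctrl in HOME_CONTROLLER.items()
    }


print(pdoms_affected_by("ctrl-1"))
print(owners_after_failure("ctrl-1"))
```

In this model a failure of either controller touches only the PDoms whose devices are homed on that controller's pool, and both pools remain served by the survivor.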
Oracle SuperCluster M8 is not totally reliant on the hardware redundancy outlined in the previous section, extensive as it is. The design of Oracle SuperCluster M8 also allows a number of other mechanisms to be leveraged, providing users with the opportunity to layer software redundancy on top of the hardware redundancy.
Oracle SuperCluster M8 leverages a number of mechanisms to achieve software redundancy.
- Oracle Database Real Application Clusters (RAC) has long provided a robust and scalable mechanism for delivering highly available database instances based around shared storage. On Oracle SuperCluster M8, RAC database nodes can be placed on different PDoms to build highly resilient clusters, with data files located on Exadata Storage Servers. The end result is database service that does not need to be impacted by either a PDom or a storage server outage.
- Oracle Solaris Cluster, an optional software add-on for Oracle SuperCluster M8, provides a comprehensive HA and disaster recovery (DR) solution for applications and virtualized workloads. On Oracle SuperCluster M8, Oracle Solaris Cluster delivers zone clusters (virtual clusters based on Oracle Solaris Zones) to support clustering across PDoms with fine-grained fault monitoring and automatic failover. Zone clusters are ideal environments for consolidating multiple applications or multitiered workloads onto a single physical cluster configuration, providing service protection through fine-grained monitoring of applications, policy-based restart, and failover within a virtual cluster. In addition, Solaris 10 branded zone clusters can be used to provide high availability for legacy Solaris 10 workloads. The Oracle Solaris Cluster Disaster Recovery framework, formerly known as Solaris Cluster Geographic Edition, supports clustering across geographically separate locations, facilitating the establishment of a disaster recovery solution. It is based on redundant clusters, with a redundant and secure infrastructure between them. When combined with data replication software, this option orchestrates the automatic migration of multitiered applications to a secondary cluster in the event of a localized disaster.
- Built-in clustering support is provided by some applications (Oracle’s WebLogic Server clusters are an example). Such support delivers redundancy without the need for specialized cluster solutions.
Note that both RAC and Oracle Solaris Cluster use the redundant InfiniBand links in each domain when setting up cluster interconnects. For example, Oracle Solaris Cluster on Oracle SuperCluster M8 leverages redundant IB partitions, each in a separate IB switch, to configure redundant and independent cluster interconnects.
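The recommendation to place cluster nodes on different PDoms can be checked mechanically. The following is a minimal sketch, assuming a plain inventory of cluster members and the PDom each one runs on; the cluster names, node names, and data layout are hypothetical and not part of any Oracle tool.

```python
def placement_issues(clusters):
    """Return one message per cluster whose nodes all share a single PDom.

    `clusters` maps a cluster name to a list of (node, pdom) pairs.
    An empty result means every cluster spans at least two PDoms.
    """
    issues = []
    for name, nodes in clusters.items():
        pdoms = {pdom for _node, pdom in nodes}
        if len(pdoms) < 2:
            where = ", ".join(sorted(pdoms)) or "no PDom"
            issues.append(f"{name}: all nodes on {where}")
    return issues


# Hypothetical inventory: one well-placed RAC cluster, one misplaced zone cluster.
inventory = {
    "rac-sales": [("node1", "pdom-0"), ("node2", "pdom-1")],
    "zc-web": [("zone1", "pdom-0"), ("zone2", "pdom-0")],
}

for problem in placement_issues(inventory):
    print(problem)
```

A cluster flagged by this check would lose all of its nodes in a single PDom outage, which defeats the purpose of clustering across PDoms.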
Architecting a Highly Available Solution for Oracle SuperCluster M8
Although considerable redundancy is provided in hardware components on Oracle SuperCluster M8 (for example, all 10GbE NICs and InfiniBand HCAs include two ports, which are connected to different switches), Oracle does not recommend putting the primary focus on low-level components when considering HA.
For example, Exadata Storage Servers use InfiniBand to send and receive network packets associated with database access. The InfiniBand HCAs used in storage servers have two ports, thereby providing resilience in the event of switch or cable issues. But each storage server has a single HCA, which means that an HCA failure will take the storage server offline. While this might seem like a problem at first glance, there are a number of reasons why this design not only makes sense, but has proven enormously successful:
- Given the long mean time between failures of InfiniBand HCAs, replacement due to failure is extremely rare.
- Building redundancy into every possible failure point would not only add cost, it would increase both hardware and software complexity.
- Exadata Storage Servers are never installed as single entities. The key unit of redundancy is the storage server itself, not its components.
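The idea that the storage server, rather than its HCA, is the unit of redundancy can be illustrated with a deliberately simplified model. It assumes that each mirrored copy of a piece of data resides on a different storage server, so that data remains available as long as fewer servers are offline than there are copies. The model ignores quorum handling and rebalancing, and the function name is hypothetical.

```python
# Copies kept per redundancy level, as described earlier:
# normal = double mirroring, high = triple mirroring.
COPIES = {"normal": 2, "high": 3}


def data_survives(redundancy, servers_total, servers_offline):
    """True if at least one copy of every mirrored extent is still online.

    Simplifying assumption: each copy is on a different storage server,
    so losing fewer servers than there are copies leaves a copy reachable.
    """
    if servers_total < 3:
        raise ValueError("every rack has at least three storage servers")
    if not 0 <= servers_offline <= servers_total:
        raise ValueError("servers_offline out of range")
    return servers_offline < COPIES[redundancy]


# An HCA failure takes exactly one storage server offline.
print(data_survives("normal", 3, 1))
print(data_survives("high", 3, 2))
```

Under these assumptions, a single HCA failure is just one storage server outage, which either redundancy level absorbs.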
Another key factor is that outages are not solely caused by hardware failures. Planned maintenance, such as applying a Quarterly Full Stack Download Patch (QFSDP), may necessitate an outage of affected components. Other unplanned events, such as shutdowns caused by external issues (such as power or cooling problems), software or firmware errors, and even operator error, can sometimes lead to outages.
It is important to architect solutions that focus at a high level on solving real-world problems, rather than to place the focus on low level problems that may never occur. This design principle can usefully be applied to every configuration in your data center.
The most effective way to ensure continuous availability of mission critical services is to set up a configuration that is resilient to component outage, wherever it occurs, and whatever the cause. For such a strategy to be effective, it will need to include a Disaster Recovery element based on offsite replication and failover. An offsite mirror of the production environment is a necessary precaution against both natural and man-made disasters, and a key component in any highly available deployment. The simplest and safest strategy at the disaster recovery site is to deploy the same components that are in use at the primary site. Best practices for disaster recovery with Oracle SuperCluster are addressed in the Oracle Optimized Solution for Secure Disaster Recovery whitepaper, subtitled Highest Application Availability with Oracle SuperCluster.
At the local level, clustering capabilities can be used to deliver automatic failover whenever required. The extensive hardware redundancy of Oracle SuperCluster M8 is not wasted—it will contribute by greatly reducing the likelihood of hardware failure that results in downtime.
Quarterly patches, and in particular the SuperCluster QFSDP, can be applied in the fastest and most efficient manner possible using a disruptive approach of shutting down the system and applying updates in parallel. One benefit of a highly available configuration, though, is that a QFSDP can be applied to the various components of a SuperCluster system in a rolling fashion without loss of service. Rolling updates take longer overall to complete since components are not updated in parallel, but they are much less disruptive. Speak to an Oracle Services representative to understand whether rolling updates can be applied to your SuperCluster system.
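The trade-off between parallel and rolling updates can be sketched with a toy model. It assumes that every component takes the same time to update and that a service stays up as long as one member of each redundant group is online. The group names, component names, and the 60-minute duration are hypothetical.

```python
def rolling_plan(groups):
    """Yield (group, component) one at a time so each group keeps a peer online."""
    for group, members in groups.items():
        if len(members) < 2:
            raise ValueError(
                f"{group} has no redundant peer; a rolling update would interrupt service"
            )
        for member in members:
            yield group, member


def compare(groups, minutes_each):
    """Elapsed time and service outage for parallel versus rolling updates."""
    steps = list(rolling_plan(groups))
    return {
        # Parallel: everything is down together for one update window.
        "parallel": {"elapsed": minutes_each, "outage": minutes_each},
        # Rolling: one component at a time, so no group is ever fully offline.
        "rolling": {"elapsed": minutes_each * len(steps), "outage": 0},
    }


# Hypothetical redundant groups in a small configuration.
groups = {
    "pdoms": ["pdom-0", "pdom-1"],
    "storage": ["cell-1", "cell-2", "cell-3"],
    "ib-leaf": ["leaf-1", "leaf-2"],
}

print(compare(groups, minutes_each=60))
```

With seven components at 60 minutes each, the rolling approach takes seven times as long in this model but never takes a whole group offline.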
Backup and Recovery
A crucial element of any highly available environment is the ability to perform backups and restores as required. The Oracle Optimized Solution for Backup and Recovery of Oracle SuperCluster whitepaper specifically addresses this requirement, documenting best practices for backup and recovery on Oracle SuperCluster.
Backup and restore must cover infrastructure and configuration metadata as well as customer data, and for SuperCluster, Oracle provides the osc-config-backup tool for this purpose. The tool stores its backups on the included ZFS Storage Appliance. Note, though, that the ZFS Storage Appliance itself and the Exadata Storage Servers must be backed up independently.
The SuperCluster platform includes multiple components of which the following are backed up by osc-config-backup:
- M8 Logical Domains (LDoms) configuration (note that the older SuperCluster M7, T5-8, M6-32, and T4-4 platforms are also supported)
- GbE management switch
- InfiniBand switches
- iSCSI mirrors of rpool and u01-pool on dedicated domains
- ZFS snapshots of rpool and u01-pool on dedicated domains
- Explorer from each dedicated domain
- SuperCluster configuration information (OES-CU) data
- ZFS Storage Appliance configuration information
For SuperCluster environments that include Root Domains and IO Domains, Root Domains can be treated like Dedicated Domains and backed up accordingly. IO Domains use iSCSI LUNs located on the included ZFS Storage Appliance for their boot disks, and these LUNs can be backed up simply by creating a ZFS snapshot. Redundancy is provided by the disk mirroring used with the ZFS Storage Appliance.
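For readers unfamiliar with ZFS snapshots, the sketch below builds (but does not run) the standard `zfs snapshot -r` commands for the rpool and u01-pool pools of a dedicated domain. It is not a description of how osc-config-backup works internally; the label format and function name are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Pool names taken from the list above.
POOLS = ("rpool", "u01-pool")


def snapshot_commands(label_prefix="backup", now=None):
    """Return recursive `zfs snapshot` commands as argument lists (dry run only).

    The timestamped label format is a hypothetical convention.
    """
    now = now or datetime.now(timezone.utc)
    label = f"{label_prefix}-{now:%Y%m%dT%H%M%SZ}"
    return [["zfs", "snapshot", "-r", f"{pool}@{label}"] for pool in POOLS]


for command in snapshot_commands():
    # Printing rather than executing keeps the sketch side-effect free.
    print(" ".join(command))
```

Returning argument lists instead of executing them makes it easy to review the commands before running them on a real domain.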
Applications and Optimized Solutions Using Oracle SuperCluster
For further information about application deployments in a highly available environment on Oracle SuperCluster, refer to the following links:
The hardware components of Oracle SuperCluster M8 provide a key set of ingredients for delivering highly available services. In combination with clustering software such as Oracle Solaris Cluster and Oracle Database RAC, services can continue without interruption during both planned and unplanned outages. An offsite configuration that replicates the main site can ensure that even a disaster need not lead to an extended loss of service.