Today's guest post is by Allan Packer, Senior Principal Software Engineer working for the Solaris Systems Engineering organization in the Operating Systems and Virtualization Engineering group at Oracle with a focus on Oracle SuperCluster.
Hardware upgrades have always been supported on Oracle SuperCluster, but how flexible are they? And will any benefits be outweighed by the disruption to service when a production system is upgraded?
Change is an ever-present reality for any enterprise. And with change comes an opportunity cost, unless IT infrastructure is flexible enough to satisfy the evolving demand for resources. From the very first release of Oracle SuperCluster, a key attraction of the platform has been the ability to upgrade the hardware as business needs change.
Modifying hardware can be very disruptive. Hardware configuration changes create a ripple effect that penetrates deep into the software layers of a system. For this reason, an important milestone in the upgrade landscape for both Oracle SuperCluster M8 and Oracle SuperCluster M7 has been the development of special purpose tools to automate the upgrade steps. These tools are able to reduce the necessary downtime associated with an upgrade, and also minimize the opportunity for misconfiguration during what can be a complex operation.
Compute resources on both Oracle SuperCluster M8 and Oracle SuperCluster M7 are delivered in the form of CPU, Memory, and I/O Unit (CMIOU) boards. Each SPARC M8 and SPARC M7 chassis supports up to eight of these boards, organized into two electrically isolated Physical Domains (PDoms) hosting four boards each.
Each CMIOU board includes:
- One processor with 32 cores—a SPARC M8 processor for Oracle SuperCluster M8, or a SPARC M7 processor for Oracle SuperCluster M7. Each core delivers 8 CPU hardware threads, so each processor presents 256 CPUs to the operating system.
- Sixteen memory slots, fully populated with DIMMs. Oracle SuperCluster M8 uses 64GB DIMMs, for a total of 1TB of memory. Oracle SuperCluster M7 uses 32GB DIMMs, for a total of 512GB of memory.
- Three PCIe slots. One slot hosts an InfiniBand HCA, and another hosts a 10GbE NIC. In the case of Oracle SuperCluster M8, the 10GbE NIC is a quad-port device; Oracle SuperCluster M7 provides a dual-port NIC. The third PCIe slot is empty on all except the first CMIOU in each PDom, where it hosts a quad-port GbE NIC. Optional Fibre Channel HBAs can be placed in empty slots.
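As a rough illustration, the per-board figures above reduce to simple arithmetic. The following sketch is hypothetical (the constants and helper names are not part of any SuperCluster tooling) and just tallies the numbers quoted in the list:

```python
# Hypothetical sketch tallying the CMIOU board figures quoted above.
CORES_PER_PROCESSOR = 32   # one SPARC M8/M7 processor per CMIOU board
THREADS_PER_CORE = 8       # 8 hardware threads per core
DIMM_SLOTS = 16            # every CMIOU is fully populated with DIMMs

def cpus_per_board():
    """CPUs presented to the operating system by one CMIOU board."""
    return CORES_PER_PROCESSOR * THREADS_PER_CORE

def memory_per_board_gb(dimm_gb):
    """Memory per CMIOU: 64GB DIMMs on SuperCluster M8, 32GB on M7."""
    return DIMM_SLOTS * dimm_gb

print(cpus_per_board())         # 256 CPUs per board
print(memory_per_board_gb(64))  # 1024 GB (1TB) per board on M8
print(memory_per_board_gb(32))  # 512 GB per board on M7
```

A fully populated PDom (four boards) therefore presents 1,024 CPUs and, on Oracle SuperCluster M8, 4TB of memory.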
Adding CMIOU boards
CMIOU boards can be added to a PDom whenever more CPU and/or memory resource is required. Up to four CMIOU boards can be placed in each PDom. The diagram below illustrates a possible sequence of upgrades in a SPARC M8-8 chassis, from a quarter-populated configuration with two CMIOUs (one per PDom), to a half-populated configuration with four CMIOUs, to a fully-populated configuration with eight CMIOUs.
PDoms can be populated with as many CMIOUs as required—there is no requirement to use the same number of CMIOU boards in both PDoms on the same chassis. The illustration below shows two SPARC M8-8 chassis with different numbers of CMIOUs in each PDom.
Adding a second chassis
Many Oracle SuperCluster installations are initially configured with a single compute chassis. Every SPARC M8-8 and SPARC M7-8 chassis shipped with Oracle SuperCluster includes two electrically isolated PDoms, so highly available configurations begin with a single chassis. When the need for additional compute resources exceeds the capacity of a single chassis, a customer can add a second chassis with one or more CMIOUs, allowing total compute resources to be doubled. Since each CMIOU board in the second chassis comes equipped with its own InfiniBand HCA, additional resources immediately become available on the InfiniBand fabric after the upgrade.
Note that both SPARC M8-8 and SPARC M7-8 chassis consume ten rack units. Provided no more than six Exadata Storage Servers have been added to an Oracle SuperCluster rack, sufficient space will be available to add a second chassis.
Where memory resources have become constrained, the simplest way to increase memory capacity is to add one or more additional CMIOU boards. Such upgrades come with the extra benefit of additional CPU resources as well as greater I/O connectivity.
Note that exchanging existing memory DIMMs for higher density DIMMs is not supported. Adding CMIOUs achieves a similar effect more cost effectively: the cost of a CMIOU populated with lower density DIMMs, a SPARC processor, an InfiniBand HCA, and a 10GbE NIC compares favorably with the cost of the higher density DIMMs alone.
Exadata storage upgrades
Storage Servers can be added to existing Oracle SuperCluster configurations. Even early Oracle SuperCluster platforms can benefit from the addition of current model Exadata Storage Servers.
Customers adding Exadata Storage quickly discover that both the performance and available capacity of current Exadata Storage Servers far outstrip those of older models. Best practice information is available for such deployments, and should be followed to ensure effective integration of different storage server models into an existing Exadata Storage environment.
Note that Oracle SuperCluster racks can host eleven Exadata Storage Servers with one SPARC M8-8 or SPARC M7-8 compute chassis, or six Exadata Storage Servers with two compute chassis.
The graphic below illustrates an Oracle SuperCluster M8 rack before and after an upgrade that adds a second M8-8 chassis and three additional Exadata Storage Servers.
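The rack capacity rules above amount to a simple lookup. The sketch below is purely illustrative (the names are hypothetical, not Oracle tooling) and encodes the limits stated earlier: eleven Exadata Storage Servers with one compute chassis, six with two:

```python
# Hypothetical sketch of the rack capacity rules quoted above.
# Maps the number of 10RU compute chassis to the maximum number of
# Exadata Storage Servers the rack can also hold.
MAX_STORAGE_SERVERS = {1: 11, 2: 6}

def can_add_second_chassis(current_storage_servers):
    """A second chassis fits only if no more than six storage servers are installed."""
    return current_storage_servers <= MAX_STORAGE_SERVERS[2]

print(can_add_second_chassis(3))  # True: room remains for a second chassis
print(can_add_second_chassis(8))  # False: too many storage servers in the rack
```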
External storage upgrades
General-purpose storage capacity can be boosted by adding a suitably configured ZFS Storage Appliance that includes InfiniBand HCAs. This storage can then be made available via the InfiniBand fabric and used for application storage, backups, and other purposes.
Implications for domain configurations
Additional compute resources can be assigned in a number of different ways:
- Creating new root domains
Root domains provide the resources needed by I/O domains, which can be created on demand using the SuperCluster Virtual Assistant. I/O domains provide a flexible and secure form of virtualization at the domain level. Although they share I/O devices using efficient SR-IOV, each I/O domain has its own dedicated CPU and memory resources. Oracle Solaris Zones are also supported in I/O domains, providing nested virtualization.
A one-to-one relationship exists between CMIOU boards and root domains, which means that a root domain can be created for each new CMIOU that is added. Each root domain supports up to sixteen additional I/O domains.
Note that creating new I/O domains is not the only way of consuming the extra resources. CPU cores and memory provided by an additional CMIOU board can also be used to increase resources in existing I/O domains.
- Creating new dedicated domains
Dedicated domains provide CPU, memory, and I/O resources—specifically an InfiniBand HCA and a 10GbE NIC—that are not shared with other domains (and are therefore dedicated). Virtualization within dedicated domains is provided by Oracle Solaris Zones.
New CMIOU boards can be used to create new dedicated domains, each built from one or more boards. If two CMIOU boards are added, for example, they could be combined to create a single dedicated domain, or used individually to create two dedicated domains.
When multiple dedicated domains have been created in a PDom, CPU and memory resources do not need to be split evenly between the dedicated domains. These resources can be assigned to dedicated domains at a granularity of one core and 16GB of memory.
The largest possible dedicated domain on both Oracle SuperCluster M8 and Oracle SuperCluster M7 contains four CMIOU boards.
- Expanding existing dedicated domains
A new CMIOU board can be used to boost the resources of an existing dedicated domain, up to the maximum capacity of four CMIOU boards per dedicated domain.
The available upgrade options will depend on the specifics of an existing domain configuration as well as the number of CMIOU boards being added. Customers should consult their Oracle account team to explore possible options.
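The domain sizing rules described above can be expressed as simple arithmetic. The following sketch is hypothetical (the constants and helpers are illustrative, not part of any SuperCluster tooling) and uses the Oracle SuperCluster M8 figures:

```python
# Hypothetical sketch of the domain sizing rules described above (M8 figures).
IO_DOMAINS_PER_ROOT_DOMAIN = 16   # each root domain supports up to 16 I/O domains
MAX_BOARDS_PER_DEDICATED_DOMAIN = 4
CORES_PER_BOARD = 32              # one 32-core SPARC M8 processor per CMIOU
MEMORY_PER_BOARD_GB = 1024        # 1TB of memory per CMIOU on SuperCluster M8
MEMORY_GRANULARITY_GB = 16        # memory is assigned in 16GB increments

def max_io_domains(cmiou_boards):
    """One root domain per CMIOU board, each supporting up to 16 I/O domains."""
    return cmiou_boards * IO_DOMAINS_PER_ROOT_DOMAIN

def is_valid_dedicated_domain(cores, memory_gb):
    """Check a dedicated-domain request against the M8 maxima (illustrative only)."""
    max_cores = MAX_BOARDS_PER_DEDICATED_DOMAIN * CORES_PER_BOARD       # 128 cores
    max_memory = MAX_BOARDS_PER_DEDICATED_DOMAIN * MEMORY_PER_BOARD_GB  # 4TB
    return (0 < cores <= max_cores
            and 0 < memory_gb <= max_memory
            and memory_gb % MEMORY_GRANULARITY_GB == 0)

print(max_io_domains(2))                   # adding two boards allows up to 32 I/O domains
print(is_valid_dedicated_domain(48, 768))  # True: within limits, 16GB-aligned
print(is_valid_dedicated_domain(48, 760))  # False: not a multiple of 16GB
```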
What is the required downtime for hardware upgrades?
Two deployment approaches are available for hardware upgrades:
- Rolling upgrades
Rolling upgrades minimize or eliminate the service outages associated with a hardware upgrade, because only one PDom is affected at a time. Provided the Oracle SuperCluster has been configured for high availability, services need not be affected during a rolling upgrade. High availability can be achieved using clustering software, such as Oracle Real Application Clusters (RAC) for database instances and Oracle Solaris Cluster for applications.
The downside of rolling upgrades is that the overall period of disruption is longer: because PDoms are upgraded one at a time, the upgrade process takes more elapsed time.
- Non-rolling upgrades
The benefit of non-rolling upgrades is that the overall period of disruption is shorter, since PDoms are upgraded in parallel. The downside of non-rolling upgrades is that all services become unavailable during the upgrade, since a full system outage is required.
Before the hardware upgrade process can begin, a suitable Quarterly Full Stack Download Patch (QFSDP) must be applied to the existing system, and backups taken with the osc-config-backup tool.
For information about the expected period of time required to complete rolling or non-rolling upgrades for a particular configuration, the customer’s Oracle account team should be consulted.
Hardware upgrades allow the available resources of Oracle SuperCluster to be extended as required to satisfy changing business requirements. Upgrades of varying complexity can be handled smoothly while minimizing downtime, thanks to tool-based automation of the upgrade process. The end result is that customers are able to realize the benefits of hardware upgrades without the need for extended periods of disruption to production systems.
About the Author
Allan Packer is a Senior Principal Software Engineer working for the Solaris Systems Engineering organization in the Operating Systems and Virtualization Engineering group at Oracle. He has worked on issues related to server systems performance, sizing, availability, and resource management, developed performance and regression testing tools, published several TPC industry-standard benchmarks as technical lead, and developed a systems/database training curriculum. He has published articles in industry magazines, presented at international industry conferences, and his book "Configuring and Tuning Databases on the Solaris Platform" was published by Sun Press in December 2001. Allan is currently the technical lead and architect for Oracle SuperCluster.