An Oracle blog about Exadata

  • May 25, 2019

How Oracle Builds Maximum Availability into Exadata

This post offers an insider's view of how the Exadata team thinks about high availability, using three examples of availability challenges and how Exadata addresses them. Each section below describes one challenge and includes a video in which Michael Nowak (Architect, MAA) explains the solution and discusses how Oracle technical staff continually identify and address availability challenges.

Ensuring I/O Problems Do Not Affect Service Quality

Slow or hung I/Os and sick disks are a fact of life, and Exadata uses machine learning and other techniques to identify and remedy problematic I/Os. This enables Exadata to maintain service levels in the face of these real-life problems.

 

In this 4 min video, Michael Nowak explains how storage servers detect and cancel or repair slow and hung I/Os and confine sick disks, and how database servers cooperate with storage servers to deal with undetected issues via I/O latency capping.
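
To make the idea of I/O latency capping concrete, here is a minimal Python sketch. It is an illustration only, not Exadata's implementation: the function `read_with_latency_cap`, the replica names, and the cap value are all hypothetical, and the "disks" are simulated with plain functions. The sketch shows one way a reader could give up on a copy that does not answer within a cap and fetch the same block from a mirror copy instead.

```python
import concurrent.futures
import time


def read_with_latency_cap(replicas, cap_seconds):
    """Try each (name, read_fn) copy in order; skip any copy slower than the cap.

    Hypothetical helper for illustration; not an Exadata API.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(replicas)))
    try:
        for name, read_fn in replicas:
            future = pool.submit(read_fn)
            try:
                # Wait at most cap_seconds for this copy to answer.
                return name, future.result(timeout=cap_seconds)
            except concurrent.futures.TimeoutError:
                # cancel() only stops a task that has not started yet; a read
                # that is already running is simply abandoned here.
                future.cancel()
        raise TimeoutError("no copy answered within the latency cap")
    finally:
        # Do not block on abandoned reads.
        pool.shutdown(wait=False)


def slow_primary():
    time.sleep(0.3)  # simulated sick disk: far slower than the cap
    return b"block-42"


def healthy_mirror():
    return b"block-42"  # simulated healthy mirror copy


if __name__ == "__main__":
    source, data = read_with_latency_cap(
        [("primary", slow_primary), ("mirror", healthy_mirror)],
        cap_seconds=0.05,
    )
    print(source, data)
```

In this toy run the primary copy exceeds the 50 ms cap, so the read is served from the mirror. A real system would also need to decide what to do with the disk that was too slow, which is the confinement step described in the video.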

Data Availability Requires Protecting Storage and Its Software

When thinking about protecting and keeping data highly available, a first consideration is to introduce redundancy in how and where the data are stored. This addresses failure risks for individual (nonvolatile) storage devices (e.g., magnetic disks and flash memory). It is just as important to ensure the availability of the system managing the storage devices. As of Exadata X7, each storage server includes two redundant M.2 solid state drives to house the operating system and the Exadata storage server software.

When needed, an M.2 drive can be replaced online while the storage server continues to service the application; redundancy between the two drives is provided by Intel RSTe RAID technology. In this 3 min video, Michael Nowak explains how this solution evolved and why it is an important improvement.

Beyond Software and Hardware Redundancy: Operator Error

Operator errors create availability challenges that hardware and software redundancy alone cannot remedy, for example, when a data center operator mistakenly removes a disk at a time when its removal would compromise storage redundancy. Starting with Exadata X7, storage servers include a “do-not-service” LED that alerts data center personnel when shutting down a storage cell would compromise storage redundancy.

In this 4 min video, Michael Nowak explains ASM disk partnering and how it drives this LED warning light.
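
The relationship between disk partnering and the warning light can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the actual ASM or Exadata logic: it assumes two-way mirroring, treats a disk's mirror copies as living only on its partner disks, and uses hypothetical names (`Disk`, `do_not_service_cells`). Under those assumptions, a cell should not be serviced if any of its online disks has a partner that is already offline, because that disk may hold the last remaining copy of some data.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Hypothetical model of a disk; not an ASM data structure."""
    name: str
    cell: str                                   # storage cell holding this disk
    online: bool = True
    partners: set = field(default_factory=set)  # names of partner disks


def do_not_service_cells(disks):
    """Return the cells whose shutdown could remove the last copy of some data."""
    by_name = {d.name: d for d in disks}
    flagged = set()
    for disk in disks:
        if not disk.online:
            continue  # an offline disk holds no usable copy to protect
        # If any partner is offline, this disk may hold the only remaining copy.
        if any(not by_name[p].online for p in disk.partners):
            flagged.add(disk.cell)
    return flagged


if __name__ == "__main__":
    disks = [
        Disk("A", "cell1", partners={"B", "C"}),
        Disk("B", "cell2", online=False, partners={"A", "C"}),
        Disk("C", "cell3", partners={"A", "B"}),
    ]
    print(sorted(do_not_service_cells(disks)))
```

With disk B offline, both cell1 and cell3 are flagged in this model, since disks A and C each partner with B. Once B is back online, no cell is flagged and any one of them can be serviced.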

Exadata: Built-in High Availability

For the Exadata product team at Oracle, high availability is a fundamental design principle and an ongoing commitment. Exadata embodies the leading edge of Oracle's Maximum Availability Architecture, a best practices blueprint based on proven Oracle high availability technologies, end-to-end validation, expert recommendations and customer experiences (see also the technical overview and MAA blog).

You can learn more about Exadata here and, of course, by perusing this blog.

We are always interested in your feedback. You are welcome to engage with us via Twitter @ExadataPM and by comments here.
