All About Exadata Disk Scrubbing

November 6, 2023 | 8 minute read
Alex Blyth
Senior Principal Product Manager - Exadata

In this post, we will talk about a fundamental yet commonly misunderstood feature on Exadata - Automatic Hard Disk Scrub and Repair. Sure, it sounds cool, but it's also a critical Maximum Availability Architecture (MAA) feature that ensures silent corruptions are detected and repaired!

Let's start at the beginning.

Oracle Database, no matter what platform you're running on, eventually writes data onto some form of persistent storage. Most commonly, this is a hard disk drive (HDD) or a flash device. On Exadata, we offer two types of storage server that broadly align to these media - High Capacity (HC) storage servers, which in the X10M generation ship with 12 x 22 TB HDDs, and Extreme Flash (EF) storage servers, with 4 x 30.72 TB flash drives. We'll focus exclusively on High Capacity storage here, because flash drives do not need scrubbing - on HC servers, data is persisted to good 'ole spinning disks. When database blocks are written to an HDD, they span multiple 4 KB sectors (on the latest drives; earlier drives used 512-byte sectors), and it is these on-disk sectors that we are concerned about.
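To make the block-to-sector relationship concrete: with Oracle's default 8 KB database block size, each block spans two 4 KB sectors. As a minimal sketch, you can confirm the block size from any SQL*Plus session (the 8192 shown is simply the default - your value may differ):

SQL> show parameter db_block_size

     NAME              TYPE         VALUE
     ----------------- ------------ ------
     db_block_size     integer      8192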

Over time, HDDs age, and data is accessed less and less frequently - obviously, this depends on the data itself and its purpose, but for now, this is enough to continue with. As data is accessed less frequently, Oracle Database has less opportunity to inspect blocks (and, implicitly, the disk sectors beneath them) to check that each block is still valid and no corruptions have crept in over time.

It's worth pausing here and defining what I mean by corruption. Data corruption comes in multiple forms but can be broken down into the following basic categories - logical and physical.

Logical corruption is when data is changed so that it is no longer valid in the context in which it is intended. For example, if I were to update a list-of-values table and change all the countries to, say, 'Australia,' I have corrupted the database logically. The data is technically valid - in this case, it's just text - but it no longer makes real-world sense. To fix this, use database features like Flashback Table, Flashback Transaction, or even Flashback Database to return the data to its earlier state.
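As a quick, hedged illustration of recovering from that kind of logical corruption (the table name and timestamp are hypothetical), Flashback Table can rewind just the damaged table, provided row movement is enabled on it:

SQL> alter table countries enable row movement;
SQL> flashback table countries to timestamp (systimestamp - interval '15' minute);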

Physical corruption is more insidious - it happens externally to the database and can go undetected until you try to access the data. This is typically referred to as bit rot, and it can affect all drives over time. The drives we ship in Exadata include the usual Error Correction Codes (ECC) and S.M.A.R.T. features found in enterprise-grade disks. Still, as useful as these features are, they are not Oracle Database aware.

Enter Automatic Hard Disk Scrub and Repair! Automatic Hard Disk Scrub and Repair - scrubbing from here on - proactively inspects the sectors on the disk for physical errors and, in doing so, can detect issues that the HDD's ECC and similar mechanisms may not. Scrubbing is an automated process on Exadata that kicks in when the disks are idle (less than 25% busy) so as not to impact database performance, and it runs on a bi-weekly schedule by default.
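If you want to confirm the schedule on a given storage server, the cell attributes hardDiskScrubInterval and hardDiskScrubStartTime control it. A minimal sketch from CellCLI (the output values are purely illustrative; the interval accepts daily, weekly, biweekly, or none, and the biweekly default is the right choice for almost everyone):

CellCLI> list cell attributes hardDiskScrubInterval, hardDiskScrubStartTime
     biweekly       2023-11-18T08:00:00-08:00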

If scrubbing detects a corrupted sector, the storage server asks ASM to repair the sector from one of the mirrors on another storage server. This is another reason why multiple mirrors are essential - if one mirror is corrupt, there is at least one more copy available that can be used to repair the faulty one. Remember that Oracle's Maximum Availability Architecture recommendation is High Redundancy, which uses triple mirroring. If bad sectors (corruptions) are detected on a disk, more are likely to develop on that disk, so Exadata adaptively and automatically increases the frequency of scrubbing on that disk until all corruptions are repaired.

Where scrubbing on Exadata differs from the scrubbing performed by ASM is that the sector being scrubbed (checked for errors) doesn't leave the Exadata storage server, eliminating unnecessary network traffic and avoiding CPU consumption on the database servers.

Scrubbing targets sectors that the database has not read recently. Data that is less and less frequently used needs to be checked to ensure that corruption is found early and repaired quickly rather than becoming an issue for you at 2 a.m. one day in the future.

Backups and mirrors are worth a special mention here. You may be thinking, 'I back up my databases regularly - wouldn't that mean all sectors are being routinely checked?' It's a reasonable thought, but you are probably also using Block Change Tracking and RMAN Unused Block Compression, which skip blocks that haven't been modified since the last backup and skip empty blocks altogether. Reads also typically occur on the primary copy of data, not the secondary or tertiary copies, so bad sectors could linger undetected on those mirrors if scrubbing of all disks did not take place.
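For context, here is a hedged sketch of what that looks like in practice - checking whether Block Change Tracking is feeding your incremental backups, and a manual full scan you could run with RMAN. Note that even a full validate reads data through the normal I/O path, so it still typically checks the primary ASM copy rather than the secondary or tertiary mirrors:

SQL> select status, filename from v$block_change_tracking;

RMAN> backup validate check logical database;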

How do you see scrubbing in action? Well, Automatic Workload Repository (AWR) reports, Exadata metrics, and Real-Time Insight all provide excellent visibility into scrubbing activity. For example, in AWR, you will find the following data in the Exadata section.

 

AWR Top IO Reasons by Request - example

As you can see, this system reported a significant amount of I/O during this period as 'scrub I/O' - and when the storage server is idle, that's a good thing. When actual database I/Os increase, Exadata automatically backs off and stops scrubbing.

From the storage cells, we can use CellCLI and run list metriccurrent, filtering on the CD_IO_BY_R_SCRUB_SEC metric. In the example below, we can see that the 12 HDDs, CD_00 through CD_11, are currently scrubbing sectors at a rate of around 115 MB/s each.

 

CellCLI> list metriccurrent where name = 'CD_IO_BY_R_SCRUB_SEC' and metricObjectName like 'CD.*'
     CD_IO_BY_R_SCRUB_SEC    CD_00_exadbm01celadm01  115 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_01_exadbm01celadm01  118 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_02_exadbm01celadm01  117 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_03_exadbm01celadm01  113 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_04_exadbm01celadm01  114 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_05_exadbm01celadm01  119 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_06_exadbm01celadm01  112 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_07_exadbm01celadm01  120 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_08_exadbm01celadm01  116 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_09_exadbm01celadm01  115 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_10_exadbm01celadm01  116 MB/sec
     CD_IO_BY_R_SCRUB_SEC    CD_11_exadbm01celadm01  113 MB/sec

 

The output above is from an idle cell. If it were under load, the values on the right would drop to 0 MB/sec.
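Alongside the throughput metric, the storage servers also keep a count of errors found by scrubbing. If your Exadata System Software version exposes the CD_IO_ERRS_SCRUB metric, you can query it the same way - on a healthy system, you'd expect to see zeros:

CellCLI> list metriccurrent where name = 'CD_IO_ERRS_SCRUB'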

If you have Real-Time Insight set up (see here and here for more details), you can configure the storage servers to send the CD_IO_BY_R_SCRUB_SEC metric to your dashboard of choice for a visual representation of scrubbing across all your Exadata cells.

For example, you may see a graph like the one below. In this example, the high I/O Requests for the Cell Disks (IOPs - CD) in the red area are all attributable to scrubbing. As database workload was increased on this server (beginning in the green area), the scrub I/Os were reduced and ultimately stopped to ensure database I/Os were not impacted.

Exadata Real-Time Insight graphs showing scrub I/O reducing as workload increases

Now that we can see scrub I/O, let's cover a situation that has come up a few times recently and causes confusion.

If you look at the Exadata OS I/O Stats section, you may see the following. What will jump out at you immediately are the red highlights towards the bottom of the image. Even before you read the notes above the table, you know something is wrong. Right?

Exadata OS I/O statistics - abnormal IOPs on storage cells

Well, no, and yes. Let me explain.

Firstly, note that all the storage servers are less than 25% busy - idle enough for scrubbing to be initiated.

Next, read the rest of the notes, and you will see this line - 'Maximum cell capacity for hard disks: IOPS: H/16.0T: 2,556 | IO MB/s: H/16.0T: 1,776'. Of particular importance here is the number of IOPS each 'cell' (storage server) is capable of. In this example, the system is an X9M with both High Capacity (disk) and Extreme Flash (flash) storage servers - each with different characteristics, as you can see in the notes. If you do a little math, you will see that each of those angry red data points is around 3-6x more than the actual capacity of the storage server. Clearly, the storage servers can't do more I/O than they are physically capable of.
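To put illustrative numbers on it: a High Capacity cell rated at 2,556 hard-disk IOPS that reported, say, 10,000 IOPS in this table would be claiming roughly 4x what its disks can physically deliver. That reported figure is hypothetical, but it is the scale of mismatch the red highlights represent.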

So what's the reason? Hard disk drives in the High Capacity storage servers connect to a disk controller, which includes a small cache. What AWR is reporting in the snippet above is the number of IOPS serviced by the disk controller cache, not the physical IOPS serviced by the disk itself.

Many folks look at this and think disabling scrubbing is the panacea: stop doing scrub I/O, and the issue stops. It sounds like it makes sense, but such a decision is a placebo and, realistically, may be harmful in the long run.

Coming back to my no and yes answers from above:

No - the fact that scrubbing is active is good for the health of the database and system, and the AWR report is telling you that it's running.

Yes - something is not correct, but the problem lies in the way we are reporting the IOPS in AWR. In other words, in some versions, AWR over-reports the IOPS by exposing the I/Os serviced by the disk controller cache instead of the disk itself.

The good news is that this behavior is fixed in the 19.16 and higher Database Release Updates. For earlier releases - like the system above, which runs Database 19.14 - if you see similar output in your AWR reports while scrubbing is active, raise an SR so Oracle Support can confirm the issue and take the required steps to resolve it.

The final point I want to leave you with is that Automatic Hard Disk Scrub and Repair is a crucial Exadata Maximum Availability Architecture feature. It detects and repairs data corruption before it becomes an issue and should not be disabled. Scrubbing is automatically throttled so it does not impact real database workloads, and because it performs the majority of its work within the storage servers (ASM only gets involved to repair corruptions), it is highly efficient.

Thanks to the awesome Jony Safi in the Maximum Availability Architecture team for providing the above data and screenshots, reviewing the blog, and being a pleasure to work with. Thanks also to Shaun Levey for the additional technical review. 

