An Oracle blog about ZFS Storage

  • January 11, 2016

Disk Scrub - Why and When?

Steve Tunstall
Principal Storage Engineer

What is a "Disk Scrub" and why should you do it?

To begin with, it's a good idea to do this about once a month. Have you ever done one? Maybe you don't need to. We will talk about what a disk scrub is, why you would or would not run one, and how you can automate it if you wish. We're talking about this button, seen here on the Configuration-->Storage page of your ZFSSA:


ZFS will automatically protect your data from "bit rot", something that can happen to ALL forms of storage. Every time ZFS reads a block, it compares the block to its checksum and, if the two don't match, automatically repairs the block from the pool's redundant data. That's great if you read your data a lot. However, what if you write the data, and then don't look at it again for years? Is it protected from bit rot?


That's where a disk scrub comes in. A disk scrub reads all the VDEVs in the pool, thereby finding and fixing any bit rot errors. If you don't normally read your data in the pool, Oracle recommends a disk scrub about every month. This is a very low-priority background process, and I doubt you'll even notice it's happening. Because it's low-priority, it can take anywhere from 1 second to many weeks to complete, depending on how much data you have and how busy your ZFSSA is. I can give you some examples. These both came from the Oracle Cloud, which uses many ZFS systems. A ZFSSA with 192 4TB drives, configured as a single RAIDz1 pool, with only 1TB of data currently in it, finished a disk scrub in less than 90 seconds. At the other extreme, an older 7410 system, with only 256GB of DRAM and 192 2TB drives, completely full and running a high-IO, write-intensive workload around the clock, took many months to complete a scrub!
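To make the idea concrete, here is a toy model in Python of checksum-on-read plus a scrub. It is only a sketch, not how ZFS is actually implemented: the MirroredPool class, its two in-memory "disks", and the use of SHA-256 are all assumptions made up for illustration.

```python
import hashlib


class MirroredPool:
    """Toy two-way mirror (hypothetical): every block is stored twice."""

    def __init__(self):
        self.copies = [{}, {}]   # two pretend disks: block id -> bytes
        self.checksums = {}      # block id -> SHA-256 hex digest

    def write(self, block_id, data):
        # Store the data on both disks and remember its checksum.
        for disk in self.copies:
            disk[block_id] = data
        self.checksums[block_id] = hashlib.sha256(data).hexdigest()

    def read(self, block_id):
        """Return (data, repaired_count) after verifying the checksum."""
        expected = self.checksums[block_id]
        good = None
        for disk in self.copies:
            if hashlib.sha256(disk[block_id]).hexdigest() == expected:
                good = disk[block_id]
                break
        if good is None:
            raise IOError(f"block {block_id}: no valid copy left")
        # Overwrite any copy that differs from the verified one.
        repaired = 0
        for disk in self.copies:
            if disk[block_id] != good:
                disk[block_id] = good
                repaired += 1
        return good, repaired

    def scrub(self):
        """Read every block so rarely-read data is verified and healed too."""
        return sum(self.read(block_id)[1] for block_id in list(self.checksums))
```

In this model a block that nobody reads is never checked, which is exactly the gap a scrub closes: scrub() simply walks every block and returns how many bad copies it repaired.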

On his excellent blog, Matt Barnson gives some tips on who should use a scrub, and how often:

Should I do this?

  1. Is the pool formatted with either RAIDZ or Mirror2 configuration? Although these two options offer higher performance than RAIDZ2 or Mirror3, redundancy is lower. (No, I'm not going to talk about Stripe. That should only ever be used on a simulator; I don't even know why it exists on a ZFS appliance.)
  2. Are you unable to absolutely 100% guarantee that every byte of data in the pool is read frequently? Note that even databases that the DBAs think of as "very busy" often have blocks of data that go unread for years and are at risk of bit rot. Ask me how I know...
  3. Do you run restore tests of your data less frequently than once per year?
  4. Do you back up every byte of data in your pool less frequently than once per quarter?

If you answer "Yes" to any of the above questions, then you probably want to scrub your pools from time to time to guarantee data consistency.
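If you like your checklists executable, the four questions above can be sketched as a small Python function. The function name, its parameters, and the layout labels are hypothetical and only mirror the questions as written; this is not a tool that ships with the appliance.

```python
def reasons_to_scrub(layout, all_data_read_frequently,
                     restore_tests_per_year, full_backups_per_year):
    """Return the checklist questions that were answered "Yes"."""
    reasons = []
    # Question 1: lower-redundancy layouts (labels are assumed spellings).
    if layout.lower() in ("raidz", "mirror2"):
        reasons.append("lower-redundancy pool layout")
    # Question 2: cannot guarantee every byte is read frequently.
    if not all_data_read_frequently:
        reasons.append("some data may go unread for long periods")
    # Question 3: restore tests less often than once per year.
    if restore_tests_per_year < 1:
        reasons.append("restore tests run less than once per year")
    # Question 4: full backups less often than once per quarter.
    if full_backups_per_year < 4:
        reasons.append("full backups run less than once per quarter")
    return reasons


def should_scrub(*answers):
    """A single "Yes" is enough to recommend scrubbing."""
    return bool(reasons_to_scrub(*answers))
```

For example, should_scrub("RAIDZ2", True, 2, 12) is False, while changing only the layout to "RAIDZ" makes it True.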

How often should I do this? 

This question is challenging for Support to answer, because as always the true answer is "It Depends".  So before I offer a general guideline, here are a few tips to help you create an answer more tailored to your use pattern.

  1. What is the expiration of your oldest backup? You should probably scrub your data at least as often as your oldest tapes expire so that you have a known-good restore point.
  2. How often are you experiencing disk failures? While the recruitment of a hot-spare disk invokes a "resilver" -- a targeted scrub of just the VDEV which lost a disk -- you should probably scrub at least as often as you experience disk failures on average in your specific environment.
  3. How often is the oldest piece of data on your disk read? You should scrub occasionally to prevent very old, very stale data from experiencing bit-rot and dying without you knowing it.

If any of your answers to the above are "I don't know", I'll provide a general guideline: you should probably be scrubbing your zpool at least once per month. It's a schedule that works well for most use cases, provides enough time for scrubs to complete before starting up again on all but the busiest & most heavily-loaded systems, and even on very large zpools (192+ disks) should complete fairly often between disk failures.
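One way to turn these tips into a number is sketched below in Python. It assumes you can express the first two answers in days and uses None for "I don't know"; the function names and the 30-day stand-in for "once per month" are my own choices for illustration, not an official formula.

```python
from datetime import date, timedelta

DEFAULT_INTERVAL_DAYS = 30  # assumption: "once per month" taken as 30 days


def scrub_interval_days(oldest_backup_expiry_days,
                        mean_days_between_disk_failures):
    """Pick a scrub interval in days; None means "I don't know"."""
    answers = (oldest_backup_expiry_days, mean_days_between_disk_failures)
    if any(answer is None for answer in answers):
        # Any unknown answer falls back to the general guideline.
        return DEFAULT_INTERVAL_DAYS
    # Scrub at least as often as the tighter of the two constraints.
    return min(answers)


def next_scrub_due(last_scrub, interval_days):
    """Date on which the next scrub should start."""
    return last_scrub + timedelta(days=interval_days)
```

For example, with backups that expire after 90 days and a disk failure roughly every 45 days, scrub_interval_days(90, 45) gives 45. The third tip (how often your oldest data is read) is harder to put a number on, so this sketch leaves it out.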

How can I automate this?

There is no easy button to automate this, but you can make a script to do it for you from the CLI. I don't want to put the script commands here and get in trouble with anyone, but you can ask your friendly, neighborhood SE for some help in making one.

In the meantime, just push the button in the BUI. Easy. 
