Testing using remote power
By user12625760 on May 03, 2006
I've blogged about the remote power systems we have in our labs before, here and here. Over the last few weeks, however, I've been investigating how long it takes for the system to recover from a disk failure. The customer has a test case that involves pulling the drive and then seeing how long the application stalls. It is not a perfect test, but it is a reasonable simulation of a drive failing. The goal is to have no more than a 30-second pause when the drive fails.
The trouble with this test is that I need to pull the drive, so I have to be in the lab, and I'm not even in the same country as the test case.
If, however, I put the drive in a unipack and then arrange for that to be on remote power, I can power off the drive remotely and automatically. This helps me as the test case is in Germany, so I don't have to move the systems. It helps even more as I can now write a script that runs the test automatically in a loop. By doing this I can get this graph, running the test overnight while I sleep:
The 5 cases where we are over 30 seconds are a bit of a worry, but the others show a nice curve, giving some confidence that in the usual run of things the failure time is actually less than 20 seconds. On inspection of the logs, the outlying results turn out to be caused by a Disconnected command timeout on another target on the bus when the target is failed.
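For anyone wanting to try something similar, the overnight loop could be sketched roughly like this. It is not the script I actually ran. Every name in it is hypothetical: power_off and power_on stand in for whatever drives your remote power unit, and probe stands in for one small piece of I/O against the affected file system (a write followed by an fsync, say). The stall is taken to be the slowest single probe seen in a fixed window after the power is cut, which is only an approximation of what the application sees.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def longest_stall(probe: Callable[[], None], window_s: float,
                  clock: Callable[[], float] = time.monotonic) -> float:
    """Call probe() back to back for window_s seconds; return the slowest call."""
    worst = 0.0
    end = clock() + window_s
    while clock() < end:
        start = clock()
        probe()  # blocks for as long as the I/O is stalled
        worst = max(worst, clock() - start)
    return worst


def run_trials(power_off: Callable[[], None], power_on: Callable[[], None],
               probe: Callable[[], None], trials: int,
               window_s: float, settle_s: float,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> List[float]:
    """Power the drive off, time the stall, power it back on, and repeat."""
    results = []
    for _ in range(trials):
        power_off()  # hypothetical hook for the remote power unit
        results.append(longest_stall(probe, window_s, clock))
        power_on()
        sleep(settle_s)  # assumed time for the drive to come back and resync
    return results


def summarise(stalls_s: Sequence[float], limit_s: float = 30.0) -> Dict[str, float]:
    """Count the runs over the limit and report the median and maximum stall."""
    return {
        "runs": len(stalls_s),
        "median_s": statistics.median(stalls_s),
        "max_s": max(stalls_s),
        "over_limit": sum(1 for s in stalls_s if s > limit_s),
    }
```

The sleep and clock arguments are only there so the loop can be exercised with fakes before it is pointed at real hardware. The window needs to be comfortably longer than the worst stall you expect, otherwise a long outlier simply runs past the end of the window and delays the next trial.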