Friday Oct 12, 2012

Chessin's principles of RAS design

In late 2001 I developed an internal talk on designing hardware for easier error injection, prevention, diagnosis, and correction. (This talk became the basis for my paper on injecting errors for fun and profit.)

In that talk (but not in the paper), I articulated 10 principles of RAS (reliability, availability, and serviceability) design, which I list for you here:

  1. Protect everything
  2. Correct where you can
  3. Detect where you can't
  4. Where protection is not feasible (e.g., ALUs), duplicate and compare
  5. Report everything; never throw away RAS information
  6. Allow non-destructive inspection (logging/scrubbing)
  7. Allow non-destructive alteration (injection) (that is, only change the bits you want changed, and leave everything else as is)
  8. Allow observation of all the bits as they are (logging)
  9. Allow alteration of any particular bit or combination of bits (injection)
  10. Document everything

Of course, it isn't always feasible to follow these rules completely all the time, but I put them out there as a starting point.
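
To make a few of these principles concrete, here is a minimal software sketch, not taken from the talk or the paper. It assumes a toy (8,4) extended Hamming code (real hardware typically protects much wider words), and the names `encode`, `decode`, `inject`, and `ras_log` are hypothetical. It tries to illustrate correcting single-bit errors (principle 2), detecting double-bit errors (principle 3), logging every event with its syndrome (principle 5), and injecting errors by flipping only the requested bits (principles 7 and 9).

```python
# Toy SEC-DED sketch: (8,4) extended Hamming code. All names are hypothetical.
# Codeword layout: bit 0 = overall parity, bits 1, 2, 4 = Hamming parity,
# bits 3, 5, 6, 7 = data bits d0..d3.

ras_log = []  # principle 5: every decode outcome is recorded, never discarded


def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit codeword."""
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1            # makes the whole word even parity
    return sum(b << i for i, b in enumerate(bits))


def inject(word: int, mask: int) -> int:
    """Flip exactly the bits set in mask and leave every other bit as is."""
    return word ^ (mask & 0xFF)


def decode(word: int):
    """Return (data, status, syndrome) and append the event to ras_log."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos                # XOR of the positions of set bits
    odd_parity = sum(bits) & 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity:
        bits[syndrome] ^= 1                # single-bit error: correct it
        status = "corrected"
    else:
        status = "uncorrectable"           # double-bit error: detect only
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    ras_log.append((word, status, syndrome))
    return data, status, syndrome


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))                       # clean word
    print(decode(inject(word, 0b00100000)))   # one injected bit flip
    print(decode(inject(word, 0b00100100)))   # two injected bit flips
    print(ras_log)
```

The XOR mask in `inject` is the software analogue of non-destructive alteration: any bit or combination of bits can be flipped, and nothing else in the word changes. Note that the data returned with an `uncorrectable` status should not be trusted; the sketch returns it only so that the raw bits stay observable.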

Friday Mar 11, 2011

Injecting errors for fun and profit

I was invited last year by Mike Shapiro to submit an article to Communications of the ACM on error injection. I did so, and it was published in the September 2010 issue.

It is also available online at http://queue.acm.org/detail.cfm?id=1839574.

I hope you have as much fun reading it as I did writing it, not to mention the fun I had working on the Solaris Software Recovery Project.

(You had fun working on the Solaris Software Recovery Project?
I told you not to mention that!)
