Fencing - yet again
Sheesh. It is amazing to me how this topic continues to spin. Clearly people just like speculate, and I suppose a little controversy can serve to energize, but this topic seems to have taken on a life of its own. The real question in my mind is why do you care? Fencing is a core functionality of the cluster infrastructure. You cant control it, or influence it in any way. It has to be there in some form, or bad things will happen (corruptions being one of them). And if the particular fencing implementation in Oracle clusterware was fundamentally flawed, it would have been exposed long ago over the course of the 5+ years of existence, and the thousands of deployments.
So any discussion of fencing and the Oracle implementation is purely theoretical, and largely academic, since it has more than proven itself. ok. I enjoy lively academic or theortical technical debate, particularly over a beer or 2, but not at the expense of ignoring reality. So lets pull apart the discussions Ive seen, and address them point by point. Note that this discussion is focused solely on Oracle Clusterware used in conjunction with RAC.
Oracle Clusterware uses the Stonith algorithm. This is only partially true. Oracles fencing mechanism is based on the Stonith Algorithm. However, there is no general design rule of how that algorithm should be implemented. Strict use of the algorithm is complicated, or perhaps even prevented, by the fact that there is no API on many platforms for doing a remote power-off reset of the system. So the current implementation is in fact a suicide, as opposed to an execution. As system/OS vendors makes such APIs available, Oracle will be able to make use of them.
Suicide is not reliable because you are expecting an already unhealthy system to respond to some other directive. Sure. There are corner cases where this is a possibility, but these have proven to be very rare, they have been fixed when they appeared, and the real underlying concern, which is exposure to data corruption, in non-existent (see next point). This issue is actually related to the FUD we often see about some cluster managers running in Kernel mode vs user space where Oracle Clusterware runs. Well ... If the OS kernel is misbehaving, then it doesnt really matter where the clusterware runs - bad things are going to happen. (Weve seen this occur in several situations.) If someone makes a programming error in the clusterware code and it is running in kernel mode, then the OS kernel is exposed. (This is theoretical since Oracle clusterware does not run in kernel mode, but its not like this hasnt happened before in other envrionments where user/application code is allowed to run in kernel space). And lastly, if running in userspace, and other user space programs misbehave, then the obvious concern in the sensitivity the cluster has to that misbehaving application - like not being able to get CPU time to communicate in a timely manner. We have certainly seen this kind of scenario many times, but in general it is easily mitigated by renicing or increasing the priority of the key background communication processes. Bottom line is that suicide has proven sufficiently reliable. Any claim to the contrary is pure speculation.
Because suicide is unreliable, you are exposed to data corruptions. Not true. Either in theory, or in practice. Its no secret RAC does unbuffered IO (bypasses the OS cache), and any IO done in a RAC environment is in complete coordination with the other nodes in the cluster. Cache fusion assures this. And this holds true in a split brain condition. If RAC cant coordinate the write with the other nodes as a result of interconnect failure, then that write is put on hold until communication is restored, or until the eviction protocol is invoked.
This is obviously over simplified, but frankly, so are the criticisms in this area. The challenge to any non-believer is the following: Find me a repeatable test case where interconnect failure, and the resulting fencing algorithm implemented in Oracle clusterware, results in database corruption. If you are successful, I will:
1. Fall off my chair in disbelief
2. Write : They were right, I was wrong, 1000 times in my blog, and apologize profusely to anyone who may have taken offence to the claims made in this posting.
3. File a bug, and get the damn thing fixed.
Now that I think about it, it would probably be prudent to reverse 2. and 3. Note however, that in the off chance you are successful, it is a bug, and will be fixed as such. As opposed to a fundamental architectural flaw.
So lets put this one to bed. Next topic.