Scaling, Fault Management, and Open Standards
By kgibson on Apr 06, 2005
Cars have gotten more complex over the years, but I've been happy to learn that every new car also has a well-designed fault management architecture built in. Sensors all over the engine, transmission, and other critical components send a stream of telemetry through an event bus to embedded service processors running fault management tasks. These tasks continuously analyze the event streams, looking for failures or for sensor readings that are consistently out of range. When they find one, they turn on the check engine light and record a specific problem code, so neither you nor the mechanic has to search through logs of raw events.
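The detection step described above can be sketched in a few lines. This is a minimal illustration, not how any real engine controller or Sun's software is implemented: the sensor names, limits, the three-reading persistence rule, and the `DEMO-*` codes are all hypothetical. The point is the pattern: raw events go in, and only a diagnosed problem code comes out, while a single transient spike is ignored.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    sensor: str   # which sensor produced the reading (hypothetical names)
    value: float  # raw reading from that sensor


# Hypothetical per-sensor limits: (low, high, problem code to report).
LIMITS = {
    "coolant_temp_c": (70.0, 110.0, "DEMO-001"),
    "o2_voltage": (0.1, 0.9, "DEMO-002"),
}


def diagnose(events: Iterable[Event], persist: int = 3) -> Iterator[str]:
    """Yield a problem code when a sensor stays out of range for `persist`
    consecutive readings; shorter excursions are treated as noise."""
    streak: dict[str, int] = {}
    reported: set[str] = set()
    for ev in events:
        if ev.sensor not in LIMITS:
            continue  # not a monitored sensor
        low, high, code = LIMITS[ev.sensor]
        if low <= ev.value <= high:
            streak[ev.sensor] = 0  # back in range: reset the counter
            continue
        streak[ev.sensor] = streak.get(ev.sensor, 0) + 1
        if streak[ev.sensor] >= persist and code not in reported:
            reported.add(code)  # report each fault once ("light on")
            yield code


readings = [Event("coolant_temp_c", v) for v in (95, 120, 96, 118, 119, 121)]
print(list(diagnose(readings)))  # prints ['DEMO-001']
```

The lone reading of 120 is ignored because the next reading is back in range; only the three consecutive high readings at the end produce a code.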
Now, what's really made this nice for me is combining it with open standards. Our garage is heterogeneous: each of our four cars comes from a different automaker. But it turns out that every car sold in the U.S. since 1996 provides a standard OBD-II (On-Board Diagnostics II) interface to its fault management system, a connector usually located under the left side of the dashboard. For about $80 I bought an interface adapter that connects this port to a standard serial port on my notebook, and I downloaded an open-source program that retrieves and interprets the diagnostic codes. So I can plug my notebook PC into any of our cars and communicate with its service processor.
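Interpreting the codes is straightforward because the OBD-II trouble-code format itself is standardized (SAE J2012). The sketch below is a hypothetical illustration, not the open-source program's actual code: it assumes the adapter has already returned the raw bytes of a stored-trouble-code (mode 03) response with the header bytes stripped, and it leaves out the serial-port handling, which depends on the adapter. Each code is packed into two bytes: the top two bits select the system letter (P, C, B, or U), the next two bits give the first digit, and the remaining three hex digits follow.

```python
SYSTEMS = "PCBU"  # powertrain, chassis, body, network (top two bits)


def decode_dtc(a: int, b: int) -> str:
    """Decode one two-byte OBD-II trouble code, e.g. (0x01, 0x33) -> 'P0133'."""
    letter = SYSTEMS[a >> 6]  # bits 7-6 of the first byte
    first = (a >> 4) & 0x3    # bits 5-4: first digit, 0-3
    second = a & 0xF          # bits 3-0: second digit (hex)
    return f"{letter}{first}{second:X}{b:02X}"


def decode_dtcs(data: bytes) -> list[str]:
    """Decode a run of two-byte codes; 0x0000 pairs are padding and skipped.
    Assumes response headers were already removed (hypothetical input)."""
    codes = []
    for i in range(0, len(data) - 1, 2):
        a, b = data[i], data[i + 1]
        if a == 0 and b == 0:
            continue
        codes.append(decode_dtc(a, b))
    return codes


print(decode_dtcs(bytes([0x04, 0x20, 0x03, 0x02, 0x00, 0x00])))
# prints ['P0420', 'P0302']
```

For reference, P0420 and P0302 are the generic codes for low catalytic converter efficiency and for a cylinder 2 misfire; the byte values here are made up for illustration.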
I've used it a few times. On one occasion it told me, in effect, that we had to take our daughter's Honda in for a new catalytic converter. In other cases, though, it saved us a trip to the shop. A couple of days after I brought my car home from the dealer after routine maintenance, the engine started running rough. The diagnostic code showed that one spark plug wasn't firing, and after a quick check under the hood, I found that a spark plug wire wasn't seated properly and reconnected it.
We've built some of these same concepts into Sun's Fault Management Architecture. Through our participation in the SNIA storage management standards group, we are helping to create similar open standards. Even better, since the vast majority of our servers are connected to the Internet, they can send their codes back to a Sun support center. In many cases, we can send a support engineer to the customer's site before the customer even knows there is a problem.