Amazon S3 Silent Data Corruption

While catching up on my reading, I came across an interesting article focused on Amazon's Simple Storage Service (S3). The author points to a number of complaints where Amazon S3 customers had experienced silent data corruption. The author recommends calculating MD5 digital fingerprints of files before posting them to S3 and validating those fingerprints after later retrieving them from the service. More recently, Amazon has posted a best practices document for using S3 that includes:

Amazon S3’s REST PUT operation provides the ability to specify an MD5 checksum (http://en.wikipedia.org/wiki/Checksum) for the data being sent to S3. When the request arrives at S3, an MD5 checksum will be recalculated for the object data received and compared to the provided MD5 checksum. If there’s a mismatch, the PUT will be failed, preventing data that was corrupted on the wire from being written into S3. At that point, you can retry the PUT.

MD5 checksums are also returned in the response to REST GET requests and may be used client-side to ensure that the data returned by the GET wasn’t corrupted in transit. If you need to ensure that values returned by a GET request are byte-for-byte what was stored in the service, calculate the returned value’s MD5 checksum and compare it to the checksum returned along with the value by the service.

All in all - good advice, but it strikes me as unnecessarily "left as an exercise to the reader". Just as ZFS has revolutionized end-to-end data integrity within a single system, why can't we have similar protections at the Cloud level? While it would certainly help if Amazon were using ZFS on Amber Road as their storage back-end, even this would be insufficient...
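
Here is roughly what that "exercise" looks like on the PUT side, as a minimal sketch in Java: compute the MD5 of the payload, Base64-encode it into the Content-MD5 header, and let the service reject the write if the digest it computes on arrival doesn't match. The bucket and key below are hypothetical, and a real S3 request also needs an Authorization header (request signing), which is omitted here.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class S3Md5PutExample {
    public static void main(String[] args) throws Exception {
        byte[] payload = "hello, cloud".getBytes(StandardCharsets.UTF_8);

        // Compute the MD5 digest of the object data before sending it.
        byte[] md5 = MessageDigest.getInstance("MD5").digest(payload);

        // The Content-MD5 header carries the digest Base64-encoded (RFC 1864).
        String contentMd5 = Base64.getEncoder().encodeToString(md5);

        // Hypothetical bucket and key; request signing is omitted for brevity.
        URL url = new URL("https://example-bucket.s3.amazonaws.com/example-key");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-MD5", contentMd5);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }

        // If S3's recomputed digest does not match Content-MD5, the PUT fails.
        System.out.println("PUT response: " + conn.getResponseCode());
    }
}
```

On the GET side the same idea runs in reverse: recompute the MD5 over the bytes actually received and compare it with the checksum returned by the service before trusting the data.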

Clearly, more is needed. For example, would it make sense to add an API layer that automates the calculation and validation of digital fingerprints? Most people don't think about silent data corruption and honestly they shouldn't have to! Integrity checks like these should be automated just as they are in ZFS and TCP/IP! As we move into 2009, we need to offer easy-to-use, approachable solutions to these problems, because if the future is in the clouds, it will revolve around the data.
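
As a rough sketch of the kind of API layer I have in mind, consider a hypothetical wrapper around a blob store that fingerprints every object on put() and re-verifies that fingerprint on every get(), so callers never handle checksums themselves. The class and method names below are made up for illustration, and an in-memory map stands in for the remote store.

```java
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: integrity checking is folded into the storage API itself.
public class VerifyingBlobStore {

    private final Map<String, byte[]> objects = new HashMap<>();
    private final Map<String, byte[]> digests = new HashMap<>();

    public void put(String key, byte[] data) throws Exception {
        objects.put(key, data.clone());
        digests.put(key, md5(data));   // fingerprint stored alongside the object
    }

    public byte[] get(String key) throws Exception {
        byte[] data = objects.get(key);
        if (data == null) {
            return null;
        }
        // Re-verify the stored fingerprint before returning the data.
        if (!MessageDigest.isEqual(md5(data), digests.get(key))) {
            throw new IllegalStateException("silent corruption detected for key: " + key);
        }
        return data.clone();
    }

    private static byte[] md5(byte[] data) throws Exception {
        return MessageDigest.getInstance("MD5").digest(data);
    }
}
```

A real implementation would keep the digest as object metadata in the service itself, but the shape of the interface is the point: the integrity check disappears into the API instead of being left to each user.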

Comments:

Hi, how would ZFS help if the data arrived corrupted? Wouldn't Amber Road just write the corrupt data? What insight would ZFS have into the integrity of an S3 PUT operation?

Posted by Eugene on January 28, 2009 at 10:27 AM EST #

Eugene, you are correct to point out that ZFS would not have helped if the data were to be corrupted in transit over the network. That was the point I was trying to make at the end regarding ZFS and TCP/IP (which have end-to-end integrity mechanisms) and what is needed going forward. Perhaps the API extensions (to do automatic digital fingerprint calculation and verification) could help in some cases. That said, I believe that this is an area where we need more research and innovation. The amount of data being stored is only getting bigger and we need better ways of detecting and correcting (or ideally preventing) these kinds of data corruption issues. Thanks so much for your comment! -- Glenn

Posted by Glenn Brunette on January 28, 2009 at 01:05 PM EST #

This post seems pointless, to be honest. These mechanisms that you're talking about (TCP checksums) are the exact mechanisms that are intended to identify data corruption in Amazon S3 transmission. How often when doing HTTP operations such as reading a website or posting a blog message do you encounter data corruption?

Why then would S3 be any different? If the implication is that the data is corrupted during the write process, there's a lot more to worry about than a few checksums. Perhaps the data was written to several nodes simultaneously with different payloads to the same S3 key? That, I could accept, would result in corruption as the different payloads converge in the cloud.

It is and always has been the job of network layers, not application layers, to detect and correct data corruption. Additional CPU-eating application-layer checksums are redundant and identify a larger problem. Perhaps the issue is with the accuracy of the checksums that we apply to Internet traffic?

Surely ZFS is superfluous to this conversation. This is effectively another layer up, and to presume that it is even nearly relevant at the scale of Amazon S3 is presumptuous.

Posted by Nate on January 28, 2009 at 09:28 PM EST #

Nate, thanks for your comment! The point I was trying to make, and perhaps not as well as I could have, is that we need to make it simpler and easier for people to use and (perhaps more importantly) trust the Cloud. To better serve people of varying levels of skill, we should look to ways to make it easier for people by automating best practices and factoring out complexity where we can so that end-users do not have to cope with these kinds of situations on their own. This is about identifying the use cases where things are more likely to go wrong and finding new or enhanced methods to reduce the likelihood of problems.
I am not specifically concerned with TCP/IP, ZFS or any other technology. These examples were used to show cases of end-to-end data integrity (within their specific realms) and to provoke people to think about what else we could do to prevent these kinds of issues going forward. -- Glenn

Posted by Glenn Brunette on January 29, 2009 at 01:24 AM EST #

Hi Glenn,

I still do not understand your point completely after reading the comments, I am afraid.

Why is data integrity a cloud-specific problem that should be solved by a new API at the application layer?

The demand for data integrity is much higher in the enterprise application market segment - and is solved by the products of Sun Microsystems?

Isn't the problem Amazon is having obviously an operations problem - i.e., they are using the wrong technologies, or using the right technologies wrongly?

The data corruption could not have happened during transmission because of TCP/IP. If the corruption did not happen on the user's box, then it must have happened on one of Amazon's boxes. This could have been prevented reliably by using a ZFS 3-way RAID-1 mirror with regular ZFS scrubbing and the use of ECC memory. Maybe the risk of data corruption is still significant in the CPUs, memory controllers, etc. Then it would have been preventable by using M-Series SPARC boxes.

So the problem seems to be merely sloppy system implementation or not having enough money from S3 bills to afford the right technology? Or am I missing other possibilities?

Posted by Michael Meier on January 29, 2009 at 04:54 AM EST #

I am sorry for my premature comment: I did not read the article to which you linked. So it happened because of a faulty load balancer. And the reassembly of multiple TCP packets is not protected by the TCP checksum algorithm: http://evanjones.ca/tcp-checksums.html

This is crazy. So there does not seem to be a transparent technology for reliably transmitting data via TCP/IP?!?!?
Since this sounds like a problem with the TCP/IP protocol specification: is there any standard mechanism in Java SE or EE that could protect transparently against these kinds of errors?

I am writing distributed software using Java SE+EE. So now I have to manually serialize all objects and checksum the serialization, as I do not know whether there is a super crazy load balancer in between that assembles TCP packets wrongly, in a way that is not detected by the endpoints of the communication???

The horror.

Posted by Michael Meier on January 29, 2009 at 05:14 AM EST #
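
For what it's worth, the manual approach Michael describes can be kept fairly compact in plain Java SE: wrap the serialization stream in a DigestOutputStream so the fingerprint is computed over exactly the bytes that will cross the wire. This is only an illustrative sketch of that idea, not a standard Java EE mechanism.

```java
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.util.Base64;

public class ChecksummedSerialization {
    public static void main(String[] args) throws Exception {
        Serializable message = "some payload object";   // any Serializable would do

        // Serialize the object while feeding every byte through an MD5 digest.
        MessageDigest md = MessageDigest.getInstance("MD5");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new DigestOutputStream(bytes, md))) {
            oos.writeObject(message);
        }

        byte[] serialized = bytes.toByteArray();
        String fingerprint = Base64.getEncoder().encodeToString(md.digest());

        // The fingerprint travels alongside the serialized bytes; the receiver
        // recomputes it over what actually arrived and rejects any mismatch.
        System.out.println(serialized.length + " bytes, MD5=" + fingerprint);
    }
}
```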

In the SAN world, there is a known idea (DIF) of providing an API for an integrity-capable block I/O layer in the OS. Basically, the API adds integrity metadata for each 512-byte sector as it is written to system memory; this metadata is then added to the I/O request and eventually gets passed to the HBA driver. The HBA DMAs the data to the board, verifies the data integrity, merges the data and integrity metadata, and sends out 520-byte sectors. On the other end, the storage subsystem's target front-end and initiator back-end do similar things.

Posted by johnhl on January 29, 2009 at 07:02 AM EST #
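
To illustrate the sector layout described above, here is a small sketch that builds a 520-byte protected sector from 512 data bytes plus an 8-byte integrity tuple (guard CRC, application tag, reference tag). The CRC-16 polynomial 0x8BB7 used for the guard tag is the one commonly associated with T10 DIF; treat the field values and ordering here as an illustrative assumption rather than a specification reference.

```java
import java.nio.ByteBuffer;

public class DifSectorExample {

    // Bitwise CRC-16 with polynomial 0x8BB7, as commonly cited for the DIF guard tag.
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x8BB7) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    // Append the 8-byte integrity metadata to a 512-byte sector: 520 bytes total.
    static byte[] buildProtectedSector(byte[] sector512, short appTag, int refTag) {
        ByteBuffer out = ByteBuffer.allocate(520);
        out.put(sector512);                        // 512 bytes of data
        out.putShort((short) crc16(sector512));    // 2-byte guard tag (CRC over the data)
        out.putShort(appTag);                      // 2-byte application tag
        out.putInt(refTag);                        // 4-byte reference tag (e.g. low bits of the LBA)
        return out.array();
    }

    public static void main(String[] args) {
        byte[] sector = new byte[512];             // all-zero sector, for illustration only
        byte[] protectedSector = buildProtectedSector(sector, (short) 0, 0);
        System.out.println("protected sector length: " + protectedSector.length);
    }
}
```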
