Storage Grid 2: Cleversafe (!?)
By pmonday on Aug 21, 2006
Really, it's all about distributed storage.
BitTorrent eases the burden on the computers where information is stored, as well as the bandwidth limitations of the distributors of that information, by dividing an uploaded file into pieces and then letting clients grab the pieces through a "torrent". A torrent is an abstraction whose pieces, underneath the covers, can come from many different locations. The result is a very resilient network (since pieces can come from many locations) as well as a network that spreads the burden of feeding data to other computers (CPU and bandwidth).
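To make the "pieces from many locations" idea concrete, here is a minimal sketch in Python. It is not the BitTorrent protocol; the piece size, the peer dictionaries, and the function names are all made up for illustration. Each hypothetical peer holds only some of the pieces, and the client rebuilds the file by taking each piece from any peer that has it.

```python
import random

PIECE_SIZE = 4  # bytes; unrealistically small, chosen only for illustration


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def fetch_file(num_pieces: int, peers: list, rng: random.Random) -> bytes:
    """Rebuild the file, taking each piece from a randomly chosen holder."""
    pieces = []
    for index in range(num_pieces):
        holders = [peer for peer in peers if index in peer]
        if not holders:
            raise LookupError(f"no peer holds piece {index}")
        pieces.append(rng.choice(holders)[index])
    return b"".join(pieces)


original = b"distributed storage"
pieces = split_into_pieces(original)

# Hypothetical peers: each maps piece index -> piece bytes.
peers = [
    {i: p for i, p in enumerate(pieces) if i % 2 == 0},  # even pieces only
    {i: p for i, p in enumerate(pieces) if i % 2 == 1},  # odd pieces only
    dict(enumerate(pieces)),                             # a full copy
]

rebuilt = fetch_file(len(pieces), peers, random.Random(0))
```

No single peer has to serve the whole file, and losing any one of these three peers still leaves every piece available somewhere.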
Here's what seems like a bit of a tangent: Seti@Home and its underlying grid implementation, BOINC. BOINC provides a framework for distributing CPU-intensive jobs to a global computer grid. I'm not sure where the design feature lives (BOINC or Seti@Home), but Seti@Home distributes the same block of work to many different computers and compares the results to check the integrity of the computations on the remote computers. This makes it possible to catch a job that was tampered with while it was out in the wild. With so many free CPU cycles running around, why not distribute a job multiple times if it is for security purposes? This redundancy also allows the job to fail on one or more distributed computers without a significant impact on the overall job (since results will come in from another computer).
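The "send the same work to several computers and compare" idea can be sketched in a few lines of Python. This is not BOINC's actual validation code; the work unit, the quorum rule, and every name here are assumptions for illustration. A result is accepted only when enough workers agree on it, and workers that never report back are simply ignored.

```python
from collections import Counter
from typing import Optional


def accept_result(results: list, quorum: int) -> Optional[int]:
    """Return the most common result if at least `quorum` workers agree on it."""
    reported = [r for r in results if r is not None]  # drop failed workers
    if not reported:
        return None
    value, votes = Counter(reported).most_common(1)[0]
    return value if votes >= quorum else None


def work_unit(block: list) -> int:
    """Stand-in for the real computation: a sum of squares."""
    return sum(x * x for x in block)


block = [1, 2, 3, 4]
honest = work_unit(block)

# Four hypothetical workers: two honest, one tampered result, one that failed.
results = [honest, honest + 7, honest, None]

accepted = accept_result(results, quorum=2)
rejected = accept_result([honest, honest + 7, None], quorum=2)
```

Here `accepted` is the honest value because two workers agree on it, while `rejected` is `None` because no value reaches the quorum, so that job would need to be handed out again.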
Cleversafe is very similar to both BitTorrent and BOINC/Seti@Home, yet a bit different. Cleversafe's primary purpose is data security, and its secondary purpose seems to be distributed storage and a storage grid. I haven't downloaded the code yet or dug in deep, but the idea is to carve a file up (much like BitTorrent), encrypt the pieces, and feed them out to a grid of, potentially, low-cost storage. This chunking of data gives us the benefits of BitTorrent and BOINC, but in a secure distributed data scenario. We offload bandwidth and storage requirements to other computers (amortized across them as well), while at the same time getting better resiliency and highly parallel bandwidth and transfer rates when pulling a single, chunked-up file back to a host. Security appears to be well addressed, as the encryption occurs at the originator of the file, not at the leaf nodes. With this model it should be simple to determine whether the data was tampered with while residing on a storage node. Further, it seems that you could easily "tune" the algorithms to function more like a torrent crossed with Seti@Home, so that you always had many nodes to recover your data from. If you were in a particularly untrustworthy mood, you could compare data from the multiple sources to have proof of tampering, though checksums should be good enough for most cases.
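Since I haven't read the Cleversafe code, the following Python is only a guess at the general shape of the idea, not their algorithm: chunk a file at the originator, record a checksum per chunk, scatter copies across storage nodes, and on the way back skip any copy whose checksum no longer matches. The node dictionaries, function names, and replica placement are all hypothetical. The encryption step is left out because the Python standard library has no cipher; a real system would encrypt each chunk before it leaves the originator.

```python
import hashlib


def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def store(data: bytes, nodes: list, chunk_size: int, copies: int) -> list:
    """Scatter chunks across nodes; return the checksum list kept by the originator."""
    manifest = []
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        manifest.append(digest(chunk))
        for k in range(copies):
            # Place each copy on a different node (round-robin placement).
            nodes[(index + k) % len(nodes)][index] = chunk
    return manifest


def retrieve(manifest: list, nodes: list) -> bytes:
    """Rebuild the data, accepting only copies that match the originator's checksum."""
    out = []
    for index, expected in enumerate(manifest):
        for node in nodes:
            chunk = node.get(index)
            if chunk is not None and digest(chunk) == expected:
                out.append(chunk)
                break
        else:
            raise ValueError(f"no intact copy of chunk {index}")
    return b"".join(out)


data = b"protect your keys!"
nodes = [{}, {}, {}]  # hypothetical storage nodes: chunk index -> chunk bytes
manifest = store(data, nodes, chunk_size=6, copies=2)

nodes[0][0] = b"XXXXXX"  # a storage node tampers with its copy of chunk 0
recovered = retrieve(manifest, nodes)
```

The tampered copy on the first node fails its checksum, so `retrieve` quietly falls back to the second copy. Because the checksums never leave the originator, a storage node cannot cover its tracks by rewriting them.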
This is definitely some cool stuff but, of course, protect your keys! Here is a NY Times article about the project: A Move to Secure Data by Scattering Pieces.
Time to track down more docs on the project and see how open the open source is (the license appears to be GPL, very cool). Gotta go!