Storage Grid 2: Cleversafe (!?)

Really, it's all about distributed storage.

BitTorrent eases both the burden on the computers where information is stored and the bandwidth limitations of the distributors of that information by dividing an uploaded file into pieces and then letting clients grab the "torrent". A torrent is an abstraction whose pieces, underneath the covers, can come from many different locations. The result is a very resilient network (since pieces can come from many locations) that also spreads the burden of feeding data to other computers (CPU and bandwidth) across its participants.
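To make the piece idea concrete, here is a minimal sketch in Python. It is not a BitTorrent client: the "peers" are plain dictionaries, and the function names (`split_into_pieces`, `piece_hashes`, `download`) are made up for illustration. It only shows a file being cut into pieces, each piece being hashed, and the file being reassembled from whichever peer happens to hold a valid copy of each piece.

```python
import hashlib


def split_into_pieces(data: bytes, piece_size: int) -> list[bytes]:
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hashes(pieces: list[bytes]) -> list[str]:
    """One SHA-1 digest per piece, so a downloader can check what it receives."""
    return [hashlib.sha1(piece).hexdigest() for piece in pieces]


def download(hashes: list[str], peers: list[dict]) -> bytes:
    """Rebuild the file, taking each piece from the first peer with a valid copy.

    `peers` is a hypothetical stand-in for remote machines: each peer is a
    dict mapping piece index -> piece bytes.
    """
    result = []
    for index, expected in enumerate(hashes):
        for peer in peers:
            piece = peer.get(index)
            if piece is not None and hashlib.sha1(piece).hexdigest() == expected:
                result.append(piece)
                break
        else:
            raise LookupError(f"no peer has a valid copy of piece {index}")
    return b"".join(result)


original = b"The quick brown fox jumps over the lazy dog"
pieces = split_into_pieces(original, 8)
hashes = piece_hashes(pieces)

# Neither peer holds the whole file: one has the even pieces, one the odd.
peer_a = {i: p for i, p in enumerate(pieces) if i % 2 == 0}
peer_b = {i: p for i, p in enumerate(pieces) if i % 2 == 1}

restored = download(hashes, [peer_a, peer_b])
```

Neither peer stores the complete file, yet the downloader gets all of it back, which is where both the resilience and the shared bandwidth load come from.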

Here's what seems like a bit of a tangent: Seti@Home and its underlying grid implementation, BOINC. BOINC provides a framework for distributing CPU-intensive jobs to a global computer grid. I'm not sure where the design feature lives (BOINC or Seti@Home), but Seti@Home distributes the same block of work to many different computers and compares the results to check the integrity of the computations on the remote computers. This helps detect a job that was tampered with while it was out in the wild. With so many free CPU cycles running around, why not distribute a job multiple times if it is for security purposes? This redundancy also allows the job to fail on one or more distributed computers without a significant impact on the overall job (since results will come in from another computer).
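The "send the same work out several times and compare" idea can be sketched as a simple majority vote. This is only an illustration of the comparison step, not BOINC's or Seti@Home's actual validation code; the function name `accept_result` and the `quorum` parameter are assumptions of mine.

```python
from collections import Counter


def accept_result(results, quorum):
    """Return the value that at least `quorum` workers agree on, else None.

    `results` holds one entry per worker that was given the same block of
    work; None marks a worker that failed or never reported back.
    """
    reported = [r for r in results if r is not None]
    if not reported:
        return None
    value, count = Counter(reported).most_common(1)[0]
    return value if count >= quorum else None


# Two honest workers outvote one that returned a bad answer.
agreed = accept_result([42, 42, 41], quorum=2)

# A worker that dropped out does not stop the job from completing.
survived = accept_result([42, None, 42], quorum=2)

# No two workers agree, so nothing is accepted.
disputed = accept_result([1, 2, 3], quorum=2)
```

The same check covers both cases from the paragraph above: a tampered result is outvoted, and a failed worker is simply ignored as long as enough others report in.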

Cleversafe is very similar to both BitTorrent and BOINC/Seti@Home, yet a bit different. Cleversafe's primary purpose is data security, and its secondary purpose seems to be distributed storage and a storage grid. I haven't downloaded the code yet or dug in deep, but the idea is to carve a file up (much like BitTorrent), encrypt the pieces, and feed them out to a grid of, potentially, low-cost storage. This chunking of data gives us the benefits of BitTorrent and BOINC, but in a secure distributed data scenario: we offload bandwidth and storage requirements to other computers (amortized across them as well), while at the same time getting better resiliency and highly parallel transfer rates when pulling a single, chunked-up file back to a host. Security appears to be well addressed, as the encryption occurs at the originator of the file, not at the leaf nodes. With this model, the originator can check whether the data was tampered with while residing on a storage node. Further, it seems that you could easily "tune" the algorithms to function more like a torrent crossed with Seti@Home, so that you always had many nodes to recover your data from. If you were in a particularly untrustworthy mood, you could compare data from the multiple sources to have proof of tampering...though checksums should be good enough for most cases.
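Here is a rough sketch of the checksum-plus-redundancy part of that idea. To be clear about the assumptions: I have not looked at Cleversafe's code, so this uses plain replication across in-memory "nodes" (dictionaries), the names `disperse` and `recover` are hypothetical, and encryption is left out entirely because the Python standard library has no cipher. In the scheme described above, the originator would encrypt each chunk before hashing and sending it.

```python
import hashlib


def disperse(data: bytes, chunk_size: int, nodes: list[dict], copies: int) -> list[str]:
    """Store each chunk on `copies` different nodes; return the checksums.

    The returned manifest (one SHA-256 digest per chunk) stays with the
    originator, so storage nodes cannot alter it.
    """
    if copies > len(nodes):
        raise ValueError("cannot place more copies than there are nodes")
    manifest = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        manifest.append(hashlib.sha256(chunk).hexdigest())
        for replica in range(copies):
            nodes[(index + replica) % len(nodes)][index] = chunk
    return manifest


def recover(manifest: list[str], nodes: list[dict]) -> bytes:
    """Rebuild the data, skipping any copy whose checksum does not match."""
    chunks = []
    for index, digest in enumerate(manifest):
        for node in nodes:
            chunk = node.get(index)
            if chunk is not None and hashlib.sha256(chunk).hexdigest() == digest:
                chunks.append(chunk)
                break
        else:
            raise ValueError(f"no intact copy of chunk {index} is available")
    return b"".join(chunks)


secret = b"protect your keys"
grid = [{}, {}, {}]
manifest = disperse(secret, chunk_size=4, nodes=grid, copies=2)

# A storage node tampers with its copy of chunk 0; the checksum exposes it
# and the intact copy on another node is used instead.
grid[0][0] = b"evil"
recovered = recover(manifest, grid)
```

Because the checksums never leave the originator, a single bad copy is detected and routed around; only when every copy of a chunk has been damaged does recovery fail.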

This is definitely some cool stuff but, of course, protect your keys! Here is a NY Times article about the project: A Move to Secure Data by Scattering Pieces.

Time to track down more docs on the project and see how open the open source is (the license appears to be GPL, very cool), gotta go!


Posted by Paul for Matt England, original text from Matt

Paul et al,

Matt England here, one of the lead devs at Cleversafe. All the software at is completely GPL (version 2); we are also considering an LGPL option. We call the technology "dispersed storage."

If you're going to try out the software, e.g., build your own dispersed-storage grid, I recommend waiting for the alpha4.2.8 (or higher) release that we hope to put out either formally (on or informally at . You'll probably have an easier time getting a grid set up with these versions of the software. Note that we are still in alpha (although in the later stages of the alpha phase), so the user and admin experience is not yet top notch, but we expect to be getting there soon.

By the way, hopefully this New York Times link will allow your readers to bypass the for-pay requirement to read the article:

...or they can visit to see a reference to the article there.

Some other notes:
For some details on how the internals work, see:

This page needs some updates, but it's good enough to get a starter understanding of how things work.

We would like to see dispersed storage be a foundational element for other systems and software, much the same way that operating systems are to applications; of course general storage systems take a similar "layered" approach, although we like to think of the Cleversafe dispersed-storage system as touching a few additional areas in the layered "food chain." To this end, we have an API we call DSAPI (Dispersed Storage API); one can read more at:

Other ways to integrate our technology into other solutions include:

  • a script-able, cmdline interface (the 'dsgrid' client, runs on Linux, Windows, and other platforms in the future)
  • The dsgfs file system
  • DSAPI (as per above)

DSAPI is the "lowest level." There will be other integration methods in the future. We have some projects/ideas we are working on that we have yet to announce.

Who works on the Cleversafe projects? Check out:

We encourage you to seek help, get involved, or ask any questions at (a mirrored set of forums and email lists; note the email lists are somewhat restricted at the moment, but we are about to remove these constraints after we get our spam-management house in order). We can also help clarify some of the technical analysis you mention above.

Best regards,
Matt England

Posted by Matt England on August 30, 2006 at 07:53 AM MDT
