Peer-to-peer systems: Exposed!

Some kind souls (Ted Leung and Bryan Cantrill) pointed out that my last name was nowhere present on this page; I have updated the description to be less witty and more informative. Of course, my full name is actually Valerie Aurora Henson, and if I used it in all its octosyllabic glory, that might clear up a common point of confusion about which pronoun to use when referring to me.

Today's snazzy systems paper is High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two, another 6-pager from HotOS IX. Surely you are familiar with all those peer-to-peer systems papers motivated by greed-tinged descriptions of gigabytes of unused disk space just waiting to be used on people's personal computers? Charles Blake was skeptical of a world where unused storage falls like manna from the sky, so he created a few models and looked at some real-world data. It turns out that with realistic node membership times, member node bandwidths, and number of data copies necessary to provide reasonable availability, the amount of data that can be stored in a peer-to-peer network is severely limited by cross-network bandwidth, rather than available disk space. Blake puts it all so much more politely in his paper, with sentences like "We conclude that when redundancy, data scale, and dynamics are all high, the requisite cross-system bandwidth is beyond reasonable expectations."

This makes sense intuitively: How did most Gnutella users behave? They connected to the network long enough to download the songs they wanted, over their 56Kbps modem or 256Kbps cable modem, then left the network as soon as they were done. Most files were actually served by a relatively small number of reliable, high-bandwidth hosts; over a 3-day trace, the most reliable 5% of Gnutella peers provided 40% of the service. More pointedly, the authors estimate that the best 5% of Gnutella peers could be replaced by 10 university-run servers on cheap PC hardware - at which point the question becomes, "Why use many unreliable peers when a few reliable servers can serve as much data?" The answer is "When anonymity, security, or freedom from central control is important" - but never for performance or storage capacity reasons. In my opinion, this paper is a must-read for anyone working on peer-to-peer systems - but don't blame me if you switch fields shortly thereafter...

Comments:

Post a Comment:
Comments are closed for this entry.
About

val

Search

Categories
Archives
« July 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
  
       
Today