By Ksplice Post Importer on Apr 19, 2010
How much information is in a tweet?
A month ago, we asked that question, in the First International Longest Tweet Contest. The most longwinded tweeters on earth answered the call. The result: about 4,370 bits.
Congrats to the entrants, and best of luck honing your entries for next year! Ksplice is pleased to award our acclaimed, limited-edition T-shirts to the top three contenders.
In no particular order, they are:
Mr. Lehman's entry, originally posted as a blog comment, was mathematically elegant. The scheme is simple: use the fact that Twitter allows each "character" to be a 31-bit UCS-4 code position -- there are 231, or 2,147,483,648 of them. Make use of all the bits you can without straying into legitimate Unicode (i.e. the first 1,114,112 code positions), since Twitter represents real Unicode differently and could cause ambiguities. That leaves 2,146,369,536 legal values for each "character."
Mr. Lehman's scheme represents the input as a 4339-bit integer, and simply repeatedly divides the number by 2,146,369,536. The remainder of each division, he encodes into each character. Nice! His entry includes a partial implementation in Java.
The guys from dc949.org previously made a whole Twitter filesystem, outlined in this video. The took advantage of the UCS-4 trick to update their program. They thew in a bunch of clever tricks of their own: storing data in the latitude and longitude, and 7 bits in the place_id field. The cherry on top: representing one bit in whether the tweet is "favorited"! All told, by dc949's count, this gets you to 4,368 bits.
The downside: in our testing, their implementation didn't quite work. Fed with random bits, we often got truncated or incorrect results (even with an updated version dc949 supplied). The bugs can probably be fixed, but until that happens we can't quite confirm that all these fields store as many bits as dc949 believes.
We liked Ben's entry too much not to give him a T-shirt:
Solution - Just tweet the following picture of a swimming fish: ".·´¯`·.´¯`·.¸¸.·´¯`·.¸><(((º>" Given that 1 word is 16 bits, and a picture is equal to 1,000 words, that makes my above tweet 16,000 bits of information (fitting several pictures in a tweet may extend this further) :-)
Thanks again to all who participated! Until next time...