The wireless traffic of MIT students
By Ksplice Post Importer on May 19, 2010
Wireless traffic is both interesting and delightfully accessible thanks to the broadcast nature of 802.11. I have spent many a lazy weekend afternoon watching my laptop, the TiVo, and the router chatting away in a Wireshark window.
As fun as the wireless traffic in one's house may be, there's something to be said for observing a much larger ecosystem: more people, working on their wireless-enabled devices with a more diverse set of operating systems, browsers, and intentions. That diversity gives rise to more interesting background and active traffic patterns and a wider range of protocols in play.
Now, sniffing other people's wireless traffic happens to break a number of federal and local laws, including wiretapping laws. While I am interested in observing other people's wireless traffic, I'm not interested in breaking the law. Fortunately, Ksplice is down the road from a wonderful school that fosters this kind of intellectual curiosity.
I met with MIT's Information Services and Technology Security Team, and we came up with a plan for me to gather data that would satisfy MIT's Athena Rules of Use. I got permission from Robert Morris and Sam Madden to monitor the wireless traffic during their Computer Systems Engineering class and made an announcement at the beginning of a class period explaining what I'd be doing. Somewhat ironically, that day's lecture was an introduction to computer security.
Some interesting results from the collected data set are summarized below. Traffic was gathered with tcpdump on my laptop as I sat in the middle of the classroom. The capture was imported into Wireshark and then, as different needs arose, spit back out as CSV and loaded into a SQLite database for aggregation queries, read back into tcpdump and filtered there, or hacked up with judicious use of grep.
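For readers who want to try the CSV-to-SQLite route, here is a minimal Python sketch of that step. It assumes Wireshark's default CSV column headings (`No.`, `Time`, `Source`, `Destination`, `Protocol`, `Length`, `Info`); the `packets` table name, the helper functions, and the sample rows are hypothetical, not the schema or data actually used for this post.

```python
import csv
import io
import sqlite3

def load_capture_csv(csv_file, conn):
    """Load a Wireshark CSV export into a (hypothetical) `packets` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packets (no INTEGER, time REAL, src TEXT, "
        "dst TEXT, protocol TEXT, length INTEGER, info TEXT)"
    )
    rows = (
        (int(r["No."]), float(r["Time"]), r["Source"], r["Destination"],
         r["Protocol"], int(r["Length"]), r["Info"])
        for r in csv.DictReader(csv_file)
    )
    conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()

def protocol_breakdown(conn):
    """Return (protocol, # packets, % of packets) rows, largest first."""
    return conn.execute(
        "SELECT protocol, COUNT(*) AS n, "
        "ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM packets), 2) "
        "FROM packets GROUP BY protocol ORDER BY n DESC, protocol"
    ).fetchall()

# Made-up rows in the default export format, using documentation addresses.
SAMPLE_CSV = (
    '"No.","Time","Source","Destination","Protocol","Length","Info"\n'
    '"1","0.000000","192.0.2.5","224.0.0.251","MDNS","87","Standard query"\n'
    '"2","0.000100","192.0.2.7","198.51.100.9","HTTP","512","GET / HTTP/1.1"\n'
    '"3","0.000200","192.0.2.6","224.0.0.251","MDNS","87","Standard query"\n'
    '"4","0.000300","192.0.2.7","198.51.100.9","TCP","66","[ACK]"\n'
)

conn = sqlite3.connect(":memory:")
load_capture_csv(io.StringIO(SAMPLE_CSV), conn)
print(protocol_breakdown(conn))
```

Once the table is loaded, other ad hoc questions (per-source counts, or bytes per protocol via `SUM(length)`) are one `GROUP BY` away.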
Time spent capturing: 45 minutes
Packets captured: 853436
Number of traffic sources in the room: 21
Number of distinct source and destination IPv4 and IPv6 addresses: 5117
Number of "active" traffic addresses (e.g., addresses using HTTP or SSH, as opposed to background traffic): 581
Number of protocols represented: 48. Note that Wireshark buckets each packet by its topmost protocol layer: TCP is in this count because someone was sending actual TCP traffic with no application layer on top, not because TCP is the transport protocol for HTTP (which is also in this count). These protocols and how much traffic was sent over them are in the table on the left.
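The top-layer bucketing can be approximated in a few lines of Python, assuming each packet's layer stack is available in the colon-separated form of Wireshark's `frame.protocols` field (for example `eth:ethertype:ip:tcp:http`). The skip list is a simplification chosen for this sketch, not Wireshark's actual rule for filling in its Protocol column.

```python
from collections import Counter

# Layers that are framing or undissected payload rather than a named protocol.
# This skip list is an assumption for the sketch, not Wireshark's real logic.
SKIPPED_LAYERS = {"ethertype", "data"}

def top_layer(frame_protocols: str) -> str:
    """Return the highest named protocol in a colon-separated layer stack."""
    layers = [p for p in frame_protocols.split(":") if p not in SKIPPED_LAYERS]
    return layers[-1] if layers else "unknown"

def bucket_by_top_layer(stacks):
    """Count packets by their topmost protocol."""
    return Counter(top_layer(s) for s in stacks)

stacks = [
    "eth:ethertype:ip:tcp:http",  # HTTP request: counted as http, not tcp
    "eth:ethertype:ip:tcp",       # bare ACK with no payload: counted as tcp
    "eth:ethertype:ip:tcp:data",  # undissected payload: also counted as tcp
    "eth:ethertype:ipv6:udp:mdns",
]
print(bucket_by_top_layer(stacks))
```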
IPv4 v. IPv6
Number of IPv4 packets: 580547
Number of IPv6 packets: 270351
2.15 IPv4 packets were sent for every IPv6 packet. IPv6 was only used for background traffic, serving as the internet layer for the following protocols: DHCPv6, DNS, ICMPv6, LLMNR, MDNS, SSDP, TCP, and UDP. The TCP over IPv6 packets were all wsdapi communications between the Windows user discussed below and his or her remote desktop. The UDP over IPv6 packets were all ws-discovery communications, part of a local multicast discovery protocol most likely being used by the Windows machines in the room.
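Splitting packets into IPv4 and IPv6 only requires looking at the address format. Here is a minimal Python sketch using the standard `ipaddress` module; it assumes the input is the Source column of the CSV export, and it simply skips non-IP sources such as the MAC addresses shown for ARP frames. The sample addresses are made up.

```python
import ipaddress
from collections import Counter

def ip_version(address: str):
    """Return 4 or 6 for an IP address string, or None if it isn't one
    (for example a MAC address shown as the source of an ARP frame)."""
    try:
        return ipaddress.ip_address(address).version
    except ValueError:
        return None

def version_counts(sources):
    """Count packets per IP version, ignoring non-IP sources."""
    return Counter(v for v in map(ip_version, sources) if v is not None)

sources = ["192.0.2.10", "fe80::1", "2001:db8::5", "aa:bb:cc:dd:ee:ff"]
counts = version_counts(sources)
print(counts[4], counts[6])  # 1 2
```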
ICMP v. ICMPv6
Number of ICMP packets: 60
Number of ICMPv6 packets: 116167
1936.12 ICMPv6 packets were sent for every ICMP packet. The reason is that in IPv6, ICMPv6 takes on work that IPv4 hands to other protocols, such as ARP and IGMP. Looking at the ICMP and ICMPv6 packets by type:
ICMP packets by type:
|Type||# Packets|
|Dest unreachable (Host administratively prohibited)||1|
|Dest unreachable (Port unreachable)||35|
|Echo (ping) request||9|
|Time-to-live exceeded in transit||15|
ICMPv6 packets by type:
|Type||# Packets|
|Multicast Listener Report msg v2||7236|
|Multicast listener done||86|
|Multicast listener report||548|
|Time exceeded (In-transit)||1|
|Unknown (0x40) (Unknown (0x87))||1|
|Unreachable (Port unreachable)||1|
These ICMPv6 packets are mostly doing Neighbor Discovery Protocol (NDP) and Multicast Listener Discovery (MLD) work. NDP handles router and neighbor solicitation and covers ground similar to ARP and ICMP under IPv4, and MLD handles multicast listener discovery similar to IGMP under IPv4.
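To reproduce the NDP/MLD split, ICMPv6 messages can be grouped by type number (the `icmpv6.type` field in Wireshark). The type numbers below come from RFC 4443 (error and echo messages), RFC 4861 (NDP), and RFC 2710 and RFC 3810 (MLDv1 and MLDv2); the grouping function itself is just an illustrative sketch.

```python
from collections import Counter

# NDP message types (RFC 4861).
NDP_TYPES = {
    133: "Router Solicitation",
    134: "Router Advertisement",
    135: "Neighbor Solicitation",
    136: "Neighbor Advertisement",
    137: "Redirect",
}
# MLD message types (RFC 2710 for MLDv1, RFC 3810 for the v2 report).
MLD_TYPES = {
    130: "Multicast Listener Query",
    131: "Multicast Listener Report",
    132: "Multicast Listener Done",
    143: "Version 2 Multicast Listener Report",
}

def icmpv6_family(icmp_type: int) -> str:
    """Classify an ICMPv6 type number into NDP, MLD, error, or other."""
    if icmp_type in NDP_TYPES:
        return "NDP"
    if icmp_type in MLD_TYPES:
        return "MLD"
    if icmp_type < 128:
        # RFC 4443: types 0-127 are error messages, 128-255 informational.
        return "error"
    return "other informational"  # e.g. echo request (128) and reply (129)

print(Counter(icmpv6_family(t) for t in [135, 136, 143, 143, 1, 128]))
```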
TCP v. UDP
Number of TCP packets: 383122
Number of UDP packets: 350067
1.09 TCP packets were sent for every UDP packet. I would have thought TCP would be a clear winner, but given that MDNS traffic, which runs over UDP, makes up over 30% of the packets captured, I guess this isn't surprising. The 14% of packets carrying neither TCP nor UDP are mostly the 116167 ICMPv6 packets, with ARP, ICMP, and other traffic making up the rest. See also this post.
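The ratios quoted in this post are simple arithmetic on the packet counts. This short Python check recomputes them from the numbers above, along with the share of packets that were neither TCP nor UDP and how much of that share ICMPv6 alone accounts for. It assumes the TCP and UDP counts cover every packet carrying those transports.

```python
# Packet counts reported in this post.
TOTAL_PACKETS = 853436
COUNTS = {
    "IPv4": 580547, "IPv6": 270351,
    "ICMP": 60, "ICMPv6": 116167,
    "TCP": 383122, "UDP": 350067,
}

def ratio(a: str, b: str) -> float:
    """Packets of protocol a per packet of protocol b, to two decimals."""
    return round(COUNTS[a] / COUNTS[b], 2)

# Packets that carried neither TCP nor UDP.
non_transport = TOTAL_PACKETS - COUNTS["TCP"] - COUNTS["UDP"]

print(ratio("IPv4", "IPv6"))    # 2.15
print(ratio("ICMPv6", "ICMP"))  # 1936.12
print(ratio("TCP", "UDP"))      # 1.09
print(round(100 * non_transport / TOTAL_PACKETS, 1))     # 14.1
print(round(100 * COUNTS["ICMPv6"] / non_transport, 1))  # 96.6
```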
Awesomely, AIM, Jabber, MSN Messenger, and Yahoo! Messenger were all represented in the traffic:
AIM is the clear favorite (at least with this small sample size). Jabber has about a quarter as many participants as AIM but four times as many packets. Either the Jabberers are extra chatty, or the fact that Jabber is an XML-based protocol inflates the size of a conversation dramatically on the wire. Some IM traffic (like Google Chat) might also have been bucketed as HTTP/XML by Wireshark instead.
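To get a feel for the XML overhead hypothesis, this Python sketch builds a minimal XMPP chat message stanza (a `<message type="chat">` element with a `<body>` child, the shape Jabber clients exchange) and compares its size to the text it carries. The addresses are made up, and the sketch only measures per-message XML framing: real clients also exchange presence and other stanzas, and it says nothing about how AIM's binary protocol compares byte for byte.

```python
import xml.etree.ElementTree as ET

def xmpp_chat_stanza(sender: str, recipient: str, text: str) -> bytes:
    """Serialize a minimal XMPP chat <message> stanza."""
    msg = ET.Element("message", {"from": sender, "to": recipient, "type": "chat"})
    ET.SubElement(msg, "body").text = text
    return ET.tostring(msg)  # bytes, no XML declaration

stanza = xmpp_chat_stanza("alice@example.com", "bob@example.com", "hi")
print(stanza)
print(f"{len(stanza)} bytes of XML to carry a 2-character message")
```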
That Windows Remote Desktop Person
119489 packets, or 14% of the traffic, were between a computer in the classroom and what is with high probability a Windows machine on campus running the Microsoft Remote Desktop Protocol (see also this support page for a discussion of the protocol specifics).
RDP traffic from client to remote desktop: T.125, TCP, TPKT, X.224
RDP traffic from remote desktop to client: TCP, TPKT
Most of the traffic is T.125 payload packets. TPKTs encapsulate transport protocol data units (TPDUs) for the ISO Transport Service on top of TCP. The TPKT traffic was all "Continuation" traffic. X.224 transmits status and error codes. TCP "ms-wbt-server" traffic to port 3389 on the remote machine seals the deal on this being an RDP setup.
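The port-plus-framing evidence can be expressed as a small heuristic. This Python sketch checks for TCP traffic to port 3389 whose payload starts with a TPKT header as defined in RFC 1006 (version byte 3, a reserved byte, then a 16-bit big-endian length that includes the 4-byte header). It is only a sketch: segments that continue an earlier PDU, like the "Continuation" traffic mentioned above, won't start with a header, so a real classifier would need to track TCP streams. The sample segment is made up.

```python
import struct

RDP_PORT = 3389  # IANA-registered "ms-wbt-server"

def parse_tpkt_header(payload: bytes):
    """Return the TPKT packet length (header included), or None if the
    payload does not start with a version-3 TPKT header (RFC 1006)."""
    if len(payload) < 4:
        return None
    version, _reserved, length = struct.unpack("!BBH", payload[:4])
    if version != 3 or length < 4:
        return None
    return length

def looks_like_rdp(dst_port: int, payload: bytes) -> bool:
    """Heuristic: TCP to port 3389 whose payload begins with a TPKT header."""
    return dst_port == RDP_PORT and parse_tpkt_header(payload) is not None

# A made-up 19-byte TPKT packet: 4-byte header plus 15 bytes of X.224 data.
segment = bytes([0x03, 0x00, 0x00, 0x13]) + bytes(15)
print(parse_tpkt_header(segment), looks_like_rdp(3389, segment))  # 19 True
```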
5 clients attempted negotiations with SSLv2, which is insecure and disabled by default in most browsers; none of these negotiations got past a "Client Hello".
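A Client Hello's record framing reveals which protocol generation a client is speaking. This Python sketch, based on the SSL 2.0 record format and the backward-compatible hello described in RFC 2246 Appendix E, classifies the first bytes a client sends. It assumes the hello starts at the beginning of the buffer and only handles the 2-byte SSLv2 header that client hellos use. An SSLv2-format hello is not necessarily an SSL 2.0 negotiation: older clients sent that format while offering SSLv3 or TLS.

```python
def classify_client_hello(data: bytes) -> str:
    """Classify the first bytes of a client's SSL/TLS handshake."""
    # SSLv2 framing: 2-byte header with the high bit set (15-bit length),
    # then message type 1 (CLIENT-HELLO) and a 2-byte client version.
    if len(data) >= 5 and data[0] & 0x80 and data[2] == 0x01:
        if (data[3], data[4]) == (0x00, 0x02):
            return "sslv2"         # highest version offered is SSL 2.0
        return "sslv2-compat"      # SSLv2 framing offering SSLv3/TLS
    # SSLv3/TLS framing: record type 22 (handshake), 2-byte version,
    # 2-byte length, then handshake type 1 (client_hello).
    if len(data) >= 6 and data[0] == 0x16 and data[5] == 0x01:
        return "tls"
    return "unknown"

print(classify_client_hello(bytes([0x80, 0x2E, 0x01, 0x00, 0x02])))  # sslv2
```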
I wanted to be able to answer questions like "what were the top 20 most visited websites?" in this traffic capture. The proliferation of content distribution networks makes it harder to track all traffic associated with a popular website by IP addresses or hostnames. I ended up doing a crude but quick grep "Host:" pcap.plaintext | sort | uniq -c | sort -n -r on the expanded plaintext version of the data exported from Wireshark, which ranks hosts by the number of HTTP requests (each request carries a Host: header). The top 20 most visited hosts by that method were:
Alas, pretty boring. The eyewonder hosts are all for Engadget, and fbcdn is part of Facebook. I'll admit that I'd been a little hopeful that some enterprising student would try to screw up my data by scripting visits to a bunch of porn sites or something. CDNs dominate the top 20, and in fact almost all of the roughly 410 web server IP addresses gathered belong to CDNs. Akamai led with 39 distinct IPs, followed by Amazon AWS with 23 and Facebook and Panther CDN with 16 each, with many more at lower numbers.
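The grep pipeline behind the host ranking above can be sketched in Python with `collections.Counter`. This version assumes the plaintext export shows each header on its own indented line with Wireshark's literal `\r\n` terminator (for example `    Host: www.facebook.com\r\n`). Unlike the grep pipeline, it only matches lines that start with `Host:`, so headers such as `X-Forwarded-Host:` are not counted. The sample lines are made up.

```python
from collections import Counter

def top_hosts(lines, n=20):
    """Rank HTTP Host: header values from a Wireshark plaintext export."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith("Host:"):
            continue
        host = stripped[len("Host:"):].strip()
        # Assumed export format: the CRLF appears as the literal text "\r\n".
        if host.endswith("\\r\\n"):
            host = host[:-len("\\r\\n")].strip()
        counts[host] += 1
    return counts.most_common(n)

sample = [
    "Hypertext Transfer Protocol",
    "    Host: www.facebook.com\\r\\n",
    "    Host: www.engadget.com\\r\\n",
    "    Host: www.facebook.com\\r\\n",
]
print(top_hosts(sample))  # [('www.facebook.com', 2), ('www.engadget.com', 1)]
```

To run it on the export, pass the open file: `with open("pcap.plaintext") as f: print(top_hosts(f))`.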
Using the Internet means a lot more than HTTP traffic! 45 minutes of traffic capture gave us 48 protocols to explore. Most of the captured traffic was background traffic, and in particular discovery traffic local to the subnet.
Sniffing wireless traffic (legally) is a great way to learn about networking protocols and the way the Internet works in practice. It can also be incredibly useful for debugging networking problems.
What are some of your fun or awful networking experiences?