Monday Jan 17, 2011

Coffee shop Internet access

How does coffee shop Internet access work?

wireless coffee

You pull out your laptop and type http://www.google.com into the URL bar on your browser. Instead of your friendly search box, you get a page where you pay money or maybe watch an advertisement, agree to some terms of service, and are only then free to browse the web.

What is going on behind the scenes to give the coffee shop that kind of control over your packets? Let's trace an example of that process from first broadcast to last redirect and find out.

Step 1: Get our network configuration

When I first sit down and turn on my laptop, it needs to get some network information and join a wireless network.

My laptop is configured to use DHCP to request network configuration information and an IP address from a DHCP server in its Layer 2 broadcast domain.

This laptop happens to use the DCHP client dhclient. /etc/dhcp3/dhclient.conf is a sample dhclient configuration file describing among other things what the client will request from a DHCP server (your network manager might frob that configuration -- on my Ubuntu laptop, NetworkManager keeps a modified config at /var/run/nm-dhclient-wlan0.conf).

A DHCP negotiation happens in 4 parts:

DHCP

Step 1: DHCP discovery. The DHCP client (us, 0.0.0.0 in the screencap) sends a message to Ethernet broadcast address ff:ff:ff:ff:ff:ff to discover DHCP servers (Wireshark shows IP addresses in the summary view, so we see broadcast IP address 255.255.255.255). The packet includes a parameter request list with the parameters in the dhclient config file. The parameters in my /var/run/nm-dhclient-wlan0.conf are:

subnet-mask, broadcast-address, time-offset, routers,
domain-name, domain-name-servers, domain-search, host-name,
netbios-name-servers, netbios-scope, interface-mtu,
rfc3442-classless-static-routes, ntp-servers;
Step 2: DHCP offer. DHCP servers that get the discovery broadcast allocate an IP address and respond with a DHCP broadcast containing that IP address and other lease information. This is typically a simple race -- whoever gets an offer packet to the requester first wins. In our case, only MAC address 00:90:fb:17:ca:4e (Wireshark shows IP address 192.168.5.1) answers our discovery broadcast.

Step 3: DHCP request. The DHCP client picks an offer and sends another DHCP broadcast, informing the DHCP servers of the winner and letting the losers de-allocate their reserved IP addresses.

Step 4: DHCP acknowledgment. The winning DHCP server acknowledges completion of the DHCP exchange and reiterates the DHCP lease parameters. We now have an IP address (192.168.5.87) and know the IP address of our gateway router (192.168.5.1):

DHCP lease

Step 2: Find our gateway

We managed to get a lot done using broadcast packets, but at this point a) nobody in our broadcast domain knows our MAC address, and b) we don't know the MAC address of our gateway, so we can't get any packets routed out to the Internet. Let's fix that:

ARP

Before offering us IP address 192.168.5.87, the DHCP server (Portwell_17:ca:4) sends an ARP request for that address, saying "Who has 192.168.5.87. If that's you, respond with your MAC address". Since nobody answers, the server can be fairly confident that the IP address is not already in use.

After getting assigned IP address 192.168.5.87, we (Apple_8f:95:3f) double-check that nobody else is using it with a few ARP requests that nobody answers. We then send a few gratuitous ARPs to let everyone know that it's ours and they should update their ARP caches.

We then make an ARP request for the MAC address corresponding to the IP address for our gateway router: 192.168.5.1. Our DHCP server happens to also be our gateway router and responds claiming the address.

Step 3: Get past the terms of service

Now that we have an IP address and know the IP address of our gateway, we should be able to send packets through our gateway and out to the Internet.

I type http://www.google.com into my browser's URL bar. There is no period at the end, so the local DNS resolver can't tell if this is a fully-qualified domain name. This is what happens:

DNS resolution

Looking back at the DHCP acknowledgement, as part of the lease we were given a domain name: sbx07338.cambrma.wayport.net. What our local DNS resolver decides to do with host www.google.com, since it potentially isn't fully-qualified, is append the domain name from the DHCP lease to it (eg www.google.com.sbx07338.cambrma.wayport.net in the first iteration) and try to resolve that. When the resolution fails, it tries appending decreasingly specific parts of the DHCP domain name, finds that all of them fail, and then gives up and tries to resolve plain old www.google.com. This works, and we get back IP address 173.194.35.104. A whois after the fact confirms that this is Google:

jesstess@pretzel-logic:~$ whois 173.194.35.104
NetRange:       173.194.0.0 - 173.194.255.255
CIDR:           173.194.0.0/16
OriginAS:       AS15169
NetName:        GOOGLE
...
OrgName:        Google Inc.
OrgId:          GOGL
Address:        1600 Amphitheatre Parkway
City:           Mountain View
StateProv:      CA
We complete a TCP handshake with ``173.194.35.104'' and make an HTTP GET request for the resource we want (/). Instead of getting back an HTTP 200 and the Google home page, we receive a 302 redirect to http://nmd.sbx07338.cambrma.wayport.net/index.adp? MacAddr=00%3a23%3a6C%3a8F%3a95%3a3F&IpAddr=192%2e168%2e5%2e87& vsgpId=a45946c6%2d737a%2d11dd%2d8436%2d0090fb2004bc&vsgId=93196& UserAgent=&ProxyHost=:

TCP handshake + HTTP

Our MAC address and IP address are conveniently encoded in the redirect URL.

So what is going on here? Why didn't we get back the Google home page?

Our DHCP server/router, 192.168.5.1, is capturing our HTTP traffic and redirecting it to a special landing page. We don't get to make it out to Google until we finish playing a game with the coffee shop.


Let's dwell on this for a moment, because it's kind of amazing that the way the Internet is designed, our gateway router can hijack our HTTP requests and we can't stop it. In this case, we can see that the URL has changed in our browser after the redirect, but if a malicious gateway were transparently proxying our HTTP requests to an evil malware-laden clone of www.google.com, we'd have no way to notice because there wouldn't be a redirect and the URL wouldn't change.

Worrisome? Definitely, if you're trusting a gateway with sensitive information. If you don't want to have to trust your gateway, you have to use point-to-point encryption: HTTPS, SSH, your favorite IPSec or SSL VPN, etc. And then hope there aren't bugs in your secure protocol's implementation.


Well, ain't nothing to it but to do a DNS lookup on the host name in the redirect (nmd.sbx07338.cambrma.wayport.ne) and make our request there:

DNS 98.98.48.198

nmd is a host in the domain from our DHCP lease, so our local resolver's rules manage to resolve it in one try, and we get IP address 98.98.48.198. This is incidentally the IP address of the DHCP Server Identifier we received with our DHCP lease.

We try our HTTP GET request again there and get back an HTTP 200 and a landing page (still not the Google home page), which the browser renders.

The landing page has some ads and terms of service, and a button to click that we're told will grant us Internet access. That click generates an HTTP POST:

get to starbucks.yahoo.com

Step 4: Get to Google

Having agreed to the terms of service, 98.98.48.198 communicates to our gateway router that our MAC address (which was passed in the redirect URL) should be added to a whitelist, and our traffic shouldn't be captured and redirected anymore -- our HTTP packets should be allowed out onto the Internet.

98.98.48.198 responds to our POST with a final HTTP 302 redirect, this time to starbucks.yahoo.com. We do a final DNS lookup, make our HTTP GET, and get served an HTTP 200 and a webpage with some ads enticing us to level up our coffee addiction. We now have real Internet access and can get to http://www.google.com.

And that's the story! This ability to hijack HTTP traffic at a gateway until terms are met has over the years facilitated a huge industry based around private WiFi networks at coffee shops, airports, and hotels.

It is also a reminder about just how much control your gateway, or a device pretending to be your gateway, has when you use insecure protocols. Upside-down-ternet is a playful example of exploiting the trust in your gateway, but bogus DNS resolution or transparently proxying requests to malicious sites makes use of this same principle.

 ~jesstess

Wednesday May 19, 2010

The wireless traffic of MIT students

Wireless traffic is both interesting and delightfully accessible thanks to the broadcast nature of 802.11. I have spent many a lazy weekend afternoon watching my laptop, the Tivo, and the router chatting away in a Wireshark window.

As fun as the wireless traffic in one's house may be, there's something to be said for being able to observe a much larger ecosystem - one with more people with a more diverse set of operating systems, browsers, and intentions as they work on their wireless-enabled devices, giving rise to more interesting background and active traffic patterns and a greater set of protocols in play.

Now, it happens to be the case that sniffing other people's wireless traffic breaks a number of federal and local laws, including wiretapping laws, and while I am interested in observing other people's wireless traffic, I'm not interested in breaking the law. Fortunately, Ksplice is down the road from a wonderful school that fosters this kind of intellectual curiosity.

I met with MIT's Information Services and Technology Security Team, and we came up with a plan for me to gather data that would satisfy MIT's Athena Rules of Use. I got permission from Robert Morris and Sam Madden to monitor the wireless traffic during their Computer Systems Engineering class and made an announcement at the beginning of a class period explaining what I'd be doing. Somewhat ironically, that day's lecture was an introduction to computer security.

Some interesting results from the data set collected are summarized below. Traffic was gathered with tcpdump on my laptop as I sat in the middle of the classroom. The data was imported into Wireshark, spit back out as a CSV, and imported into a sqlite database for aggregation queries, read back into tcpdump and filtered there, or hacked up with judicious use of grep, as different needs arose.

Protocol # Packets % Packets
MDNS 259932 30.46
TCP 245268 28.74
ICMPv6 116167 13.61
TPKT 78311 9.18
SSDP 31441 3.68
HTTP 28027 3.28
UDP 17006 1.99
LLMNR 16991 1.99
TLSv1 14390 1.69
DHCPv6 11572 1.36
DNS 10870 1.27
SSH 8804 1.03
SSLv3 3094 0.36
Jabber 2507 0.29
ARP 2003 0.23
SSHv2 1503 0.18
IGMP 1309 0.15
SNMP 1232 0.14
NBNS 619 0.073
WLCCP 513 0.060
AIM Buddylist 265 0.031
NTP 245 0.029
HTTP/XML 218 0.026
MSNMS 192 0.022
IP 139 0.016
AIM 121 0.014
SSL 105 0.012
IPv6 90 0.011
AIM Generic 84 0.0098
DHCP 64 0.0075
ICMP 60 0.0070
X.224 59 0.0069
AIM SST 45 0.0053
AIM Messaging 39 0.0046
BROWSER 37 0.0043
YMSG 35 0.0041
AARP 18 0.0021
OCSP 16 0.0019
AIM SSI 11 0.0013
SSLv2 8 0.00094
T.125 5 0.00059
AIM Signon 4 0.00047
NBP 4 0.00047
AIM BOS 3 0.00035
AIM ChatNav 3 0.00035
AIM Stats 3 0.00035
AIM Location 2 0.00023
ZIP 2 0.00023

Basic statistics
Time spent capturing: 45 minutes
Packets captured: 853436
Number of traffic sources in the room: 21
Number of distinct source and destination IPv4 and IPv6 addresses: 5117
Number of "active" traffic addresses (eg using HTTP, SSH, not background traffic): 581
Number of protocols represented: 48 (note that Wireshark buckets based on the top layer for a packet, so for example TCP is in this count because someone was sending actual TCP traffic without an application layer on top and not because TCP is the transport protocol for HTTP, which is also in this count). These protocols and how much traffic was sent over them are in the table on the left.

IPv4 v. IPv6
Number of IPv4 packets: 580547
Number of IPv6 packets: 270351

2.15 IPv4 packets were sent for every IPv6 packet. IPv6 was only used for background traffic, serving as the internet layer for the following protocols: DHCPv6, DNS, ICMPv6, LLMNR, MDNS, SSDP, TCP, and UDP. The TCP over IPv6 packets were all icslap, ldap, or wsdapi communications between our Windows user discussed below and his or her remote desktop. The UDP over IPv6 packets were all ws-discovery communications, part of a local multicast discovery protocol most likely being used by the Windows machines in the room.

ICMP v. ICMPv6
Number of ICMP packets: 60
Number of ICMPv6 packets: 116167

1936.12 ICMPv6 packets were sent for every ICMP packet. The reason is that ICMPv6 is doing IPv6 work that is taken care of by other link layer protocols in IPv4. Looking at the ICMP and ICMPv6 packets by type:

ICMP Type/Code Pkts
Dest unreachable (Host administratively prohibited) 1
Dest unreachable (Port unreachable) 35
Echo (ping) request 9
Time-to-live exceeded in transit 15

ICMPv6 Type/Code Pkts
Echo request 8
Multicast Listener Report msg v2 7236
Multicast listener done 86
Multicast listener report 548
Neighbor advertisement 806
Neighbor solicitation 105710
Redirect 353
Router advertisement 461
Router solicitation 956
Time exceeded (In-transit) 1
Unknown (0x40) (Unknown (0x87)) 1
Unreachable (Port unreachable) 1

These ICMPv6 packets are mostly doing Neighbor Discover Protocol (NDP) and Multicast Listener Discovery (MLD) work. NDP handles router and neighbor solicitation and is similar to ARP and ICMP under IPv4, and MLD handles multicast listener discovery similar to IGMP under IPv4.

TCP v. UDP
Number of TCP packets: 383122
Number of UDP packets: 350067

1.09 TCP packets were sent for every UDP packet. I would have thought TCP would be a clear winner, but given that MDNS traffic, which is over UDP, makes up over 30% of the packets captured, I guess this isn't surprising. The 14% of packets unaccounted for at the transport layer are mostly ARP and ICMP traffic. See also this post.

Instant Messenging

Awesomely, AIM, Jabber, MSN Messenger, and Yahoo! Messenger were all represented in the traffic:


Participants # Packets
AIM 22 580
Jabber 6 2507
YMSG 4 35
MSNMS 3 192

AIM is the clear favorite (at least with this small sample size). Note that Jabber has about 1/4th the AIM participants but 4x the number of packets. Either the Jabberers are extra chatty, or the fact that Jabber is an XML-based protocol inflates the size of a conversation dramatically on the wire. Note that some IM traffic (like Google Chat) might have instead been bucketed as HTTP/XML by Wireshark.

That Windows Remote Desktop Person

119489 packets, or 14% of the traffic, were between a computer in the classroom and what is with high probability a Windows machine on campus running the Microsoft Remote Desktop Protocol (see also this support page for a discussion of the protocol specifics).

RDP traffic from client to remote desktop: T.125, TCP, TPKT, X.224
RDP traffic from remote desktop to client: TCP, TPKT

Most of the traffic is T.125 payload packets. TPKTs encapulate transport protocol data units (TPDUs) in the ISO Transport Service on top of TCP. TPKT traffic was all "Continuation" traffic. X.224 transmits status and error codes. TCP "ms-wbt-server" traffic to port 3389 on the remote machine seals the deal on this being an RDP setup.

Security

All SSH and SSHv2 traffic was to Linerva, a public-access Linux server run by SIPB for the MIT community, except for one person talking to a personal server on campus.

Protocol # Packets % Packets
TLSv1 14390 81.8
SSLv3 3094 17.6
SSL 105 .59
SSLv2 8 .045

5 clients were involved with negotiations with SSLv2, which is insecure and disabled by default on most browsers and never got past a "Client Hello".

HTTP

I wanted to be able to answer questions like "what were the top 20 most visited website" in this traffic capture. The proliferation of content distribution networks makes it harder to track all traffic associated with a popular website by IP addresses or hostnames. I ended up doing a crude but quick grep "Host:" pcap.plaintext | sort | uniq -c | sort -n -r on the expanded plaintext version of the data exported from Wireshark, which gives the most visited hosts based on the number of GET requests. The top 20 most visited hosts by that method were:

Rank GETs Host Rank GETs Host
1 336 www.blogcdn.com 11 88 newsimg.bbc.co.uk
2 229 www.blogsmithmedia.com 12 87 www.google.com
3 211 assets.speedtv.com 13 66 sih.onemadogre.com
4 167 profile.ak.fbcdn.net 14 66 ensidia.com
5 149 images.hardwareinfo.net 15 64 student.mit.edu
6 114 static.ensidia.com 16 61 s0.2mdn.net
7 113 www.facebook.com 17 57 alum.mit.edu
8 111 www.blogsmithcdn.com 18 56 www.wired.com
9 93 ad.doubleclick.net 19 56 cdn.eyewonder.com
10 90 static.mmo-champion.com 20 55 www.google-analytics.com

Alas, pretty boring. The blogcdn, blogsmith and eyewonder hosts are all for Engadget, and fbcdn is part of Facebook. I'll admit that I'd been a little hopeful that some enterprising student would try to screw up my data by scripting visits to a bunch of porn sites or something. CDNs dominate the top 20, and in fact almost all of the roughly 410 web server IP addresses gathered are for CDNs. Akamai led with 39 distinct IPs, followed by Amazon AWS with 23, and Facebook and Panther CDN with 16, with many more at lower numbers.

Wrap-up

Using the Internet means a lot more than HTTP traffic! 45 minutes of traffic capture gave us 48 protocols to explore. Most of the captured traffic was background traffic, and in particular discovery traffic local to the subnet.

Sniffing wireless traffic (legally) is a great way to learn about networking protocols and the way the Internet works in practice. It can also be incredibly useful for debugging networking problems.

What are some of your fun or awful networking experiences?

~jesstess

About

Tired of rebooting to update systems? So are we -- which is why we invented Ksplice, technology that lets you update the Linux kernel without rebooting. It's currently available as part of Oracle Linux Premier Support, Fedora, and Ubuntu desktop. This blog is our place to ramble about technical topics that we (and hopefully you) think are interesting.

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today