At 03:22 UTC on Friday, 25 August 2017, the internet experienced the effects of another massive BGP routing leak. This time it was Google who leaked over 160,000 prefixes to Verizon, who in turn accepted these routes and passed them on. Despite the fact that the leak took place in Chicago, Illinois, it had devastating consequences for the internet in Japan, half a world away. Two of Japan’s major telecoms (KDDI and NTT’s OCN) were severely affected, posting outage notices (KDDI / OCN pictured below).
Massive routing leaks continue
In recent years, large-scale (100K+ prefix) BGP routing leaks typically fall into one of two buckets: the leaker either 1) announces the global routing table as if it is the origin (or source) of all the routes (see Indosat in 2014), or 2) takes the global routing table as learned from providers and/or peers and mistakenly announced it to another provider (see Telekom Malaysia in 2015).
This case is different because the vast majority of the routes involved in this massive routing leak were not in the global routing table at the time but instead were more-specifics of routes that were. This is an important distinction over the previous cases. In the vernacular of the BGP protocol, more-specific routes describe smaller ranges of IP addresses than less-specifics and, within the BGP route selection process, the path defined by the more-specifics are selected over those of less-specifics.
These more-specifics were evidently used for traffic shaping within Google’s network. When announced to the world, they were selected by outside networks over existing routes to direct their traffic, thus having greater impact on traffic redirection than they might have otherwise.
So why was Japan affected so severely?
Of the 160,000 routes leaked, over 25,000 of them were of routed address space belonging to NTT OCN, the most of any network that was impacted. None were from KDDI however. KDDI was impacted because, as a transit customer of Verizon, it accepted over 95,000 leaked prefixes from Verizon. Compounding the problem for Japan, another major Japanese telecom, IIJ, also accepted over 97,000 leaked prefixes from Verizon. As a result, any traffic going from KDDI or IIJ to OCN was being routed to Google’s network in Chicago – much of it likely getting dropped due to either high latency or bandwidth constraints.
Each day we perform hundreds of millions of traceroutes across the internet to measure paths and performance. Whenever a major routing event like this takes place, we can see evidence of its impact by observing the change in these traces. Below is a graphic showing the volume of traceroutes we see entering Google’s network around the time of the leak. The spike in the center of the graph is the sudden increase of traffic entering Google from Verizon. In all, about 10,000 traceroutes got sucked into Google over a very brief period of time en-route to destinations around the world.
Below is a traceroute performed the day before the leak from our server in Equinix Japan to an IP address in OCN’s network in Japan. As expected, it stays within Japan and arrives at its destination in 15ms.
Below is the same traceroute during the leak. IIJ hands to off to Verizon (Alter.net) in San Jose before taking a trip to Chicago to go to Google. Google then takes over routing this traceroute back to Japan over its internal network. Instead of 15ms, the round-trip time is 256ms – a very noticeable difference.
Here’s an example of a traceroute from Shanghai, China to Macau (on the coast of China) that makes the same detour through Chicago during the leak.
Starting from the other side of the world, here’s a traceroute that began at LINX in London but is taken by Verizon (Alter.net) to Chicago and Google before completing its journey to Vodafone in Nürnberg, Germany.
On Saturday it was reported that Google apologized for causing the disruption in internet connectivity in Japan on Friday. Verizon also had a role to play for this leak. On any given day, Google typically sends Verizon fewer than 50 prefixes. An instantaneous jump to over 160,000 prefixes representing over 400 million unique IPv4 addresses should have tripped a MAXPREF setting on a Verizon router and triggered an automated response, at the very least. Thankfully Verizon did not send the leaked routes on to any other major telecoms in the DFZ like Level 3, Telia, or NTT (AS2914, specifically), or the impact could have been much more severe.
We’ve written about routing leaks a number of times, including here and here. Not long ago we wrote up a case where a routing leak by another party managed to render Google unavailable for many. In every case, there is more than one party involved. There is a leaker, of course, but there is also always another network that distributes leaked routes out onto the internet. We have to do better to look out for each other when mistakes inevitably arise. The internet is a team effort.