Last week, I wrote a blog post discussing the dangers of BGP routing leaks between peers, illustrating the problem using examples of recent snafus between China Telecom and Russia's Vimpelcom. This follow-up blog post provides three additional examples of misbehaving peers and further demonstrates the impact unmonitored routes can have on Internet performance and security. Without monitoring, you are essentially trusting everyone on the Internet to route your traffic appropriately.
In the first two cases, an ISP globally announced routes from one of its peers, effectively inserting itself into the path of the peer's international communications (i.e., becoming a transit provider rather than remaining a peer) for days on end. The third example looks back at the China Telecom routing leak of April 2010 to see how a US academic backbone network prioritized bogus routes from one of its peers, China Telecom, to (briefly) redirect traffic from many US universities through China.
Recap: How this works
To recap the explanation from the previous blog (and to reuse the neat animations our graphics folks made), we first note that ISPs form settlement-free direct connections (peering) in order to save on the cost of sending traffic through a transit provider. Suppose that ISP A and ISP B establish such a private link between their networks. At the BGP routing level, ISP A will then send routes from its customers to its peer ISP B, who will in turn send these routes on to its customers. As a result, the customers of ISP B will send traffic destined for ISP A through the newly established peering link, saving ISP B from having to pay its transit providers to carry the traffic. This flow of routes and traffic is illustrated below.
The first way this can go wrong is for ISP B to announce the routes received from ISP A out to the global Internet (through its transit providers) or to ISP B's other peers. By doing this, ISP B inserts itself onto the path of incoming traffic to ISP A from outside ISP B's own network, something ISP A certainly didn't expect when it took on ISP B as a peer.
ISP B can also mess up by sending routes learned either from its transit providers or peers to ISP A. If these routes are accepted by ISP A (and they typically will be), such errors put ISP B onto the path of outgoing traffic from ISP A to the networks erroneously announced along this peering link.
These two scenarios can happen independently. As shown in our last blog, China Telecom leaked routes to and from Vimpelcom numerous times throughout the year. Most of these incidents involved China Telecom leaking routes it learned from Vimpelcom out to the global Internet (scenario 1); however, on a few occasions, China Telecom also passed a full or partial routing table to Vimpelcom (scenario 2), altering how traffic flowed out of Vimpelcom.
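Both scenarios break the same export rule: a route learned from a peer or a transit provider should only be passed on to customers. The Python sketch below checks an AS path against that rule. It is only an illustration, not a description of how any ISP or monitoring product works. The relationship table, the AS names (CustA, TransitB, Remote and so on) and the function find_leaker are all hypothetical, chosen to mirror ISP A and ISP B above.

```python
from typing import Dict, List, Optional, Tuple

# REL[(x, y)] says what y is to x: "customer", "peer" or "provider".
# Hypothetical topology mirroring the text: A and B peer, each has a
# customer, and TransitB sells transit to B and to another network, Remote.
REL: Dict[Tuple[str, str], str] = {
    ("A", "CustA"): "customer", ("CustA", "A"): "provider",
    ("B", "CustB"): "customer", ("CustB", "B"): "provider",
    ("A", "B"): "peer", ("B", "A"): "peer",
    ("B", "TransitB"): "provider", ("TransitB", "B"): "customer",
    ("TransitB", "Remote"): "customer", ("Remote", "TransitB"): "provider",
}


def find_leaker(as_path: List[str], rel: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Return the first AS that re-exported a peer or provider route to a
    non-customer, or None if the path obeys the export rule.

    as_path is in BGP order: the receiving AS first, the origin last.
    """
    hops = list(reversed(as_path))  # propagation order, origin first
    for i in range(1, len(hops) - 1):
        learned_from = rel[(hops[i], hops[i - 1])]  # who gave hops[i] the route
        sent_to = rel[(hops[i], hops[i + 1])]       # who hops[i] passed it to
        if learned_from in ("peer", "provider") and sent_to != "customer":
            return hops[i]
    return None


normal = ["CustB", "B", "A", "CustA"]          # peering working as intended
scenario_1 = ["TransitB", "B", "A", "CustA"]   # B leaks A's routes to its transit
scenario_2 = ["A", "B", "TransitB", "Remote"]  # B leaks transit routes to A

for path in (normal, scenario_1, scenario_2):
    print(path, "->", find_leaker(path, REL))
```

The hard part in practice is the relationship table, since business relationships between networks are generally not published and have to be inferred or maintained by hand.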
Additional recent examples of peering leaks
Yandex is essentially the Russian version of Google. It is the dominant Russian-language search engine and, like Google, Yandex has established a lot of peering links — although, for obvious reasons, with a greater emphasis on the Russian-speaking world. Beltelecom is the incumbent telecom of Belarus and has become a recurring character in this blog (either for globally routing RFC6598 address space or MITM hijacks). Beltelecom and Yandex have a peering relationship, as it makes a lot of sense for eyeball networks (Beltelecom) and content providers (Yandex) to try to save on transit costs by interconnecting. However, for twelve days this year, Beltelecom announced routes it learned from Yandex to its transit provider Telecom Italia (i.e., leak scenario 1 from above).
The AS paths of the impacted routes took the following form:
… 6762 6697 13238 …
This AS path shows that routes from Yandex (AS13238) were shared with its peer Beltelecom (AS6697), which leaked them to Telecom Italia (AS6762), a global Tier 1 provider. Normally no provider outside of Belarus would use Beltelecom to reach Yandex.
The result was that traffic destined for Yandex from customers around the world in Telecom Italia's downstream cone was misdirected first to Beltelecom. For Yandex's networks in Russia (which shares a border with Belarus), the impact on latency was minor. However, Yandex has networks outside of Russia (including some in the Netherlands and the United States) and, for those networks, the latency and paths were dramatically altered. For those receiving Yandex routes via Telecom Italia, Beltelecom inserted itself into Yandex-destined traffic from 22 May through 3 June of this year.
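The signature of this leak (6697 directly in front of 13238, handed to a network that is not a Beltelecom customer) is easy to search for in a feed of observed AS paths. The Python sketch below assumes the paths are already available as lists of integers, for example parsed from a route collector. The function name, the expected_recipients parameter and every sample path other than the 6762 6697 13238 sequence are made up for illustration; the 645xx AS numbers come from the range reserved for documentation.

```python
from typing import Iterable, List, Sequence, Set


def leaked_paths(paths: Iterable[Sequence[int]], origin: int, peer: int,
                 expected_recipients: Set[int]) -> List[Sequence[int]]:
    """Return paths in which `peer` sits directly in front of `origin`
    and has passed the route to an AS outside `expected_recipients`
    (the peer's own customers)."""
    flagged = []
    for path in paths:
        # Collapse AS path prepending (consecutive repeats) before scanning.
        hops = [a for j, a in enumerate(path) if j == 0 or a != path[j - 1]]
        for i in range(1, len(hops) - 1):
            if hops[i] == peer and hops[i + 1] == origin:
                if hops[i - 1] not in expected_recipients:
                    flagged.append(path)
                break
    return flagged


observed = [
    [64500, 6762, 6697, 13238],  # Telecom Italia hears Yandex via Beltelecom
    [6697, 13238],               # Beltelecom's own view of its peer
    [64501, 6697, 13238],        # a hypothetical Beltelecom customer
    [64500, 64502, 13238],       # a path that avoids Beltelecom entirely
]

suspicious = leaked_paths(observed, origin=13238, peer=6697,
                          expected_recipients={64501})
print(suspicious)
```

Only the first sample path is flagged, because it is the only one in which Beltelecom hands Yandex's routes to a network that is not one of its listed customers.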
Consider the following example traceroute from Brazil to Yandex's presence in Palo Alto, California before Beltelecom started leaking Yandex routes. The trace illustrates a typical traffic path, namely from Brazil to Miami and then on to New York and finally California.
trace from João Pessoa, Brazil to Yandex-Palo Alto at 05:46 May 20, 2014
2 184.108.40.206 (HostDime.com.br Data Center, João Pessoa, Brazil) 0.259ms
3 220.127.116.11 (SITECNET INFORMÁTICA LTDA, João Pessoa, Brazil) 8.122ms
4 18.104.22.168 22.214.171.124.static.impsat.net.br 54.263ms
5 126.96.36.199 po3-20G.ar2.MIA2.gblx.net (Miami, US) 165.525ms
6 188.8.131.52 xe-0-3-0.mia10.ip4.tinet.net 119.060ms
7 184.108.40.206 (GTT, New York) 179.928ms
8 220.127.116.11 servicenow-gw.ip4.gtt.net (New York, US) 199.109ms
9 18.104.22.168 poker-vlan801.yndx.net (Las Vegas, US) 187.579ms
10 22.214.171.124 (Yandex, Palo Alto, US) 179.933ms
11 126.96.36.199 spider-199-21-99-96.yandex.com (Palo Alto, US) 187.462ms
However, during the leak, traffic from the same server in Brazil to the same Yandex location in California was redirected to Beltelecom in Minsk, Belarus and then on to Yandex in Moscow, after which Yandex took the traffic to California on its internal backbone.
trace from João Pessoa, Brazil to Yandex-Palo Alto at 00:57 May 23, 2014
2 188.8.131.52 (HostDime.com.br Data Center, João Pessoa, Brazil) 0.249ms
3 184.108.40.206 (SITECNET INFORMÁTICA LTDA, João Pessoa, Brazil) 54.175ms
4 220.127.116.11 18.104.22.168.static.impsat.net.br 54.932ms
5 22.214.171.124 ae1-100G.ar4.GRU1.gblx.net (São Paulo, BR) 70.403ms
6 126.96.36.199 telecomitalia2.ar4.GRU1.gblx.net (São Paulo, BR) 54.192ms
7 188.8.131.52 xe-3-3-2.franco71.fra.seabone.net (Frankfurt) 220.925ms
8 184.108.40.206 beltelekom.franco71.fra.seabone.net (Frankfurt) 404.091ms
9 220.127.116.11 ie1.net.belpak.by (Minsk, BY) 252.911ms
10 18.104.22.168 core1.net.belpak.by (Minsk, BY) 254.373ms
11 22.214.171.124 100ge.core.belpak.by (Minsk, BY) 251.801ms
12 126.96.36.199 stat.byfly.by (Minsk, BY) 295.233ms
13 188.8.131.52 ugr-p3-te0-3-0-18.yndx.net (Moscow, RU) 266.851ms
14 184.108.40.206 ugr-p1-be1.yndx.net (Moscow, RU) 266.571ms
15 220.127.116.11 dante-ae3.yndx.net (Moscow, RU) 266.611ms
16 18.104.22.168 panas-xe-0-0-1-984.yndx.net (Moscow, RU) 276.847ms
17 22.214.171.124 (Yandex, Moscow, RU) 309.937ms
18 126.96.36.199 gretchen-xe-1-1-0.yndx.net (Germany) 281.413ms
19 188.8.131.52 ash1-c1-xe-0-0-1-985.yndx.net (Asheville, US) 356.142ms
20 184.108.40.206 whist-vlan801.yndx.net (Las Vegas, US) 356.994ms
21 220.127.116.11 (Yandex, Palo Alto, US) 346.466ms
22 18.104.22.168 spider-199-21-99-96.yandex.com (Palo Alto, US) 356.994ms
Did Yandex finally notice this after nearly two weeks of poor performance due to misdirected traffic? Did Beltelecom ultimately catch the error, after perhaps noticing a surge in traffic along its peering link with Yandex? We may never know the answers to these questions, but we easily quantified the impact on Yandex's Internet performance using our continuous global measurement and monitoring platform.
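For a rough sense of the comparison involved, the Python sketch below parses hop lines in the format of the two traces above and compares the final hops. It does not reflect how the measurement platform works internally. The regular expression and function name are assumptions made for this example, and only the last two hops of each trace are included to keep it short.

```python
import re
from typing import List, Tuple

# hop number, address, free-text label, round-trip time in ms
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(.*?)\s+([\d.]+)ms\s*$")


def parse_trace(text: str) -> List[Tuple[int, str, str, float]]:
    """Parse traceroute text into (hop, address, label, rtt_ms) tuples."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:  # skip headers and blank lines
            hops.append((int(m.group(1)), m.group(2), m.group(3),
                         float(m.group(4))))
    return hops


BEFORE = """\
10 22.214.171.124 (Yandex, Palo Alto, US) 179.933ms
11 126.96.36.199 spider-199-21-99-96.yandex.com (Palo Alto, US) 187.462ms
"""
DURING = """\
21 220.127.116.11 (Yandex, Palo Alto, US) 346.466ms
22 18.104.22.168 spider-199-21-99-96.yandex.com (Palo Alto, US) 356.994ms
"""

before, during = parse_trace(BEFORE), parse_trace(DURING)
extra_hops = during[-1][0] - before[-1][0]
extra_ms = during[-1][3] - before[-1][3]
print(f"{extra_hops} more hops, {extra_ms:.1f} ms slower to the final hop")
```

For this pair of traces the leak doubles the hop count (11 to 22) and adds roughly 170 ms to the final hop. A single pair of traces is only an anecdote, since per-hop round-trip times are noisy, so a real comparison would aggregate many measurements.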
Our next example of a peering relationship gone bad concerns Rascom's leak of Telma's routes this summer. Telma is the incumbent telecom of the African island-nation of Madagascar, and Rascom is a Russian fixed-line operator with a network that extends throughout Europe. While the previous example involved a very common type of peering relationship, between a content producer (Yandex) and a content consumer (Beltelecom), why on earth would a Russian ISP peer with an ISP from Madagascar? How much traffic could possibly be passing between these two networks? Could that traffic really justify a private connection to help reduce transit costs? Probably not. The likely reason for this arrangement is that both entities happen to be present at the London Internet Exchange and decided to peer because … why not?
When a provider from a faraway place like Africa or the Middle East establishes a presence at one of the European IXes, it often will establish peering relationships with anyone and everyone. Once present at an IX, each additional connection carries little marginal cost, so you might as well connect with everybody there in the hopes of reducing your transit costs, if only slightly. But as we'll see in this example, each of your peers has the potential to screw up and alter the flow of your Internet traffic. In other words, every relationship, no matter how seemingly insignificant, carries real risks. There is no free lunch on the Internet.
Consider the following example of a normal traffic path from New York to Telma in Madagascar, the day before the routing leak. Level 3 carries the traffic to London, where Telma picks it up and takes it first to Paris and then Madagascar.
trace from New York to Telma, Madagascar at 08:42 Aug 11, 2014
2 22.214.171.124 vlan725.car3.NewYork1.Level3.net 0.602ms
3 126.96.36.199 vlan70.csw2.NewYork1.Level3.net 69.191ms
4 188.8.131.52 ae-71-71.ebr1.NewYork1.Level3.net 69.274ms
5 184.108.40.206 ae-41-41.ebr2.London1.Level3.net 70.905ms
6 220.127.116.11 ae-56-221.csw2.London1.Level3.net 69.265ms
7 18.104.22.168 ae-25-52.car5.London1.Level3.net 181.282ms
8 22.214.171.124 TELMA.car5.London1.Level3.net 75.363ms
9 126.96.36.199 mx-480-lon-ae0-0-to-divinetwork.dts.mg 87.635ms
11 188.8.131.52 mx-480-par-ge-0-0-9-to-7710src12-th2.tgn.mg 93.228ms
12 184.108.40.206 mx-10-2-tul-so-1-2-3-to-mx-480-par.tgn.mg 280.859ms
13 220.127.116.11 p-galaxy-lag-10-to-mx-10-2-tul.tgn.mg 294.419ms
16 18.104.22.168 ademalinux.adema.mg (Madagascar) 296.248ms
This next trace illustrates the impact of the routing leak on the path and latency between New York and Madagascar. In this case, Tata takes the traffic to London and then Frankfurt before handing it off to Golden Telecom (Vimpelcom). Golden takes the traffic to Moscow and delivers it to Rascom, who takes it straight back to London (at LINX), handing it off to Telma so it can continue its journey to Madagascar. Wow!
trace from New York to Telma, Madagascar at 12:30 Aug 12, 2014
2 22.214.171.124 ix-11-3-5-0.tcore1.NTO-New-York.as6453.net 0.791ms
3 126.96.36.199 (Tata Communications, London) 85.475ms
4 188.8.131.52 if-2-2.tcore1.L78-London.as6453.net 85.733ms
6 184.108.40.206 if-2-2.tcore1.PVU-Paris.as6453.net 85.654ms
7 220.127.116.11 if-3-2.tcore1.FR0-Frankfurt.as6453.net 85.369ms
8 18.104.22.168 if-7-2.tcore1.FNM-Frankfurt.as6453.net 85.666ms
9 22.214.171.124 if-2-2.thar1.F2C-Frankfurt.as6453.net 85.332ms
10 126.96.36.199 (Tata Communications, Frankfurt, DE) 85.792ms
11 188.8.131.52 cat08.Moscow.gldn.net 130.005ms
12 184.108.40.206 HostLine2-gw.Moscow.gldn.net 131.454ms
13 220.127.116.11 (Rascom, Vyborg, RU) 129.323ms
15 18.104.22.168 ams-equ-cr1-to-stk.rascom.as20764.net 128.012ms
16 22.214.171.124 (London Internet Exchange (LINX)) 126.734ms
17 126.96.36.199 mx-480-lon-ae0-0-to-divinetwork.dts.mg 133.410ms
19 188.8.131.52 mx-480-par-ge-0-0-9-to-7710src12-th2.tgn.mg 153.67ms
20 184.108.40.206 mx-10-2-tul-so-1-2-3-to-mx-480-par.tgn.mg 356.605ms
21 220.127.116.11 p-galaxy-lag-10-to-mx-10-2-tul.tgn.mg 348.541ms
24 18.104.22.168 ademalinux.adema.mg (Madagascar) 350.924ms
We wouldn't be surprised if the network engineers at both Rascom and Telma were completely unaware of this circuitous routing. This level of monitoring is often overlooked.
China Telecom—National LambdaRail
Although our final example isn't recent, it is worth mentioning in this discussion. During the big China Telecom routing leak of April 2010 that caused an international stir, it is interesting to note where the bogus routes announced by China Telecom (AS23724) propagated the farthest. Before it ceased operations earlier this year, National LambdaRail (NLR) was a "high-speed national computer network owned and operated by the U.S. research and education community." NLR also had a peering relationship with China Telecom, the state telecom of China. When NLR received the bogus origination announcements from its Chinese peer, it accepted them and routed traffic to China that was intended for numerous other locations around the world.
This is what can be most pernicious about routes received across peering links. Routes from peers are typically prioritized over routes from providers to avoid transit costs. While many, but certainly not all, transit providers filter the routes they receive from their customers in some manner, it is far less common for peers to do any filtering on the routes they exchange, largely due to the difficulty of determining appropriate routing behavior for an independent entity. These prioritized and unfiltered peer routes have the potential to cause the performance and security problems we've outlined here.
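To make the interaction between preference and filtering concrete, here is a minimal Python sketch of a route selection step. It is a deliberate simplification of BGP's decision process (relationship-based preference first, then AS path length). The preference values, the Route class, the best_route function and its peer_allowed_prefixes filter are all assumptions made for illustration. The AS numbers and prefixes come from the ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

# Illustrative preference values: customer routes earn money, peer routes
# are free, provider routes cost money. Higher wins.
LOCAL_PREF = {"customer": 300, "peer": 200, "provider": 100}


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]
    learned_from: str  # "customer", "peer" or "provider"


def best_route(routes: Iterable[Route],
               peer_allowed_prefixes: Optional[Set[str]] = None) -> Optional[Route]:
    """Pick the preferred route. If peer_allowed_prefixes is given, peer
    routes for any other prefix are dropped before selection."""
    candidates = []
    for r in routes:
        if (r.learned_from == "peer"
                and peer_allowed_prefixes is not None
                and r.prefix not in peer_allowed_prefixes):
            continue  # the peer is not expected to announce this prefix
        candidates.append(r)
    if not candidates:
        return None
    # Highest local preference first, then shortest AS path.
    return max(candidates,
               key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)))


routes = [
    Route("192.0.2.0/24", (64500, 64510), "provider"),     # legitimate, via transit
    Route("192.0.2.0/24", (64496, 64497, 64498), "peer"),  # bogus, via a peer
]

unfiltered = best_route(routes)
filtered = best_route(routes, peer_allowed_prefixes={"198.51.100.0/24"})
print(unfiltered.learned_from, filtered.learned_from)
```

Without the filter, the bogus peer route wins even though its AS path is longer, because preference is evaluated before path length. With the filter in place, the peer route is discarded and the transit route is used.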
In a 2012 paper entitled "A Case Study of the China Telecom Incident", I assisted the authors by searching and analyzing traceroute data from the iPlane project for examples of traceroutes that were sucked into China Telecom during the routing leak. Since much of iPlane's data is generated using the networks of universities in the U.S. and many universities had a connection to NLR, there were many U.S. universities that had traffic redirected through China Telecom. As is standard practice, NLR had prioritized routes from its peers—including China Telecom. U.S. universities may have also prioritized routes from NLR over commercial transit links because NLR might have been a subsidized and therefore cheaper option. This provided a vector for those bogus routes to briefly (the entire incident lasted only 18 minutes) redirect traffic through China.
Here is an example traceroute pulled from iPlane traceroute data that illustrates the impact of the routing leak. Starting in Norman, Oklahoma, this trace goes out to Internet2 and on to NLR's routers on the west coast of the US. There the traffic is handed off to China Telecom before returning to the US; it next appears in Cogent's network in Chicago (ord) before making its way over to Boston.
0 22.214.171.124 (University of Oklahoma, Norman, US) 0.384ms
1 192.168.255.50 (RFC 1918) 0.287ms
2 192.168.255.233 (RFC 1918) 158.051ms
3 126.96.36.199 (OneNet, Oklahoma City, US) 0.364ms
4 188.8.131.52 (OneNet, Oklahoma City, US) 0.875ms
5 184.108.40.206 (OneNet, Oklahoma City, US) 3.025ms
6 220.127.116.11 (OneNet, Oklahoma City, US) 3.057ms
7 18.104.22.168 (OneNet, Oklahoma City, US) 40.005ms
8 22.214.171.124 (Oklahoma Regents, Oklahoma City, US) 18.231ms
9 126.96.36.199 ae-3.210.chic0.tr-cps.internet2.edu 18.699ms
10 188.8.131.52 (National LambdaRail, Los Angeles, US) 71.529ms
11 184.108.40.206 (National LambdaRail, Los Angeles, US) 71.614ms
12 220.127.116.11 (National LambdaRail, Los Angeles, US) 71.606ms
14 18.104.22.168 (China Telecom, Guangzhou, CN) 280.357ms
16 22.214.171.124 te0-7-0-6.ccr21.ord03.atlas.cogentco.com 296.328ms
17 126.96.36.199 te0-0-0-30.ccr22.yyz02.atlas.cogentco.com 294.124ms
18 188.8.131.52 be2242.ccr22.jfk05.atlas.cogentco.com 298.383ms
19 184.108.40.206 te0-7-0-5.ccr22.atl01.atlas.cogentco.com 315.434ms
20 220.127.116.11 te0-2-0-3.ccr22.dca01.atlas.cogentco.com 328.007ms
21 18.104.22.168 te0-7-0-35.ccr21.atl01.atlas.cogentco.com 335.061ms
22 22.214.171.124 te0-1-0-4.ccr22.bos01.atlas.cogentco.com 340.852ms
23 126.96.36.199 vl3808.na01.0.bos01.atlas.cogentco.com 339.829ms
24 188.8.131.52 (TA ASSOCIATES, Boston, US) 341.378ms
Next we provide a few examples of AS-level traceroutes (also based on the iPlane data) from U.S. universities impacted by the China Telecom routing leak, presented in a sequence-alignment style. In each sequence, there is a trace that was redirected through China Telecom (AS4134) by way of NLR (AS11164) on 8 April 2010. To illustrate the normal paths at that time, the errant one is sandwiched by AS-level traces seen on the previous and successive days.
Purdue University
AS path for 184.108.40.206 (planetlab2.cs.purdue.edu) to 220.127.116.11 (Advertinet, US)
[04/07/10] 17 ----- ----- 209 12067 (19.18 ms)
[04/08/10] 17 11164 4134 209 12067 (106.47 ms)
[04/09/10] 17 ----- ----- 209 12067 (22.04 ms)
University of California, Santa Cruz
AS path for 18.104.22.168 (planetslug3.cse.ucsc.edu) to 22.214.171.124 (Spartan Stores Inc., US)
[04/07/10] 5739 2152 11164 286 19151 26554 33372 (67.57 ms)
[04/08/10] 5739 2152 11164 4134 3356 26554 33372 (237.99 ms)
[04/09/10] 5739 2152 11164 286 19151 26554 33372 (66.35 ms)
University of Massachusetts
AS path for 126.96.36.199 (planetlab1.cs.umass.edu) to 188.8.131.52 (SCOTTRADE, US)
[04/07/10] 1249 ----- ----- 1239 3561 12221 (33.06 ms)
[04/08/10] 1249 22742 11164 4134 7018 12221 (247.83 ms)
[04/09/10] 1249 ----- ----- 1239 3561 12221 (44.78 ms)
University of Florida
AS path for 184.108.40.206 (planetlab2.acis.ufl.edu) to 220.127.116.11 (Bresnan Communications, LLC., US)
[04/07/10] 6356 ----- ----- 3356 7018 33588 (101.12 ms)
[04/08/10] 6356 11164 4134 174 7018 33588 (280.64 ms)
[04/09/10] 6356 ----- ----- 3356 7018 33588 (101.53 ms)
Oregon State University
AS path for 18.104.22.168 (planetlab2.een.orst.edu) to 22.214.171.124 (Secure-Netz, DE)
[04/07/10 00:00:00] 4201 ----- 3701 3356 25074 (222.03 ms)
[04/08/10 00:00:00] 4201 11164 4134 3320 25074 (287.73 ms)
[04/09/10 00:00:00] 4201 ----- 3701 3356 25074 (170.01 ms)
Northwestern University
AS path for 126.96.36.199 (planetlab2.eecs.northwestern.edu) to 188.8.131.52 (Copa Airlines, PA)
[04/07/10 00:00:00] 103 22335 3549 11556 26105 28031 (83.89 ms)
[04/08/10 00:00:00] 103 22335 11164 4134 26105 28031 (148.91 ms)
[04/09/10 00:00:00] 103 22335 3549 11556 26105 28031 (84.89 ms)
There were literally thousands more examples like these in the iPlane data.
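Spotting detours like these in bulk is mostly a matter of comparing each day's AS path with a baseline. The Python sketch below parses lines in the format used above and reports which ASes appear only in the later trace. The function names are hypothetical and this is not the tooling used for the paper; it is only a minimal illustration based on the Purdue example.

```python
import re
from typing import List, Tuple

# "[date] as as as ... (rtt ms)"; the date may include a time of day
LINE_RE = re.compile(r"^\[([^\]]+)\]\s+(.*?)\s+\(([\d.]+) ms\)\s*$")


def parse_as_line(line: str) -> Tuple[str, List[int], float]:
    """Split one aligned AS-path line into (date, AS path, rtt_ms)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    # The "-----" placeholders only pad the alignment, so drop them.
    path = [int(tok) for tok in m.group(2).split() if tok.isdigit()]
    return m.group(1), path, float(m.group(3))


def new_ases(baseline: List[int], observed: List[int]) -> List[int]:
    """ASes present in the observed path but absent from the baseline."""
    seen = set(baseline)
    return [asn for asn in observed if asn not in seen]


day_before = parse_as_line("[04/07/10] 17 ----- ----- 209 12067 (19.18 ms)")
leak_day = parse_as_line("[04/08/10] 17 11164 4134 209 12067 (106.47 ms)")

added = new_ases(day_before[1], leak_day[1])
print(added, f"+{leak_day[2] - day_before[2]:.2f} ms")
```

For the Purdue trace this reports NLR (AS11164) and China Telecom (AS4134) as newly inserted, along with about 87 ms of added latency. A monitoring job could raise an alert whenever an unexpected AS shows up this way.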
In this blog post (and the last one), we don't want to suggest we are somehow against peering. Peering is an essential feature of Internet connectivity and will continue to be in the future.
The main takeaway is that if your network is going to prioritize routes from a peer over a transit provider, then your network engineers should also take the time to set up appropriate filtering and monitoring of these links to ensure you don't accept and act on bogus routes. As for the routes you share with your peers, you need to monitor the paths traffic is taking to reach your network to determine if a peer is leaking your routes. Additionally, if you don't exchange much traffic with another entity, it may not be worth peering with them just because you are both present at the same Internet exchange point. But if you must be promiscuous when peering, please use protection.