Internet Routing: More Complicated Than You Thought
Since the birth of the Internet decades ago, there has been a need to route traffic from one computer to another and from one network to another. As the Internet has grown, so has the complexity of this process. Today many protocols are available to route traffic, including the Open Shortest Path First (OSPF) protocol and the Border Gateway Protocol (BGP). Most Internet users never work with BGP directly, yet it is essential to the operation of the Internet and an important part of its redundancy. In this article I will use BGP to illustrate some complex issues that most people are not aware of, or at least do not often think about.
Background on BGP
The Internet is made up of thousands of individual networks. These networks are grouped into "neighbourhoods" called Autonomous Systems (ASes), each run by a single organization under its own routing policy, and there are tens of thousands of them. When a computer in one AS needs to send packets to a computer in another AS, the packets must find their way through various neighbourhoods to reach the destination AS. This process of moving packets between ASes is called routing, and the protocol that exchanges the routing information needed to do it is the Border Gateway Protocol (BGP). See Figure 1 below for a graphical representation of routing between ASes. BGP differs from other routing protocols such as Open Shortest Path First (OSPF), which is very common on medium to large corporate networks. OSPF, however, does not scale to the size of network that BGP was designed for. That said, BGP is not limited to routing between ASes; it can also be useful on very large enterprise networks.
Figure 1: BGP routing between Autonomous Systems, courtesy of http://www.cisco.com
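The way a route "finds its way" between ASes can be sketched in a few lines. This is a minimal, simplified illustration of BGP's path-vector idea, not a BGP implementation: each route carries the list of ASes it has crossed (its AS path), a router rejects any path that already contains its own AS number to avoid loops, and among the remaining routes this sketch simply prefers the shortest AS path. Real BGP compares several other attributes, such as local preference, before AS-path length. The `Route` class, the function names, and the AS numbers and prefix (taken from the ranges reserved for documentation) are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A hypothetical BGP-style route: a prefix plus the ASes it has crossed."""
    prefix: str
    as_path: tuple[int, ...]  # nearest AS first, originating AS last


def is_loop_free(route: Route, local_asn: int) -> bool:
    # Loop prevention: reject any path that already contains our own AS number.
    return local_asn not in route.as_path


def best_route(candidates: list[Route], local_asn: int) -> Route | None:
    # Simplified selection: among loop-free routes, prefer the shortest AS path.
    usable = [r for r in candidates if is_loop_free(r, local_asn)]
    if not usable:
        return None
    return min(usable, key=lambda r: len(r.as_path))


def advertise(route: Route, local_asn: int) -> Route:
    # Before passing a route on to a neighbouring AS, prepend our own AS number.
    return Route(route.prefix, (local_asn,) + route.as_path)


local = 64496
candidates = [
    Route("203.0.113.0/24", (64497, 64498, 64499)),
    Route("203.0.113.0/24", (64500, 64499)),
    Route("203.0.113.0/24", (64501, 64496, 64499)),  # contains our AS: a loop
]
best = best_route(candidates, local)
print(best.as_path)                    # (64500, 64499)
print(advertise(best, local).as_path)  # (64496, 64500, 64499)
```

Prepending the local AS number on every hop is what makes the loop check work: a route that has already passed through an AS will always be recognised and dropped if it comes back.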
BGP is referred to as External BGP (EBGP) when it runs between ASes and Internal BGP (IBGP) when it runs within a single AS. As mentioned above, IBGP can be useful on very large enterprise networks. One reason an administrator might choose to run IBGP on her network is to take advantage of multi-homing: connecting the network to more than one upstream provider, so that if one connection fails, traffic can still flow through another. IBGP distributes the routes learned over each of those connections throughout the network, which improves its redundancy.
When a large network runs IBGP, every router must normally peer with every other router in the AS, an arrangement called a full mesh. The full mesh is needed because an IBGP router does not pass routes learned from one IBGP peer on to another IBGP peer. The number of sessions therefore grows roughly with the square of the number of routers, which can be a considerable burden on each router's memory and CPU, and it also generates a lot of routing traffic. There are, however, techniques that can be used to overcome some of these scalability issues.
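The arithmetic behind the full mesh is simple: with n routers, each router needs n - 1 sessions, and because each session is shared by two routers, the total is n(n - 1)/2. The short sketch below (the function name is just illustrative) shows how quickly that grows.

```python
def full_mesh_sessions(routers: int) -> int:
    """IBGP sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


for n in (5, 10, 50, 100):
    print(f"{n} routers: {full_mesh_sessions(n)} sessions in total, {n - 1} per router")
# 5 routers: 10 sessions in total, 4 per router
# 10 routers: 45 sessions in total, 9 per router
# 50 routers: 1225 sessions in total, 49 per router
# 100 routers: 4950 sessions in total, 99 per router
```

Going from 10 to 100 routers multiplies the router count by ten but the session count by more than a hundred, which is why the techniques that follow matter.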
The larger a routing table, the longer it takes a router to process updates to it, which can delay routing changes and degrade network performance. Processing a large table can also cause spikes in CPU utilization, which in turn can lead to further routing delays. The Internet as a whole had to deal with this problem in the early 1990s, when routing tables were growing very rapidly. It was "solved" (tables are still quite large) by the introduction of Classless Inter-Domain Routing (CIDR), which allows a block of contiguous addresses to be advertised as a single aggregate route instead of many separate ones.
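CIDR aggregation can be demonstrated with Python's standard `ipaddress` module, whose `collapse_addresses` function merges adjacent networks into the smallest equivalent set. The five /24 networks here are a hypothetical example in private address space; on the real Internet they would be public prefixes announced by a provider.

```python
import ipaddress

# Hypothetical /24 networks that one provider would otherwise announce separately.
announced = [ipaddress.ip_network(p) for p in (
    "10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24", "10.1.4.0/24",
)]

# Merge contiguous, aligned networks into the fewest possible prefixes.
aggregated = list(ipaddress.collapse_addresses(announced))

print(len(announced), "routes before aggregation")  # 5 routes before aggregation
print(len(aggregated), "routes after aggregation")  # 2 routes after aggregation
print([str(n) for n in aggregated])                  # ['10.1.0.0/22', '10.1.4.0/24']
```

The first four networks form an aligned block and collapse into one /22; the fifth cannot join that block, so it stays separate. Every router elsewhere on the Internet then carries two entries instead of five.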
One technique that a network designer can use is called confederations. With a confederation, a large enterprise network is divided into smaller logical ASes (sub-ASes). Each sub-AS runs a full mesh internally, and the sub-ASes exchange routes with each other over EBGP-like sessions. To the outside world, the whole confederation still appears as a single AS. Essentially, the enterprise network mirrors the architecture of the Internet itself.
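The saving from a confederation can be estimated with a small model. This sketch rests on hypothetical simplifying assumptions: the routers divide evenly into k sub-ASes, each sub-AS runs its own full mesh, and every pair of sub-ASes is joined by exactly one session. Real confederation designs rarely look this regular, so treat the numbers as an illustration of the trend only.

```python
def full_mesh_sessions(routers: int) -> int:
    """Sessions needed when every router peers with every other router."""
    return routers * (routers - 1) // 2


def confederation_sessions(routers: int, sub_ases: int) -> int:
    """Total sessions for a confederation under the simplified model above.

    Assumes (hypothetically) that routers divide evenly into sub_ases, that each
    sub-AS is fully meshed, and that each pair of sub-ASes shares one session.
    """
    if routers % sub_ases:
        raise ValueError("routers must be divisible by sub_ases in this model")
    inside = sub_ases * full_mesh_sessions(routers // sub_ases)
    between = full_mesh_sessions(sub_ases)
    return inside + between


print(full_mesh_sessions(100))          # 4950
print(confederation_sessions(100, 4))   # 1206  (4 x 300 inside + 6 between)
print(confederation_sessions(100, 10))  # 495   (10 x 45 inside + 45 between)
```

The quadratic growth still exists, but only within each sub-AS, which is why splitting a network into smaller meshes reduces the load so sharply.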
A network running IBGP can also use a route reflection design. In this design, one or more routers act as route reflectors, and the remaining routers become route reflector clients that peer only with a reflector instead of with every other router. The reflector passes (reflects) routes between its peers, which significantly reduces the number of sessions each router must maintain and can significantly improve router performance. However, because clients only see the routes that the reflector has chosen as best from its own position in the network, this design can lead to less than optimal routing. As always, there is a trade-off.
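What a route reflector does with a route depends on where the route came from. The sketch below follows the basic forwarding rules described in RFC 4456 but leaves out the ORIGINATOR_ID and CLUSTER_LIST attributes that real reflectors use to prevent loops; it also simply never sends a route back to the peer it came from. The `Source` enum, the function name, and the router names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Source(Enum):
    EBGP = "learned from an EBGP peer"
    CLIENT = "learned from a route reflector client"
    NON_CLIENT = "learned from an ordinary IBGP (non-client) peer"


def reflect_targets(source_peer: str, source: Source,
                    clients: set[str], non_clients: set[str]) -> set[str]:
    """IBGP peers that a route reflector passes its best route on to (simplified)."""
    if source is Source.NON_CLIENT:
        # Routes from ordinary IBGP peers are reflected to clients only.
        targets = set(clients)
    else:
        # Routes from clients, or from outside the AS, go to every IBGP peer.
        targets = clients | non_clients
    # Simplification: never send a route back to the peer it came from.
    return targets - {source_peer}


clients = {"r1", "r2", "r3"}
non_clients = {"rr2"}  # e.g. a second route reflector
print(sorted(reflect_targets("r1", Source.CLIENT, clients, non_clients)))
# ['r2', 'r3', 'rr2']
print(sorted(reflect_targets("rr2", Source.NON_CLIENT, clients, non_clients)))
# ['r1', 'r2', 'r3']
print(sorted(reflect_targets("isp", Source.EBGP, clients, non_clients)))
# ['r1', 'r2', 'r3', 'rr2']
```

In this example each client keeps a single session to the reflector, yet still learns every route, which is exactly the session saving described above.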
Another performance issue that network designers need to consider is route flapping; for a graphical example, see Figure 2 below. Within any large network it is quite common for BGP routing tables to be updated frequently as links go up and down. For any one particular router, however, this kind of activity should be relatively infrequent. Sometimes, though, a router is configured incorrectly, and this often leads to repeated, excessive cycles of going down and coming back up. Each cycle causes activity on all of that router's peers: when a router comes online, whether for the first time or after a short outage, it announces (or re-announces) its routes to each of its peers, which can cause heavy CPU usage on them for a short period (a few seconds). When this happens over and over, it is called route flapping. To overcome route flapping, many BGP implementations include route flap damping.
Route flap damping attempts to limit how far the negative effects of route flapping propagate. It works by ignoring a route's repeated re-announcements (when it becomes available again) for successively longer periods of time: each flap adds to a penalty for that route, the route is suppressed once the penalty passes a threshold, and it is used again only after the penalty has decayed. Route flap damping was previously seen as an effective way of reducing the impact of route flapping. However, routers have become much faster at updating their routing tables, and the links between peering routers have become faster too, so route flap damping is now often a hindrance to optimal network performance, and many network professionals argue against it. To be safe, if you are a network administrator, it is probably best simply to manage your routers effectively in the first place!
Figure 2: The excessive effects of route flapping, courtesy of www.inetdevgrp.org
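The penalty mechanism behind route flap damping can be sketched as a small simulation. The parameter values are assumptions, loosely modelled on commonly quoted vendor defaults; real implementations make them configurable and also cap the maximum suppression time, which this sketch does not model. The `DampedRoute` class and its method names are hypothetical, and times are in minutes.

```python
import math

# Illustrative parameters (assumptions; real values vary by implementation).
PENALTY_PER_FLAP = 1000.0
HALF_LIFE_MIN = 15.0     # the penalty halves every 15 minutes
SUPPRESS_LIMIT = 2000.0  # suppress the route once the penalty exceeds this
REUSE_LIMIT = 750.0      # use the route again once the penalty falls below this


class DampedRoute:
    """Tracks the damping penalty for a single route."""

    def __init__(self) -> None:
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now: float) -> None:
        # Exponential decay since the last update, then check the reuse limit.
        elapsed = now - self.last_update
        self.penalty *= 2 ** (-elapsed / HALF_LIFE_MIN)
        self.last_update = now
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False

    def flap(self, now: float) -> None:
        # Each withdrawal and re-announcement cycle adds a fixed penalty.
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now: float) -> bool:
        self._decay(now)
        return not self.suppressed

    def minutes_until_reuse(self) -> float:
        # Time, measured from the last update, for the penalty to reach REUSE_LIMIT.
        if not self.suppressed:
            return 0.0
        return HALF_LIFE_MIN * math.log2(self.penalty / REUSE_LIMIT)


route = DampedRoute()
for t in (0, 1, 2):  # three flaps within two minutes
    route.flap(t)
print(route.usable(3))                     # False: still suppressed
print(round(route.minutes_until_reuse()))  # about 28 (minutes after t = 3)
print(route.usable(60))                    # True: the penalty has decayed
```

Because every extra flap adds to a penalty that has not yet decayed, a route that keeps flapping is ignored for longer and longer, which is the "successively longer periods" behaviour described above.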
Architecture of Routers
What has made these routers perform so much better that route flap damping is no longer considered necessary? There are several reasons. First, router CPUs have become faster, much as desktop CPUs have. Second, bus speeds have also increased significantly. A third major change is the architectural shift that has taken place in routers over the past few years: forwarding has moved onto the interface cards. Instead of sending every packet to a central processor, each interface card now holds its own copy of the forwarding table, kept up to date from the routing table, and the processor on the card makes the actual forwarding decisions. The central CPU runs the operating system, the routing protocols, and any management tasks performed by the administrator. This decentralization has significantly improved the overall routing performance of a network.
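The division of labour between the central processor and the interface cards can be sketched as follows. This is a hypothetical, greatly simplified model: the central processor builds the table once and pushes a copy to each card, and each card then answers lookups on its own using longest-prefix match (the most specific matching prefix wins). The class, function, and next-hop names are illustrative, and the linear scan stands in for the specialised lookup structures and hardware that real line cards use.

```python
from __future__ import annotations

import ipaddress


class LineCardFib:
    """Hypothetical forwarding table held by one interface card."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: dict[ipaddress.IPv4Network, str] = {}

    def install(self, prefix: str, next_hop: str) -> None:
        self._entries[ipaddress.IPv4Network(prefix)] = next_hop

    def lookup(self, destination: str) -> str | None:
        # Longest-prefix match: the most specific matching prefix wins.
        addr = ipaddress.IPv4Address(destination)
        matches = [net for net in self._entries if addr in net]
        if not matches:
            return None
        return self._entries[max(matches, key=lambda net: net.prefixlen)]


def distribute(rib: dict[str, str], cards: list[LineCardFib]) -> None:
    # The central processor computes routes once and pushes a copy to every
    # card; after that, each card forwards without consulting the central CPU.
    for card in cards:
        for prefix, next_hop in rib.items():
            card.install(prefix, next_hop)


rib = {"0.0.0.0/0": "upstream", "10.0.0.0/8": "core", "10.1.2.0/24": "branch"}
cards = [LineCardFib("slot1"), LineCardFib("slot2")]
distribute(rib, cards)
print(cards[0].lookup("10.1.2.7"))   # branch
print(cards[1].lookup("10.9.9.9"))   # core
print(cards[1].lookup("192.0.2.1"))  # upstream
```

Since every card holds its own copy, lookups on different cards proceed in parallel, and the central CPU is left free for the routing protocols and management work.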
As you can see, there are many factors that go into designing a routing implementation for large networks. This kind of thinking starts at the protocol design stage, continues to the router design stage (the design of the hardware, the software, and even the overall architecture of the router), and extends further still to the network design stage, where designers and administrators use the tools available to them in the protocol and on the routers themselves to optimize routing for their network. Till next time!