
Optimal Routes for your Data

An overview of currently used routing protocols and an investigation of a mathematically optimal alternative. Visualizations and code included! Post in collaboration with Akshay Yeluri.

When you send an email, do you ever wonder how that data makes its way from your phone or computer to the recipient? This question gets to the core of how the internet works, which, in all its power and magic, is essentially a data transfer system. But it is a ridiculously huge data transfer system – it’s estimated that global internet traffic is around 122,000 petabytes per month (for reference, a petabyte is about 10,000 4K movies’ worth of data). This term, I took a class called Communication Networks that focused on understanding the protocols that make the internet work under the hood. I found a good layperson’s overview of some of these topics in this Medium article, in case you’re also interested. Personally, I was particularly fascinated by routing protocols, which determine the paths through the network that packets (small chunks of data) take to reach their appropriate destinations.

What are Routing Protocols?

First, a bit of background is needed to understand routing protocols. There exists some collection of hosts (people’s personal devices) which are the sources and destinations of data packets. A limited number of hosts can connect to a single router – a device whose job is to get packets to their correct destinations. Each router is also connected to a limited number of other routers. Together, these connections form a network of millions of routers connecting billions of hosts.

This image shows a small network of routers connected to hosts. In future visualizations, we will only show the routers in the graph for simplicity. Since each host connects to only one router, finding a path from Host 1 to Host 3 (for example) is equivalent to finding a path from router A to router C.

Now, the problem that routing protocols try to solve can be formulated as follows: When host A sends a packet to host B, what path should that packet take to get to B as fast as possible? Once these paths are decided for every pair of hosts, the routers ensure that packets (which are tagged with their destination) actually travel along these routes. Each router has a routing table which tells it which one of its neighbors to route each packet to (based on the packet’s destination). Routing tables fully determine the path that every packet in the network will take, so we can reformulate our problem as: What should the routing tables be so that packets get to their destinations as fast as possible?
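To make the routing-table idea concrete, here is a minimal Python sketch. The four-router network, the router names, and the table entries are entirely made up for illustration – the point is just that fixed tables fully determine every packet’s path:

```python
# Hypothetical routing tables: for each router, a map from a packet's
# destination to the neighbor that the packet is forwarded to next.
routing_tables = {
    "A": {"B": "B", "C": "B", "D": "D"},
    "B": {"A": "A", "C": "C", "D": "C"},
    "C": {"A": "B", "B": "B", "D": "D"},
    "D": {"A": "A", "B": "C", "C": "C"},
}

def path(src, dst):
    """Follow the routing tables hop by hop from src to dst."""
    route = [src]
    while route[-1] != dst:
        route.append(routing_tables[route[-1]][dst])
    return route

print(path("A", "C"))  # ['A', 'B', 'C']
```

Note that once the tables are fixed, every packet from A to C takes the same path – the tables, not the packets, encode the routes.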

It remains to ask: what makes a certain path fast or slow? Initially you might think it’s based solely on the number of links on the path (more links = slower), and to some extent this is true. However, increased traffic along a connection also makes it slower due to queueing delay. You can think of queueing delay as the time packets spend standing in line before being sent along a busy link. This adds complexity to the problem: the choice of routing tables affects how many packets are sent along a particular link, and therefore affects the average travel time of the packets (a.k.a. the average packet delay). This means that a good routing protocol will make trade-offs between choosing the shortest paths and avoiding routing packets along highly trafficked links.
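To make queueing delay concrete: a standard model from queueing theory (the M/M/1 delay formula – a common modeling assumption in this setting, not something specific to any one protocol) says that per-packet delay on a link grows without bound as traffic approaches the link’s capacity:

```python
def link_delay(traffic, capacity):
    """Average per-packet delay on a link under the M/M/1 queueing model.
    Delay blows up as traffic approaches capacity, which is why good
    routing avoids piling traffic onto a few links."""
    if traffic >= capacity:
        return float("inf")  # the queue grows without bound
    return 1.0 / (capacity - traffic)

print(link_delay(1, 10))  # 0.111... (lightly loaded link is fast)
print(link_delay(9, 10))  # 1.0     (nearly saturated link is slow)
```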

Dynamic Shortest Paths Routing

The routing protocol actually used in the internet is a variation of dynamic shortest paths routing. ‘Shortest paths’ is a bit of a misnomer, as it is really fastest paths routing. At every timestep, the protocol constructs the graph consisting of all the routers in the network and their connections, and assigns each directed edge a weight: the packet delay along that link, calculated using the current routing tables and the data traffic from the previous timestep. Note that both of these are required to calculate the total traffic along the link, and therefore the packet delay.

Once this graph is constructed, an algorithm like Floyd-Warshall is used to calculate all-pairs shortest paths for the weighted graph. This corresponds to all-pairs fastest paths for the network, since the weight on each edge is the time to travel along that edge. So minimizing the weighted length of the path from A to B minimizes the travel time of a packet going from A to B. The routing tables in the network are then updated to match the shortest paths found by Floyd-Warshall.
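Floyd-Warshall itself fits in a few lines. Here is a sketch (the tiny example graph and node names are made up) that returns both the all-pairs distances and the next-hop entries that become the routing tables:

```python
INF = float("inf")

def floyd_warshall(weights):
    """All-pairs shortest paths. weights[u][v] is the delay on edge (u, v),
    absent if there is no edge. Returns (dist, nxt), where nxt[u][v] is
    the routing-table entry at u for packets destined for v."""
    nodes = list(weights)
    dist = {u: {v: (0 if u == v else weights[u].get(v, INF))
                for v in nodes} for u in nodes}
    nxt = {u: {v: (v if u != v and dist[u][v] < INF else None)
               for v in nodes} for u in nodes}
    for k in nodes:              # allow paths through intermediate node k
        for u in nodes:
            for v in nodes:
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
                    nxt[u][v] = nxt[u][k]
    return dist, nxt

# Tiny illustrative graph: edge weights are link delays.
weights = {"A": {"B": 1, "C": 5}, "B": {"C": 1}, "C": {}}
dist, nxt = floyd_warshall(weights)
print(dist["A"]["C"], nxt["A"]["C"])  # 2 B  (going via B beats the direct edge)
```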

The shortest path from A to F on the weighted graph is shown in blue. Note that there is a path with fewer links (A, B, D, F), but it has higher total edge weight.

The ‘dynamic’ in the name of this protocol comes from the fact that it updates the routing tables at every timestep based on the traffic of the previous timestep. This means that it can naturally accommodate traffic distributions that change over time. We would hope that under stationary traffic distributions, dynamic shortest paths routing would eventually converge; however, this is not the case. In fact, a major disadvantage of dynamic shortest paths routing is that the routing tables can oscillate wildly between timesteps. This behavior comes from the fact that the algorithm generally chooses links that were lightly trafficked (and therefore fast) in the previous timestep, without considering that this choice will cause those very links to become heavily trafficked (and therefore slow). The variants of dynamic shortest paths routing used in practice exist mainly to address this problem. If you would like to know more about these variants, an interesting paper on preventing oscillations in routing tables can be found here.

Visualizations (Shortest Paths)

Code on Github

The left image shows how the average packet delay changes with steps of the dynamic shortest paths protocol. We see the oscillatory behavior characteristic of this protocol. The right image shows how the total traffic on each link evolves; for example, we can see the (1, 2) link becoming more heavily trafficked and the (1, 0) link being used less as the simulation progresses. The total traffic graph is essentially what Floyd-Warshall is run on at each step of the protocol, as heavily trafficked links have higher weight and are slower.
This shows the evolution of the routing tables with respect to each destination node in the network (average packet delay is also shown on the bottom right). We can see that for destination 0, node 3 oscillates between sending data to node 2 and node 1. This makes sense: when node 3 routes to node 2, the link (3, 2) becomes heavily trafficked (traffic of 4), and when node 3 routes to node 1, the link (3, 1) becomes heavily trafficked (traffic of 6). This matches the traffic video above.

Probabilistic Routing

An interesting alternative to the routing protocols used in practice can be found in the paper “A Minimum Delay Routing Algorithm Using Distributed Computation” by Robert Gallager (1977). We can consider the routing tables we have talked about so far to be deterministic: once the routing table is fixed, all packets traveling from A to B follow the same path. Gallager proposes routing tables that we will call ‘probabilistic’, meaning that traffic traveling from A to B does not necessarily all take the same route (even within a timestep). Given a packet at router A destined for B, the packet’s next location is drawn from a probability distribution over the neighbors of A. This probability distribution is defined in the routing table for A.
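A probabilistic routing-table entry is just a distribution over neighbors that each packet samples from. Here is a small sketch (the neighbor names and the 0.7/0.3 split are made up for illustration):

```python
import random

# Hypothetical probabilistic routing table for one router: packets
# destined for B go to neighbor X with probability 0.7, to Y with 0.3.
prob_table = {"B": [("X", 0.7), ("Y", 0.3)]}

def sample_next_hop(table, destination, rng=random):
    """Draw a packet's next hop from the table's distribution over neighbors."""
    neighbors, probs = zip(*table[destination])
    return rng.choices(neighbors, weights=probs, k=1)[0]

print(sample_next_hop(prob_table, "B"))  # "X" roughly 70% of the time
```

A deterministic table is then just the special case where one neighbor has probability 1.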

The benefit of this formulation of routing tables is that it allows more flexibility in dividing traffic evenly across the network, and therefore allows for less packet delay. In fact, Gallager gives a protocol for updating probabilistic routing tables that he proves always converges to the minimal average packet delay given stationary traffic distributions. This means that it’s relatively easy to make probabilistic routing optimal! However, the trade-off is that probabilistic routing tables are more complicated, since implementing them requires every router to be able to sample from a probability distribution.

As our project for Communication Networks, we implemented Gallager’s protocol and checked that it converged to the optimal routing tables by also performing numerical minimization of the average packet delay (using scipy). Our code for this project can be found on our GitHub. Interestingly, we found a small network (only 4 nodes!) where the best probabilistic routing table significantly outperforms the best deterministic routing table. This means that probabilistic routing tables could prove worthwhile in networks where minimizing packet delay is very important and simple network design is less of a consideration.
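Our full code is on GitHub, but the numerical-minimization idea fits in a few lines. As a toy sketch (a made-up two-link network with M/M/1-style link delays, not the 4-node network from our project), here is how scipy can find the delay-minimizing split of traffic – the continuous split is exactly what a probabilistic table can express and a deterministic one cannot:

```python
from scipy.optimize import minimize_scalar

# Toy setup (all numbers made up): r = 4.5 packets/step of demand must
# be split across two parallel links with capacities c1 = 5 and c2 = 3.
r, c1, c2 = 4.5, 5.0, 3.0

def total_delay(f1):
    """Total M/M/1-style delay when f1 units go on link 1 and r - f1 on link 2."""
    f2 = r - f1
    return f1 / (c1 - f1) + f2 / (c2 - f2)

# Keep both links strictly below capacity so delays stay finite.
lo, hi = max(0.0, r - c2) + 1e-6, min(r, c1) - 1e-6
res = minimize_scalar(total_delay, bounds=(lo, hi), method="bounded")
print(round(res.x, 2), round(res.fun, 2))  # ~3.03 on link 1, total delay ~2.5
print(total_delay(4.5))  # 9.0 if all traffic is forced onto link 1
```

Splitting the traffic roughly 3.03 / 1.47 beats sending everything down the bigger link by a wide margin, which is the intuition behind why probabilistic tables can win.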

Visualizations (Gallager)

Code on Github

The left image shows how the average packet delay changes with steps of the Gallager protocol. We can observe convergence to the minimum average packet delay solution. We also notice that the total traffic on each link (shown on the right) is not always an integer value. This is because the non-integer values in the routing tables allow the expected number of packets sent along a link to be a non-integer.
This shows the evolution of the routing tables with respect to each destination node in the network (average packet delay is also shown on the bottom right). Going back to the example from the shortest paths case, we can see that node 3 routes some fraction of traffic destined for node 0 to node 1 and some fraction to node 2. This is why this protocol is able to converge instead of oscillating like the previous example.