Thursday, March 05, 2009

Deep Diving Router Architecture, Part I

When I was young (how old do you think I am now?) I used to look at a router as a “black box” or just a node. I mean, I was not interested to go to the internal packet switching process inside the router itself and I was focusing more on the protocols and features that are run between nodes. Well, actually interest is not the best word to describe it. If you don’t work for a company who makes the routers, do you think you can get more detail information about what is really going on inside the box? Now I’m still young (I guess) but at least I have the chance to look deep dive down to the architecture level of a router hardware.

And actually it’s not always required to have such knowledge anyway in our daily job. Most of the network engineers, even the CCIEs, may just need to assume that the router is a box with multiple interfaces, and its function is to forward the packet to the next hop based on the routing table built from dynamic or static routing protocol. Then we put more focus on the communication between routers to build that routing table, instead of the packet switching process from one interface to the other inside a router. In OSPF, the LSA packets, database and SPF calculation discussion can be very complex and give us lots of headache, especially if we have to do redistribution with another IGP protocol or BGP and so on. So once we can see the routes in the routing table, and there is no other treatment such as filter or policy, normally we would happily assume that the packet will be processed and forwarded to the next hop. Then we can focus more on the other features or applications that run on top of the routing, which will probably give us another different kind of headaches.

So for most of us mere mortals, it may be enough to say that the packet switching within a router means switching the packet from ingress (input) interface to egress (output) interface. In CCIE we do need to dig a bit inside, for example when we have to determine the sequence of features implementation in the router. Does NAT come first or Access Control List? How about policy based routing that override the routing table? And so on. But we never really bother to look at which the internal part of a router who does this or that. Later I can explain why most of us don’t bother, other than due to lack of resources available to learn it.

Why is it important to understand the internal packet switching?
For me personally, is to understand the limitation of protocols or features implementation due to the hardware. And this is important for any design engineer. I mean, we can build a network design to specify number and type of hardware for core routers, aggregation, access etc. Then we recommend the protocols and features to be enabled, and come up with a nice and complete configuration to be pasted to the box. In reality, there is standard for a protocol but every vendor may implements it differently, depending on their interpretation of the standard or perhaps because they invent their own approach in following the standard. And for some features, or the way the protocols are implemented, depend on the hardware architecture. We may end up into situation where the new network has been up and running and only after sometime we start noticing a performance or scalability issue due to the limitation of the hardware inside the routers, when we really have heavy traffic in the network or when we want to expand the design.

A very simplified process of packet switching can be shown in the above picture. The packet travels on the wire with Layer 3 and Layer 2 header information as per TCP/IP protocol stack. The interface processor in a router is capable to pick it up, inspect and strip the layer 2 header and send it to the route processor for further process. While waiting for the route processor doing a layer 3 lookup in the routing table (and forwarding table) to check what should it do to the packet, the packet itself must be stored in a queue or buffer. Once the next hop is determined, the route processor knows to which interface it should send the packet. Then the packet can be moved to an output queue to wait before it can be transmitted back to the wire, get re-written with the new layer 2 header containing the information of the next hop, then the packet can leave out the box. The input and output queue can be virtual, so it can refer to the same physical memory and the packet never moves anywhere. But it makes it possible to apply different treatment when the packet is considered in input queue (before the lookup) and when it is already in the output, where the lookup has been done and the destination interface for the packet has been determined.

So the keywords are: Layer 3 and Layer 2 header, input queue, routing table and forwarding table, lookup, move packet between different location or queues, output queue, layer 2 re-write.

Let’s see it once again in more detail. Here is the snapshot from Vijay Bollapragrada’s Inside Cisco IOS Architecture book, for a very basic switching process called process switching.

Once the interface processor receives the packet from the network media on input or ingress interface, it has to store it in the buffer or memory (1) and at the same time it has to interrupt the processor (2) to inform there is a packet need to be processed. The book focus on software architecture, so it explains how the processor then invokes a process (3), which is called ip_input in Cisco, to start doing the lookup in the routing and forwarding table. This lookup results on which output or egress interface the router need to send out the packet, along with layer 2 information need to be written to the packet before it can be sent out (4). Processor then will do the layer 2 rewrite (5) and move the packet to be processed by egress interface processor (6), then off the packet goes back to the network media. Step 7 is just to inform the main processor that the packet has been sent out, so the memory can be freed and the packet counter on the interface can be increased.

I have to admit that I won’t be able to explain as good as how Vijay (and the other guys) does, so I suggest to read the book for those who are still curious. But my point here is just to emphasize that there are different tasks need to be done other than the lookup, such as moving the packet from ingress to egress interface, and re-writing the new layer 2 information to the packet, which will become important for later discussion.

Again, why we need to worry about internal process of packet switching? Hang on there. I know we usually put more focus on the interaction between routers with routing protocol, to ensure each router can build the routing table successfully. Once we have the table, the Layer 3 lookup process itself now can be done very fast. For each incoming packet we need to compare the destination against the database containing the list of all destinations with the associated egress interface. It can be done quickly, especially since a vendor like Cisco has invented a mechanism so the comparison doesn’t need to be done by going through the entry in the list one by one. Instead, Cisco Express Forwarding (CEF) builds a new mtrie data structure from the routing table, as shown in the next picture. Once the entry has been found, it can give a pointer to the adjacency table which contains the layer 2 information of the next hop.

Enough with the lookup process and how the router can determine to which interface it should send the packet. There is a book written dedicatedly to explain CEF in more detail. And I want to focus on the hardware architecture instead of software or algorithm of the lookup, so I suggest you to read this Cisco Express Forwarding book as well as Vijay’s book.

Now, let’s talk about moving the packet from ingress interface to egress interface. As discussed previously, the packet can be stored in a central memory while waiting for the lookup process. So the ingress interface processor must store the packet there, and the egress interface process can copy the packet (with new Layer 2 information) from the same central location. As you can see, with this idea, the bottleneck is in the central memory performance and obviously the memory must be able to serve multiple requests from different interface processors at the same time.

To improve the memory performance, one may want to use local memory on each interface. So the packet is stored in local memory of ingress interface, then it can be copied to the shared central memory over bus communication, and the local memory of egress interface can get the packet from there. You may start asking, why the ingress interface memory doesn’t send the packet directly to the egress interface memory? Hold your horse for a while. It is possible but it requires some sort of intelligence on the ingress interface processor to define to which egress interface memory it should send the packet. In other word, the ingress interface components may need to do the lookup. I will talk more about this in the next part.

When you open the chasing of an old mid-range router, you may see something similar with below picture. The main board is the base component to connect all other components. There is a central route processor, central memory, the interface network cards, PCI bus to communicate the network cards to the route processor, and other components such as flash where we can store the software image, boot ROM to run the firmware required for booting process before we can load the router software image, and so on.

Back to our keywords quickly: Layer 3 and Layer 2 header are inside the packet. Input queue or buffer can be in ingress network card local memory or in central memory. Routing table and forwarding table are build by route processor using protocol to communicate to other routers. Layer 3 lookup (along with the layer 2 information of the next hop) is done by route processor, by using algorithm to compare the destination against the routing table and forwarding table. Move packet between different location or queues, meaning the packet from ingress network cards local memory must be copied to the central memory using PCI or bus communication, then the egress network cards local memory can get it from there. Output queue is the egress network cards local memory or central memory. Layer 2 re-write to put the layer 2 information to the packet must be done by route processor before the packet can be sent out the router. All the features such as filter or NAT are done by the route processor. Applying the feature on ingress interface or egress interface can just simply be a function to apply the feature on the state of the packet before or after the lookup has been done.

Looking at the picture above, does it remind you of something? Yes, it looks the same as the components of normal PC main board! This is a reason why some talented people can build their own router software, upload it to normal PC, put multiple network cards, and claim they can compete or even beat a router built in dedicated hardware by router vendor.

My take on this: it depends. If you want to compare the free router on normal PC to some old mid-range router, this might be true. Because all the tasks inside the router are done in central processor and memory, so what it takes is to build a good software to do lookup and packet switching, with optimization to ensure it can utilize the resource in proper or better way.

But how about the latest features in next generation network? Do you think some people will build it for free? The features in a router are getting more complicated it needs decision from the team on how to implement it even there is a standard already defined. And in second part I will explain what a vendor has gone far to develop a modern or next generation router. Because obviously the challenge is not on how to switch the packet between ingress interface to egress interface, but how to do so as fast as possible. And it has to be done consistently for different type of packets, for different size of packets, in massive amount to accommodate the demand of huge bandwidth nowadays. Then later on we will start facing more challenges on how to deploy some features that should be done in the hardware, for example to apply different treatment of packets based on priority on egress network card to ensure high priority packets can be transmitted first back to the network media or the wire. Or re-writing the layer 2 information to the packet should be done in the hardware too to ensure maximum performance.

If you have read this far, and you think all the information above is more than enough to help you in your daily job, and you think it’s more important to go back to all the headaches caused by the communication between routers, or protocols and features that need to be run in multiple routers, then you are completely welcomed to still see a router as a black box or a node with multiple interfaces where the packet is going in and out. And there is really no harm if you want to skip the next part and make decision not to bother at all with the internal packet switching process inside a router.

End of part one.

1 comment:

aa said...
This comment has been removed by the author.