Thursday, November 06, 2008

First There Was CRS-1

In about 5 days Cisco will release a new carrier-class product that I personally consider revolutionary. I'm not allowed to give any more information here, but I can say that I have been following the news internally for quite some time now. Actually we have two new products for this segment: one has been released to a very select customer without public announcement, and the other will be announced next week. Don't ask me the reason for that; even if I knew it, I would not be able to comment here.

Why now? Why more products for carriers and service providers? Well, I believe it all started with CRS-1. It took Cisco several years to do the research, and when it was released in 2004 CRS really set the new standard for carrier-class, next-generation routers. It's been several years now and there are many successful deployments of CRS-1 in the market. CRS is positioned as a core router in the network, so the obvious next step is to take the technology and new features invented during the CRS-1 research and use them to develop new products for different segments or different positions in the network.

The following are several new characteristics introduced by CRS-1 that may become the basic requirements and standard for any new product designed for carriers and service providers:

Distributed Architecture - we have moved very far from a centralized architecture, where a central CPU must do all the work, to a distributed architecture. In a distributed architecture the Route Processor (RP) is used just for the control plane: it sets up routing protocol adjacencies with the neighbor routers, builds the routing table, builds the forwarding table, then pushes this forwarding table to the line cards. All packet forwarding, the data plane, is handled by the line cards. The RP should be free from the task of forwarding packets except in some special cases. Btw, unlike most previous products, a single CRS chassis can hold more than two RPs, which is useful for some new features explained later. The RP's role is reduced even further because there are dedicated fan controllers to control the cooling system and a dedicated alarm module.

High Availability - the RP must be redundant, and this is actually nothing new. Once the primary RP fails, the secondary RP should kick in. The difference now is that there should not be any packet loss during the process. Since the forwarding table has already been pushed to the line cards, during an RP failover a line card should be able to continue forwarding packets using the last state of the table before the failover. But how about the switch fabric? This is the connector between one line card and another, and the RP must always communicate with the fabric and the line cards. There were cases with older routers where no packets were dropped during the RP failover itself, but when the new RP had to restore communication with the fabric it had to drop some packets. This is not an issue with CRS. And obviously the fabric itself is redundant.
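To make the control plane / data plane split concrete, here is a minimal Python sketch, not Cisco's implementation. The class names (`RouteProcessor`, `LineCard`), the interface names and the prefixes are all assumptions made up for illustration. It shows the RP building a table and pushing a copy to a line card, which then keeps doing longest-prefix-match forwarding even after the RP is marked as failed.

```python
import ipaddress


class LineCard:
    """Data plane: forwards packets using its own local copy of the forwarding table."""

    def __init__(self, name):
        self.name = name
        self.fib = {}  # ip_network -> egress interface name

    def install_fib(self, fib):
        # Keep a private copy so forwarding no longer depends on the RP.
        self.fib = dict(fib)

    def forward(self, dst):
        """Longest-prefix match against the local FIB; None means no route."""
        addr = ipaddress.ip_address(dst)
        matches = [prefix for prefix in self.fib if addr in prefix]
        if not matches:
            return None
        best = max(matches, key=lambda prefix: prefix.prefixlen)
        return self.fib[best]


class RouteProcessor:
    """Control plane: learns routes, builds the table, pushes it to line cards."""

    def __init__(self):
        self.rib = {}
        self.alive = True

    def learn_route(self, prefix, egress):
        self.rib[ipaddress.ip_network(prefix)] = egress

    def push_fib(self, line_cards):
        for card in line_cards:
            card.install_fib(self.rib)


rp = RouteProcessor()
rp.learn_route("10.0.0.0/8", "te0/1")    # hypothetical interface names
rp.learn_route("10.1.0.0/16", "te0/2")
lc = LineCard("lc0")
rp.push_fib([lc])

rp.alive = False                          # simulate an RP failure
print(lc.forward("10.1.2.3"))             # still forwarded from the last pushed state
```

The point of the sketch is only that `forward` never touches the RP object, so losing the RP does not stop forwarding; a real router also has to keep the fabric and line card state consistent during the switchover.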

Modular Line Card = PLIM + MSC - CRS introduces a new type of line card: one line card is formed by two different cards connected via a passive midplane. The first is the physical layer card called the PLIM, which holds the physical ports and the hardware needed for framing. The second is the intelligent card called the MSC, which can connect to any kind of PLIM. The MSC is the one that does the forwarding table lookup, applies QoS and Access Control Lists, and so on. Without the MSC, the PLIM can be considered dumb hardware with physical ports only. Without the PLIM, the MSC can be considered a smart brain without any arms and legs to interact with the outside world. This is a very important concept because we can start by buying a low-speed PLIM and later upgrade to a more powerful PLIM without upgrading the MSC. And vice versa: if one day we need to upgrade the capacity of the MSC, we don't have to re-patch all the cables that are currently connected to the PLIM.

Non-blocking ports, non-blocking fabric - the current hardware can provide 40Gbps on a single port. And it has to be a real 40Gbps input and output line rate, i.e. completely non-blocking. This can be achieved in the PLIM/MSC because there are different ASICs to handle ingress and egress packet forwarding. This is very important to know, because with separate ASICs anything that overloads the ASIC processing ingress traffic won't disturb the other ASIC processing egress traffic. And don't forget the fabric. A non-blocking linecard should be supported by a non-blocking switch fabric. Fabrics in previous products evolved from bus technology to crossbar fabrics, where packets must be scheduled and a linecard must wait its turn before it can access the switch fabric. Now it's completely different, as every line card can access the fabric at any time. Btw, when a packet is sent to the fabric it is transformed into fixed-size cells, which the switch fabric can process more efficiently than regular packets of varying sizes.
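The packet-to-cell transformation mentioned above can be sketched in a few lines of Python. This is only an illustration: the 64-byte payload size is an assumption chosen for the example, not the real CRS cell format, and real cells also carry headers that are left out here.

```python
CELL_PAYLOAD = 64  # hypothetical cell payload size in bytes, for illustration only


def segment(packet: bytes, cell_payload: int = CELL_PAYLOAD) -> list:
    """Split a variable-length packet into fixed-size cells, zero-padding the last one."""
    cells = []
    for offset in range(0, len(packet), cell_payload):
        chunk = packet[offset:offset + cell_payload]
        cells.append(chunk.ljust(cell_payload, b"\x00"))
    return cells


def reassemble(cells: list, original_length: int) -> bytes:
    """Join the cells on the egress side and strip the padding."""
    return b"".join(cells)[:original_length]


packet = bytes(range(150))          # a 150-byte packet
cells = segment(packet)
print(len(cells), "cells of", len(cells[0]), "bytes each")
```

Because every cell has the same size, the fabric can treat each transfer as taking the same amount of time, which is what makes it simpler to process than packets of arbitrary length.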

Multicast replication - multicast has become a very important part of our lives, especially because of the high demand for IPTV and multicast streaming traffic. Any router should be able to handle multicast traffic, but the question is whether the router can handle a really huge volume of it. Multicast packets should be replicated not only at the interface level but also inside the switch fabric. The idea is that when the ingress interface receives a multicast packet and sends it to the fabric, the fabric must replicate it so that every egress linecard with subscribers gets the packet. The egress linecard may then have several ports where subscribers are connected, so the packet must be replicated again there. And don't forget to have separate queues for unicast and multicast, because we don't want one to kill the other. Most providers run unicast traffic as well as multicast streaming, so each should work up to its maximum performance without disturbing the other.
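The two replication stages described above can be modelled roughly as follows. This is a toy Python sketch with made-up line card and port names; it only counts where copies are made and says nothing about how the hardware actually does it.

```python
from collections import defaultdict


def replicate(packet, receivers):
    """Two-stage multicast replication.

    receivers: list of (egress_line_card, port) pairs that have subscribers.
    Stage 1 (fabric): one copy per egress line card with at least one receiver.
    Stage 2 (line card): one copy per subscriber port on that card.
    """
    ports_by_card = defaultdict(list)
    for card, port in receivers:
        ports_by_card[card].append(port)

    # Stage 1: the fabric sends a single copy to each interested line card.
    fabric_copies = {card: packet for card in ports_by_card}

    # Stage 2: each egress line card copies the packet to its subscriber ports.
    port_copies = {
        (card, port): fabric_copies[card]
        for card, ports in ports_by_card.items()
        for port in ports
    }
    return fabric_copies, port_copies


# Hypothetical receivers: two ports on lc1, one port on lc3.
receivers = [("lc1", "p0"), ("lc1", "p1"), ("lc3", "p0")]
fabric_copies, port_copies = replicate(b"iptv-frame", receivers)
print(len(fabric_copies), "fabric copies,", len(port_copies), "port copies")
```

The ingress line card sends the packet into the fabric only once; the number of copies grows only as close to the subscribers as possible.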

QoS in every aspect - the next generation carrier-class product is built with a QoS mindset. In previous products QoS existed only in the ingress and egress interface buffers; now there must also be a mechanism to differentiate packets in case there is congestion in the switch fabric. Well, it's actually very difficult to congest the switch fabric itself due to its huge capacity, but congestion may occur on the queue from the fabric to the egress interface. So even though the classes of service used to differentiate traffic in the fabric may not be as extensive as in the interface buffers, we should be able to mark high priority packets to ensure they will not get dropped when the fabric queue is full. When there is congestion in the fabric queue, a back pressure mechanism informs the ingress interface so that it can slow down sending traffic to the fabric, by either buffering or dropping the ingress packets.
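Here is a small Python sketch of the idea: a fabric-to-egress queue with two priority classes and a back pressure signal. The capacity, the high-water mark and the rule of evicting the newest low priority packet to make room for a high priority one are all assumptions for illustration, not the actual CRS behavior.

```python
from collections import deque


class FabricQueue:
    """Toy queue between fabric and egress with two classes and back pressure."""

    def __init__(self, capacity, high_water):
        self.capacity = capacity      # hard limit on queued packets
        self.high_water = high_water  # depth at which back pressure is asserted
        self.high = deque()
        self.low = deque()

    def depth(self):
        return len(self.high) + len(self.low)

    def backpressure(self):
        """True when the ingress side should slow down (buffer or drop)."""
        return self.depth() >= self.high_water

    def enqueue(self, pkt, high_priority=False):
        if self.depth() >= self.capacity:
            if high_priority and self.low:
                self.low.pop()        # assumed policy: evict newest low priority packet
            else:
                return False          # queue full, packet dropped
        (self.high if high_priority else self.low).append(pkt)
        return True

    def dequeue(self):
        """Strict priority: drain high priority packets first."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


q = FabricQueue(capacity=4, high_water=3)
for name in ["a", "b", "c", "d"]:
    q.enqueue(name)
print("back pressure:", q.backpressure())
print("low priority accepted when full:", q.enqueue("e"))
print("high priority accepted when full:", q.enqueue("voice", high_priority=True))
```

In this model the high priority packet still gets through when the queue is full, while the back pressure flag is what the ingress side would poll to decide whether to hold traffic back.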

Multi chassis - this is a breakthrough concept where routers can be connected together and work just like a single chassis. It is very useful for increasing the capacity of the overall system, it is more efficient because some resources can be shared, and it introduces the new concept of router collocation or router hosting with Secure Domain Routers (SDR) technology, which will be discussed later. In a multi-chassis system one chassis is designated as the switch fabric chassis. The ingress linecard in each chassis sends the traffic to the first part of the fabric, still in the same chassis; then the packet (already cells by now) is sent to the fabric chassis, where all the lookup and necessary replication are done; then it is sent to the destination egress linecard, in a different chassis or even in the same chassis as the ingress linecard.

Zoning Power System - the previous redundant power supply system, where we have two or more power modules to provide 1+1 redundancy, is not enough. The CRS-1 16-slot introduces a zoning power system with two power shelves containing three power modules each, and each power shelf is divided into 6 zones. All 16 line card slots are powered per zone: zone 1 may power the first 4 slots, and zone 6 a different 4 slots. There are 2 zones that power the RP, Switch Fabric and Fan Controller. With this zoning system in mind, we can plug two connections to the same destination into two different line cards in different power zones.

IOS XR for carrier-class router - be ready to deal with IOS XR whether you like it or not! This is the next generation software for the new carrier-class routers and it is completely different from IOS. Well, many guys ask me why IOS keeps getting bigger and bigger, why IOS has so many different family names, why there are so many bugs listed in the bug tool, and so on. First of all, all software has bugs. If a vendor doesn't release a bug list because they claim the software has no bugs, they lie. And be ready for surprises and unknown behaviors that we might have been able to avoid if we had the list of previously known bugs. Second, IOS was built a long time ago to accommodate any type of customer with different feature requirements. There was a version of IOS that could run desktop protocols while at the same time having new MPLS features for Service Providers. If one piece of software tries to have all the features, obviously its size becomes really big. And more features mean more chances to hit bugs. When you accumulate all the bugs and put them in one list, even though some bugs only apply to a specific feature that may not be enabled, the list can become really long. Now Cisco has tried to split IOS completely for different segments even on the same hardware platform: for example, IOS SR for the 7600 is targeted at Service Providers while IOS SX for the 6500 is targeted at Enterprise (the 6500 and 7600 used to be very similar and could run the same IOS).

Micro kernel, modular and self-healing - the very first difference between IOS XR and IOS is that IOS XR uses a micro kernel and is modular, while IOS is considered monolithic, with one big file handling everything. With XR, the micro kernel is the heart of the software and we can add subsystems, software modules and applications on top of it. So the Control Plane, Data Plane and Management Plane are handled by completely different subsystems. With this modularity the term self-healing starts to make sense, because a problem in one subsystem should not affect the others. Each process has its own protected address space in virtual memory, so an issue with one process can be fixed automatically without disturbing the other processes. A process like OSPF can be restarted without impacting the BGP process. That's what I call truly modular software.
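As a rough mental model only (this is not how the IOS XR process manager is written), the self-healing idea can be sketched in Python as a supervisor that restarts just the failed process and leaves the others untouched. The `Supervisor` and `Process` classes and the state they hold are assumptions for illustration.

```python
class Process:
    """A toy routing process with its own private state."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.restarts = 0
        self.state = {}  # stands in for the process's protected memory


class Supervisor:
    """Watches processes and restarts only the ones that have failed."""

    def __init__(self, names):
        self.procs = {name: Process(name) for name in names}

    def crash(self, name):
        proc = self.procs[name]
        proc.running = False
        proc.state.clear()  # only the crashed process loses its state

    def heal(self):
        restarted = []
        for proc in self.procs.values():
            if not proc.running:
                proc.running = True
                proc.restarts += 1
                restarted.append(proc.name)
        return restarted


sup = Supervisor(["ospf", "bgp"])
sup.procs["bgp"].state["peers"] = 12   # hypothetical BGP state
sup.crash("ospf")
print("restarted:", sup.heal())
print("bgp state untouched:", sup.procs["bgp"].state)
```

Contrast this with a monolithic image, where a fault in one feature can take the whole system down with it.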

In Service Software Upgrade - ISSU has become a very popular term among customers. But many people still mistakenly think that ISSU means a real software upgrade without any downtime at all under any circumstances. We need to think of it like this: even though the software is already modular with a micro kernel, as with any other operating system there are some basic processes that are required to keep the system up and running. So we can do a hitless upgrade without any packet drop for some subsystems, because they only require a process restart, but for other subsystems this is not possible without a full restart. And as far as I know, until now no router vendor can achieve a major version software upgrade without any restart at all. So question the vendor more specifically if you have an ISSU requirement. As I have mentioned, the architecture is distributed, so even if the RP is restarted the data plane may still work using the state from before the RP restart. But what if we need to upgrade the firmware of the linecard itself? I'm not saying ISSU is not perfect; I'm just saying we need to look at it more specifically, examine the feature under different kinds of circumstances, and compare it with our own requirements.

IOS XR was built for CRS - yes, this is true. And when a new piece of software is tested and considered successful, the next step is definitely to re-use it on another hardware platform. So the GSR can run IOS XR too, but it's a different software file from the one for CRS because the hardware architecture is different. That's understandable, just as there are different Linux files for 32-bit and 64-bit. What matters is that there is only one IOS XR for the Service Provider core network, with the same CLI and no more different software families as in IOS. Having said that, it's still IOS XR even though the software for CRS and GSR (and the new products to come) is different and sometimes the features provided in the same version are slightly different. Btw, those who are already familiar with IOS should be ready for a shock the first time they use the IOS XR CLI. Everything we ever wished would be fixed in IOS has already been accommodated by XR: from small things like using / prefix notation instead of a full subnet mask in an IP address, configuration changes that are not applied until they are committed, and the ability to roll back the config, to new features such as admin plane config mode and always-on debug. Try to get hold of an XR machine and see for yourself.
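For readers who have not seen a commit-based CLI before, the behavior can be modelled in Python roughly like this. It is only a sketch of the concept: the `ConfigStore` class and the configuration keys are made up for illustration and are not real XR commands. The last two lines show the / notation versus a full subnet mask using Python's standard `ipaddress` module.

```python
import ipaddress


class ConfigStore:
    """Toy model of a candidate/running configuration with commit and rollback."""

    def __init__(self):
        self.running = {}    # what the router is actually using
        self.candidate = {}  # staged changes, not yet active
        self.history = []    # snapshots of running config taken before each commit

    def set(self, key, value):
        self.candidate[key] = value  # staged only, nothing changes yet

    def commit(self):
        self.history.append(dict(self.running))
        self.running.update(self.candidate)
        self.candidate.clear()

    def rollback(self, steps=1):
        for _ in range(steps):
            if not self.history:
                break
            self.running = self.history.pop()


cfg = ConfigStore()
cfg.set("interface te0/1 address", "10.0.0.1/24")   # hypothetical key
print("before commit:", cfg.running)
cfg.commit()
cfg.set("interface te0/1 address", "10.0.9.9/24")
cfg.commit()
cfg.rollback()                                       # undo the last commit
print("after rollback:", cfg.running)

iface = ipaddress.ip_interface("10.0.0.1/24")
print(iface.ip, iface.netmask)                       # the same address written with a full mask
```

The useful property is that a half-typed change never touches the running state, and a bad commit can be undone as one unit.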

Say goodbye to route-map - next generation routers need a next generation way to control route policy. Hence the Route Policy Language (RPL), which replaces route-map in IOS XR. It's actually a new programming language embedded in IOS XR, built to control route policy with scalability in mind. Just like any programming language it has conditional constructs like if, if-then, if-else and so on, as well as Boolean and compound Boolean expressions. We can use parameters and variables, we can nest policies, and best of all we can re-use a policy over and over again by calling it from other policies. It looks complicated in the beginning, but once you start to use it, it's difficult to go back to route-map.
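To show why parameters and nesting matter, here is the same idea expressed in plain Python rather than in RPL syntax. Everything below is an analogy: the function names, the route dictionary layout and the AS numbers are assumptions for illustration, and none of it is actual RPL.

```python
def set_med_if_prefix_in(prefix_set, med):
    """Parameterized policy: set MED on routes whose prefix is in prefix_set."""
    def policy(route):
        if route["prefix"] in prefix_set:
            return {**route, "med": med}
        return route
    return policy


def drop_if_as_in_path(asn):
    """Parameterized policy: drop routes that carry the given AS in their path."""
    def policy(route):
        return None if asn in route["as_path"] else route
    return policy


def chain(*policies):
    """Nest policies: apply each in order, stop as soon as one drops the route."""
    def policy(route):
        for step in policies:
            route = step(route)
            if route is None:
                return None
        return route
    return policy


# The same building blocks are reused with different parameters per customer.
customer_a_in = chain(drop_if_as_in_path(65000),
                      set_med_if_prefix_in({"10.0.0.0/8"}, 50))
customer_b_in = chain(drop_if_as_in_path(65010),
                      set_med_if_prefix_in({"172.16.0.0/12"}, 80))

route = {"prefix": "10.0.0.0/8", "as_path": [65001, 65002]}
print(customer_a_in(route))
```

With route-map, each customer would typically need its own near-identical copy of the logic; with parameters, one definition is written once and called with different arguments.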

Secure Domain Routers, beyond virtual routers - I have seen more and more Service Provider customers use this capability in live networks. With SDR we can partition a single chassis into several completely separate routers. It's not the same as a virtual router, since each router in SDR has its own RP, line cards and memory space. Anything that happens in one SDR doesn't disturb the other SDRs at all. What is shared is just the chassis and the switch fabric. For this we need an RP for the whole chassis, called the admin SDR, and a Distributed RP (DRP) that consumes one line card slot. Then, from the admin plane config mode, we can allocate that DRP along with a few linecards as part of one SDR. The admin SDR can create and remove SDRs but it doesn't know what is going on inside an SDR. Even communication from one SDR to another SDR in the same chassis must use an external connection. This idea brings the new terminology of router collocation, since one physical chassis can become several completely separate routers positioned at different spots in the network. How about router hosting? Those who own the chassis can rent SDRs to customers just like server hosting. The possibilities for inventing new approaches while implementing this feature are endless.

Control Plane Policing with Local Packet Transport Services - some types of traffic are still processed by the Route Processor, for example control plane traffic such as routing protocols or network management. The RP must also process packets destined to the RP itself, for example packets destined to a loopback IP address or the IP address of a physical interface on the router. And if for some reason someone decides to turn off CEF switching and use process switching, all packets will pass through the RP for forwarding. This is not recommended, but it happens once in a lifetime, especially if we have a very skeptical guy in the team. So it is clear that the RP must be protected from all the packets it has to process. The first reason, obviously, is to protect the RP from Denial of Service attacks, where some smart guy tries to send lots of TCP SYN packets destined to the RP's IP address, for example. The second is to make sure that even legitimate traffic such as routing protocols and network management cannot consume all the resources of the RP. This is where Local Packet Transport Services (LPTS) kicks in; it's enabled by default and protects the RP by limiting the number of packets that can reach it.
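The rate-limiting idea can be illustrated with a generic token bucket policer per traffic type. To be clear, this Python sketch is not the LPTS implementation: the flow type names, the rates and the burst sizes are all hypothetical values picked for the example.

```python
class TokenBucket:
    """Generic token bucket: `rate` packets per second, up to `burst` at once."""

    def __init__(self, rate_pps, burst):
        self.rate = rate_pps
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical policers per traffic type destined to the RP.
policers = {
    "bgp-known-peer": TokenBucket(rate_pps=1000, burst=100),
    "tcp-syn-unknown": TokenBucket(rate_pps=10, burst=5),
}


def punt_to_rp(flow_type, now):
    """Return True if the packet may reach the RP; unknown flow types are dropped."""
    bucket = policers.get(flow_type)
    return bucket is not None and bucket.allow(now)


# A burst of 20 SYN packets arriving at the same instant.
admitted = sum(punt_to_rp("tcp-syn-unknown", now=0.0) for _ in range(20))
print("SYNs admitted to the RP:", admitted)
```

Legitimate routing protocol traffic gets a generous but still finite budget, while the SYN flood is cut off after its small burst, so neither can exhaust the RP.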

So it's true: just as in School of Rock, "One great rock show can change the world", I guess one great product can change the world too. Make one revolutionary product, and the rest is history.

Get Ready.


Demarco said...

I think Cisco is trying to copy Juniper on their backbone routers like the M-Series and T-Series. I recognize some of the features described (separation of control plane and forwarding plane, redundant Routing Processor,...) as being integral parts of many Juniper routers and even EX switches.
What do you think, Himawan?

Himawan Nugroho said...

Any modern router has those features and capabilities, so I don't think we can determine who's copying who, especially since most of the J folks used to work for Cisco :) What's more interesting is to look at the decisions each vendor made when developing their products. For example: Cisco decided to go with multi-chassis, CRS uses a Benes 3-stage architecture in the fabric, uses the PLIM + MSC concept, etc. That's for hardware. For software, IOS XR uses a micro kernel instead of BSD, Secure Domain Routers with separation of physical hardware instead of the virtual routers concept, and so on. And let's talk about features, which is even more interesting: Martini compared to Kompella, PIM vs BGP, etc.

Asad said...

Of course they're similar, since lots of ex-Juniper engineers built the CRS..

Hehe, isn't that right, bro??

amirsh said...

When one of the operators in Saudi made its first call (Feb 2008), CRS-1 was already in place, boss.

Is it a different CRS-1 H/W release?

Himawan Nugroho said...

CRS-1 has been around for several years now. I was talking about the new products, Cisco ASR xxxxx ;) and ASR 9000, that use similar architecture and run IOS XR.

Anonymous said...

Ok, it's been enough time since Nov 2008 to separate truth from the marketing junk.

We were all waiting patiently.
What did we get?

ASR9K is a substandard design that can do merely 40G/slot and not even proven to do L3 at wire speed. Quad EZ NP-3 setup worked well for JNPR for few years on MX series but was never trusted enough to do true L3 features.

Now Cisco tells us they also want to join the party but has no clue how to do it... ASR9K.. ASR14k...

Himawan, get a life.
You are a human with a brain, not a Cisco-owned puppet head. For once, try to figure something out by yourself, not from glossy marketing material. Maybe you will like it, after all.


Himawan Nugroho said...

Hi Anonymous,
thank you for your comment. You are right, before any product is launched or delivered officially there's always a marketing campaign first. I have a lot of internal information about ASR9K but unfortunately I can't share it with the public. So let's just wait and see how the product performs in production networks. Due to the crisis, every company is slowing down its development process, so we may need just a little bit of patience.
One thing I must mention here: I don't write something that I don't believe. I believe it because of what I have seen so far. And I do work for Cisco indeed, but it doesn't mean I have to market the products.