Wednesday, August 09, 2017

Building Intent Based Networking System

I've been unhappy with my creation-to-consumption ratio lately, which is the amount of time spent creating compared to amount of time spent consuming. Yes I spend time creating design documents, business proposals, system architecture, slides for both technical and non-technical content, product requirement documents, blog posts, and occasionally write simple codes, but much of my free time is spent consuming for Netflix, newspapers, Twitter, televised sports, Facebook, blogs, Medium, TV series, online courses and others.

You may say we need consumption as an input prior to creating. And I agree, consuming is fine if it is part of learning or research in order to create something. But creation must come first. So if I commit to create something, let's say a system design or even this blog post, I must start by starting the work first and whenever I feel some information is needed to add or validate the work only then I will consume new inputs to mix with the old ones and fuel creativity.

Tonight I'm sitting in front of my macbook, in an attempt to increase my creation-to-consumption ratio, by writing about building Intent-Based Networking System (IBNS). Let's start with problem definition.

The end customer is a Small-to-Medium Business (SMB) owner who wishes to expand his business to multiple offices. Some of the owner's requirements are below:

"I will have three different size of offices: small for max 20 employees, medium for max 50 employees, and large or main branch with max 200 employees."
"In every office all employees who work in back office must use company-provided wired PC, while employees who work in sales may use company-provided wireless laptop or their personal computing device."
"Anyone can use company's internal collaboration application to chat anytime, however the use of video conference application must be using scheduling system."
"Those who work in sales can access our customer data using web portal, however only those who work in back office can access the database to update the entry."

As you can see, all the above are described using high-level human language, driven by business requirements, policy based, and focus to the applications. These are business intents, and usually in normal network operation we will need an architect or engineer to translate such requirements into technical specification all the way until the method to do the implementation. In the near future, this problem will be solved with no human interaction using Intent-Based Networking.

According to Andrew Lerner from Gartner, Intent-Based Networking is a piece of networking software that helps to plan, design and implement/operate networks that can improve network availability and agility. IBNS incorporates four key things:
  • Translation and Validation– Converts higher-level business policy (what) as input from end users and converts it to the necessary network configuration (how) 
  • Automated Implementation – Uses network automation and/or network orchestration to configure the appropriate network changes (how) across existing network infrastructure 
  • Awareness of Network State – Ingests real-time network status for systems under its administrative control, and is protocol- and transport-agnostic 
  • Assurance and Dynamic Optimization/Remediation– Continuously validates in real time that the original business intent is being met, and can take corrective actions when it is not met 
So company executives or managers define a high-level business policy they want to be enforced in the network. The IBNS verifies that the policy can be executed, then manipulates network resources to create the desired state and enforce policies using fully automated operations. IBNS is gathering data to constantly monitor the state of the network, to ensure the desired state of the network is maintained, and can take automated corrective action to maintain state.

Wait. Is this Intent-Based Networking just another name for network automation and orchestration?

Based on comparison chart created by Big Switch above, Intent defines what is the goal (declarative) while automation or orchestration provides explicit method of how to do the implementation (imperative). This creates layer of abstraction since the input now is more high level to describe business needs, and not implementation specific. And since the Intent is declarative, it requires validation of the Intent to ensure it can be translated by the system into series of tasks that need to be done. Automation and orchestration do not deal with telemetry or monitoring of network state. Real-time network state monitoring is one key element since the system needs to validate in real time that the original business intent is being met. When it is not met, the system can take dynamic actions to correct it.

Where is SDN here? SDN is an architecture for networks with original idea to separate the control plane in SDN controller, from data plane in networking device. This separation makes abstraction between application developer and network device: anyone who wants to create new application related to networking now does not have to understand how specific network device works, instead she can make her application to communicate to SDN controller using northbound API. And if there is any changes required to be done on the device, the controller will do so using southbound API. IBNS can work in both SDN-based and non-SDN based network infrastructure.

A while ago I built the five levels of Autonomous Network, mimicking the levels in Autonomous Vehicle:

Level 0 means no automation at all, and engineer configures network device manually using CLI. As pointed out by Gartner, even if in 2016 there are still 85% operations teams using CLI as their primary interface, the number will go down to 30% in 2020. So some form of automation will surely happen. In Level 1, which is the task specific automation, engineer can write code to communicate to network devices using various APIs to execute certain task e.g. reconfigure BGP configuration, get network state information from the device, redirect the traffic by shutting down certain interfaces or doing routing protocol manipulation, and so on.

Level 2 is when we want to execute series of task to complete one workflow, let's say to deploy the configuration to all network devices for one new medium-size office based on the requirements above. If we assume all required hardware are already in place, the system will start by pulling out the definition of medium-size office: how many devices, what are the types, what kind of configuration needs to be done, and perhaps the current device configuration so the system can make decision if the new config will only be appended to the current one, or will add/replace the full config completely. Then a sequence of tasks following a playbook or recipe, using either an open-source or purposely build software, will be executed to automate the end-to-end workflow to complete the deployment.

Orchestration is needed in Level 3 when there are more components involved in the process. To address some of the business requirements in this exercise will need to orchestrate the work between different controllers or managers for networks (both physical and virtual), servers, storage, security policy as well as management system. For example, we need to make sure the new office will be added into the inventory database. The device configuration and security policy like user segmentation will be enforced by network controller. Then perhaps new virtual machines need to be started on physical servers to host the applications or virtualized network functions. All information, device config, network state until application data must be stored in the storage somewhere, that needs to be orchestrated too. 

Thing gets more interesting from Level 4. If we have already used SDN-based network, we can just use the northbound API to inform the controller what we want it to change from the network. But with non SDN-based infrastructure, it means we still can connect directly to each network device to push the configuration changes. Even if we use automation platform or orchestrator to connects to the network device using API like Netconf and REST, and no longer through manual CLI or SNMP, this approach is called stove pipe as explained in diagram from Tail-F above. The main disadvantage of this approach other than scalability issue, is the communication between the platform to each device (or Managed Object as per Tail-F) is implementation specific depending on the device vendor. This means if we change the device vendor we may have to change the method of implementation from our platform to the new vendor's devices.

Introducing Model Driven networking. Instead of the platform to connect directly to each network device, we can build a model to represent the device and the device config, so now any automation and orchestration happens on the model first. Model based networking provides abstraction since even the network devices are changed to different vendor, the model will remain the same. Another benefit is: any planned changes to the network can be simulated and validated in the model first, and only gets implemented in the real device when the changes are considered safe.

And finally, Level 5 is the target for network intelligence for any infrastructure. We specify business intent, we define the policy between users or components in the network, we provide declarative requirements, and the system will execute without any human interaction. Zero human touch networking. This is the level for Intent-Based Networking System.

Now we know the definition and characteristic of IBNS, how to build one?

I'm using bottom up approach here even though top down will work too and I would argue it is a better approach:

First, start by building the infrastructure. No kidding, we still need the network. Some vendors may call it Network Fabric, and it may consist both physical and virtual networks. We still need physical cables to connect between physical network devices or at least the servers running network functions. In later case we can use overlay protocols to connect between different network functions.

Second, automate and orchestrate the infrastructure. As mentioned earlier, if it's SDN-based infrastructure we will have a controller to deal with network control plane then push the desired state to the device. And in non-SDN infrastructure we still can use a controller to automate any configuration changes to network device (or the model of network device). We need to use an orchestrator to combine this controller with another physical or virtual infrastructure managers that manages the servers, storages and virtual machines.

Third, build the telemetry and monitoring system. At minimum we need a mechanism to measure the state of the network from the status of the service, topology view, both previous and latest configurations, logs for any state change, and error checking mechanism for any failed changes.

Forth, create a mean to translate business intent. This could be in the simple web portal or mobile app that provides service catalogue offering packages for user to select from, with some degree of possible customization. Some day this may turn into a form of virtual assistant that will listen to our voice command, and will translate the captured information into a series of workflows need to be executed by the system.

Obviously the building exercise I write here is a very simple one that should work in principle, event though the devil is always in the details.

Let's see how Google does it, as taken from their public presentation.

Google has been using abstraction with model-driven approach to provide network topology view, configuration data structure and content, and telemetry data structure and attributes. Imagine a vendor-agnostic network topology. The information we need from such topology is a representation of all network devices as nodes, and link to connect the nodes to each other. It can have both node and link attributes such as node identification, port information (e.g. Node A's first port is connected to Node B, second port to Node C and so on) and link speed. We can also have the information to map the node to the current actual network device, for example Node A is currently representing Cisco hardware type X with specific hardware and port configuration, that obviously can be changed when needed. Such information is required for the system to know how to map the model to the actual network device e.g. Node A's first port means interface Gi0/1 on Cisco type X.

Configuration and monitoring information must be described in vendor-agnostic way so it is not bound to specific configuration line or monitoring attribute from the vendor. Any network device configuration will be described as models of interfaces, routing protocols, routing table, routing policy, ACL etc. Each configuration model such as BGP protocol later can be mapped into specific implementation for different vendor's device configuration. And the state of the network like routing table information can be retrieved from the device and populated into the model as well.

Google implements telemetry system using publish–subscribe messaging pattern where senders of messages (publishers) categorize published messages into classes without knowledge of specific receivers (subscribers), and subscribers express interest in one or more classes and only receive messages that are of interest. Using gRPC protocol, it is possible to have a continuous time-series data stream from the device with incremental updates. And the device can provide asynchronous event-driven reporting that does not require to get any response from the servers/collectors (think about device logging or SNMP trap). Obviously it is possible for the collectors to run ad-hoc request to collect data from the devices, that could be a synchronous RPC call.

Once we have all the components in place, what we need now is to connect all pieces together to get the system up and running. The users, or operators, of the system use application to describe configuration intent. An example of this is a web portal to provide the operators to select the option of one use case: "drain the traffic from link X" let's say because we want to do maintenance of the link or migrate it to another link. The instruction is sent using declarative API to both configuration and topology model. Once the requirement is translated into changes to the model, the system will analyze the current configuration to understand what changes are needed to generate required configuration instance. This configuration will be mapped into specific vendor configuration line that will be pushed to the device using different option of southbound protocols depending on which device. Telemetry data is used to monitor configuration changes and the system will provide feedback to the operators when the intent has been implemented successfully.

As closing remark, Intent-Based Networking is considered as "one of the most significant breakthroughs in enterprise networking". Cisco CEO claims Intent Based Networking will redefine the network for the next 30 years. Gartner made prediction that 10% of enterprises will use intent-driven network design and operation tools, reducing their network outages by 65%, by 2020.

And after reading this post, I believe you will agree with such statements because it makes sense. It makes sense not because of the amount of technologies involved in the system, but because the system can provide the answers to the requirements from the business. And that is just what needed from any innovation in this space: to solve real business problem.

Disclaimer: This post represents my own personal view. All the sources of information are available online and accessible to public. No confidential information belong to my current employer is disclosed in this post