Saturday, September 16, 2017

Network Engineer Jobs

So you want to work for Google as a Network Engineer? Check out one of the job ads here. I pasted the screenshot below in case the ad is removed once the position is no longer available.

"You'll build software for distributed services, abstractions and the components of the system that operates and powers Google." OK, even though this is not common in a Network Engineer job description, it makes sense, since Google runs one of the world's largest networks to connect its data centers scattered all around the world. As a minimum requirement, you must have experience in software development in one or more modern programming languages, e.g. C++, Java, Python or Go. And learning to code from "Teach yourself Python in 24 hours" won't be enough, since you are expected to have experience in data structures, complexity analysis and software design.

Is Google really looking for a Network Engineer (NE), and not a Software Engineer (SWE)? Yup, you still need to have expertise in networking protocols and technologies, including end-to-end packet flow, forwarding and routing. Google knows that a world-class distributed computing infrastructure must run on a world-class networking infrastructure that is operated reliably and at scale. When network capacity in the company's data centers grew so fast that conventional routers and switches couldn't keep up, Google could not buy, for any price, a data-center network that would meet the requirements of its distributed systems. So the engineers decided to build their own instead.

And someone like me who relies only on 3 CCIEs and a CCDE won't be qualified to apply. Before you ask whether it is still worth pursuing certification, let me say it again: you still need in-depth networking knowledge. You still need to know OSPF, IS-IS, and BGP in detail. And you may use that kind of certification to build the expertise. But don't turn into a certification junkie like a younger me once did! If your only target is to pass the exam, it won't get you to Google for sure. Once you understand network engineering, learn software engineering and how to design, analyze and troubleshoot large-scale distributed systems. In this company, and in similar companies that build and maintain large-scale networks like Facebook, Amazon and others, a Network Engineer is expected to write software and tools that interact with networking systems, to support Software Defined Networking and zero touch networking, to automate network operations, and to develop advanced monitoring systems.
Google is definitely looking for someone who is at minimum already in Phase 5 of my Network Engineer evolution, and who will progress to Phase 6 someday. And during my past 440 days at Google, I've been so lucky to be surrounded by these guys.

Wednesday, August 09, 2017

Building Intent Based Networking System

I've been unhappy with my creation-to-consumption ratio lately, which is the amount of time spent creating compared to the amount of time spent consuming. Yes, I spend time creating design documents, business proposals, system architecture, slides for both technical and non-technical content, product requirement documents and blog posts, and occasionally writing simple code, but much of my free time is spent consuming Netflix, newspapers, Twitter, televised sports, Facebook, blogs, Medium, TV series, online courses and others.

You may say we need consumption as an input prior to creating. And I agree, consuming is fine if it is part of learning or research in order to create something. But creation must come first. So if I commit to creating something, let's say a system design or even this blog post, I must begin with the work itself, and only when I feel some information is needed to add to or validate the work will I consume new inputs to mix with the old ones and fuel creativity.

Tonight I'm sitting in front of my MacBook, in an attempt to increase my creation-to-consumption ratio, by writing about building an Intent-Based Networking System (IBNS). Let's start with the problem definition.

The end customer is a Small-to-Medium Business (SMB) owner who wishes to expand his business to multiple offices. Some of the owner's requirements are below:

"I will have three different sizes of offices: small for max 20 employees, medium for max 50 employees, and large or main branch with max 200 employees."
"In every office, all employees who work in the back office must use a company-provided wired PC, while employees who work in sales may use a company-provided wireless laptop or their personal computing device."
"Anyone can use the company's internal collaboration application to chat anytime, however use of the video conference application must go through the scheduling system."
"Those who work in sales can access our customer data using the web portal, however only those who work in the back office can access the database to update entries."

As you can see, all of the above are described in high-level human language, driven by business requirements, policy based, and focused on the applications. These are business intents, and in normal network operation we usually need an architect or engineer to translate such requirements into technical specifications, all the way down to the method of implementation. In the near future, this problem will be solved with no human interaction using Intent-Based Networking.

According to Andrew Lerner from Gartner, Intent-Based Networking is a piece of networking software that helps to plan, design and implement/operate networks that can improve network availability and agility. IBNS incorporates four key things:
  • Translation and Validation – Takes a higher-level business policy (what) as input from end users and converts it to the necessary network configuration (how)
  • Automated Implementation – Uses network automation and/or network orchestration to configure the appropriate network changes (how) across existing network infrastructure
  • Awareness of Network State – Ingests real-time network status for systems under its administrative control, and is protocol- and transport-agnostic
  • Assurance and Dynamic Optimization/Remediation – Continuously validates in real time that the original business intent is being met, and can take corrective actions when it is not met
So company executives or managers define a high-level business policy they want enforced in the network. The IBNS verifies that the policy can be executed, then manipulates network resources to create the desired state and enforce policies using fully automated operations. The IBNS gathers data to constantly monitor the state of the network, to ensure the desired state is maintained, and can take automated corrective action to maintain that state.
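The four functions above can be read as one closed control loop. Here is a minimal Python sketch of that loop, assuming a toy in-memory network where an intent simply says "this segment must exist at this site". The names `Intent`, `FakeNetwork`, `validate` and `reconcile` are hypothetical and invented for illustration; they do not come from Gartner or from any product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intent:
    """Business-level 'what': a named segment that must exist at a site."""
    site: str
    segment: str


@dataclass
class FakeNetwork:
    """Stand-in for real devices: maps site name -> set of configured segments."""
    state: dict = field(default_factory=dict)

    def read_state(self):
        # Awareness of network state: return a copy of what is configured now.
        return {site: set(segs) for site, segs in self.state.items()}

    def apply(self, site, segment):
        # Automated implementation: push one change.
        self.state.setdefault(site, set()).add(segment)


def validate(intent, known_sites):
    # Translation and validation: reject intents that cannot be executed.
    return intent.site in known_sites


def reconcile(intents, network, known_sites):
    """One pass of the loop; returns the list of corrective changes made."""
    changes = []
    observed = network.read_state()
    for intent in intents:
        if not validate(intent, known_sites):
            continue
        # Assurance: act only where observed state differs from the intent.
        if intent.segment not in observed.get(intent.site, set()):
            network.apply(intent.site, intent.segment)
            changes.append((intent.site, intent.segment))
    return changes
```

Running `reconcile` repeatedly is the "continuous" part: a pass that finds nothing to fix returns an empty list, and a pass after someone removes a segment by hand puts it back.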

Wait. Is this Intent-Based Networking just another name for network automation and orchestration?

Based on the comparison chart created by Big Switch above, Intent defines what the goal is (declarative), while automation or orchestration provides an explicit method of how to do the implementation (imperative). This creates a layer of abstraction, since the input is now high level, describing business needs rather than implementation specifics. And since the Intent is declarative, it requires validation to ensure the system can translate it into the series of tasks that need to be done. Automation and orchestration do not deal with telemetry or monitoring of network state. Real-time network state monitoring is a key element, since the system needs to validate in real time that the original business intent is being met. When it is not met, the system can take dynamic actions to correct it.
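To illustrate the declarative vs. imperative split, here is a small Python sketch using the access requirement from the SMB owner. The desired policy is plain data (the "what"), and a planner derives the explicit steps (the "how") by comparing it with the current rules. The function name `plan_changes` and the group and resource labels are my own assumptions for this example.

```python
def plan_changes(current_rules, desired_rules):
    """Derive imperative steps from a declarative policy.

    Both arguments are sets of (user_group, resource) pairs that are allowed.
    The result is an ordered list of explicit actions to perform.
    """
    to_add = sorted(desired_rules - current_rules)
    to_remove = sorted(current_rules - desired_rules)
    steps = [("permit", group, resource) for group, resource in to_add]
    steps += [("revoke", group, resource) for group, resource in to_remove]
    return steps


# Declarative intent: sales uses the web portal, back office updates the database.
desired = {("sales", "web-portal"), ("back-office", "database")}

# What is configured today, in this made-up starting point.
current = {("sales", "web-portal"), ("sales", "database")}

steps = plan_changes(current, desired)
```

The operator only ever edits `desired`. The list in `steps` is what an automation or orchestration layer would then execute.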

Where is SDN here? SDN is a network architecture whose original idea is to separate the control plane, in the SDN controller, from the data plane, in the networking device. This separation creates an abstraction between the application developer and the network device: anyone who wants to create a new networking application no longer has to understand how a specific network device works; instead she can make her application communicate with the SDN controller using the northbound API. And if any changes are required on the device, the controller will make them using the southbound API. IBNS can work on both SDN-based and non-SDN-based network infrastructure.

A while ago I built the five levels of Autonomous Network, mimicking the levels in Autonomous Vehicle:

Level 0 means no automation at all: the engineer configures network devices manually using the CLI. As pointed out by Gartner, even though in 2016 85% of operations teams were still using the CLI as their primary interface, the number will go down to 30% in 2020. So some form of automation will surely happen. In Level 1, which is task-specific automation, the engineer can write code that communicates with network devices using various APIs to execute a certain task, e.g. change the BGP configuration, get network state information from the device, redirect traffic by shutting down certain interfaces or manipulating routing protocols, and so on.

Level 2 is when we want to execute a series of tasks to complete one workflow, let's say to deploy the configuration to all network devices for one new medium-size office based on the requirements above. If we assume all required hardware is already in place, the system will start by pulling out the definition of a medium-size office: how many devices, what types, what kind of configuration needs to be done, and perhaps the current device configuration, so the system can decide whether the new config will only be appended to the current one or will replace the full config completely. Then a sequence of tasks following a playbook or recipe, using either open-source or purpose-built software, will be executed to automate the end-to-end workflow and complete the deployment.
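A rough Python sketch of such a workflow follows. It assumes the office definitions live in a simple table and that each task is run by a caller-supplied `execute` function. The device counts per office size, the task names and the device naming scheme are all hypothetical; only the employee limits come from the requirements above.

```python
OFFICE_TEMPLATES = {
    # Device counts are made up for illustration.
    "small": {"max_employees": 20, "switches": 1, "access_points": 1},
    "medium": {"max_employees": 50, "switches": 2, "access_points": 3},
    "large": {"max_employees": 200, "switches": 8, "access_points": 10},
}


def build_playbook(office_name, size):
    """Expand an office size into an ordered list of (task, device) steps."""
    template = OFFICE_TEMPLATES[size]
    devices = [f"{office_name}-sw{i}" for i in range(1, template["switches"] + 1)]
    devices += [f"{office_name}-ap{i}" for i in range(1, template["access_points"] + 1)]
    steps = []
    for device in devices:
        steps.append(("backup_current_config", device))
        steps.append(("push_config", device))
        steps.append(("verify_reachability", device))
    return steps


def run_playbook(steps, execute):
    """Run steps in order; stop at the first failure.

    Returns (completed_steps, failed_step), where failed_step is None
    when every step succeeded.
    """
    done = []
    for task, device in steps:
        if not execute(task, device):
            return done, (task, device)
        done.append((task, device))
    return done, None
```

In a real deployment `execute` would call whatever automation tool is in use. Keeping it as a parameter means the workflow logic can be tested without touching a device.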

Orchestration is needed in Level 3, when there are more components involved in the process. Addressing some of the business requirements in this exercise requires orchestrating the work between different controllers or managers for networks (both physical and virtual), servers, storage, security policy and the management system. For example, we need to make sure the new office is added to the inventory database. The device configuration and security policy, like user segmentation, will be enforced by the network controller. Then perhaps new virtual machines need to be started on physical servers to host the applications or virtualized network functions. All information, from device config and network state to application data, must be stored somewhere, and that needs to be orchestrated too.

Things get more interesting from Level 4. If we already use an SDN-based network, we can just use the northbound API to tell the controller what we want it to change in the network. But with non-SDN-based infrastructure, we can still connect directly to each network device to push the configuration changes. Even if we use an automation platform or orchestrator that connects to the network device using an API like NETCONF or REST, and no longer through manual CLI or SNMP, this approach is called stovepipe, as explained in the diagram from Tail-F above. The main disadvantage of this approach, other than the scalability issue, is that the communication between the platform and each device (or Managed Object, as per Tail-F) is implementation specific, depending on the device vendor. This means if we change the device vendor we may have to change the method of implementation from our platform to the new vendor's devices.

Introducing Model Driven networking. Instead of the platform connecting directly to each network device, we can build a model to represent the device and the device config, so that any automation and orchestration happens on the model first. Model-based networking provides abstraction, since even if the network devices are changed to a different vendor, the model will remain the same. Another benefit: any planned changes to the network can be simulated and validated in the model first, and only get implemented on the real device when the changes are considered safe.
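Here is a minimal Python sketch of the "validate in the model first" idea, assuming the device model is a plain dictionary of interfaces and that the only safety rule is "no two interfaces may share an address". The model layout and the functions `validate_model` and `try_change` are hypothetical; a real system would use a richer schema and many more checks.

```python
import copy


def validate_model(model):
    """Return a list of problems found in a vendor-neutral device model."""
    problems = []
    seen_addresses = {}
    for name, intf in model["interfaces"].items():
        address = intf.get("address")
        if address is None:
            continue
        if address in seen_addresses:
            problems.append(f"{name} duplicates address of {seen_addresses[address]}")
        else:
            seen_addresses[address] = name
    return problems


def try_change(model, interface, settings):
    """Apply a planned change to a copy of the model and validate it there.

    Returns (candidate_model, problems). The caller pushes to the real
    device only when problems is empty; the original model is never touched.
    """
    candidate = copy.deepcopy(model)
    candidate["interfaces"].setdefault(interface, {}).update(settings)
    return candidate, validate_model(candidate)
```

Because the change is tried on a copy, a rejected change leaves both the model and the real device exactly as they were.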

And finally, Level 5 is the target for network intelligence for any infrastructure. We specify business intent, we define the policy between users or components in the network, we provide declarative requirements, and the system will execute without any human interaction. Zero human touch networking. This is the level for Intent-Based Networking System.

Now that we know the definition and characteristics of IBNS, how do we build one?

I'm using bottom up approach here even though top down will work too and I would argue it is a better approach:

First, start by building the infrastructure. No kidding, we still need the network. Some vendors may call it Network Fabric, and it may consist of both physical and virtual networks. We still need physical cables to connect physical network devices, or at least the servers running network functions. In the latter case we can use overlay protocols to connect different network functions.

Second, automate and orchestrate the infrastructure. As mentioned earlier, if it's SDN-based infrastructure we will have a controller to deal with the network control plane and then push the desired state to the device. And in non-SDN infrastructure we can still use a controller to automate any configuration changes to the network device (or the model of the network device). We need an orchestrator to combine this controller with other physical or virtual infrastructure managers that manage the servers, storage and virtual machines.

Third, build the telemetry and monitoring system. At minimum we need a mechanism to measure the state of the network from the status of the service, topology view, both previous and latest configurations, logs for any state change, and error checking mechanism for any failed changes.

Fourth, create a means to translate business intent. This could be a simple web portal or mobile app that provides a service catalogue offering packages for the user to select from, with some degree of customization. Some day this may turn into a form of virtual assistant that listens to our voice commands and translates the captured information into a series of workflows to be executed by the system.

Obviously the building exercise I describe here is a very simple one that should work in principle, even though the devil is always in the details.

Let's see how Google does it, as taken from their public presentation.

Google has been using abstraction with model-driven approach to provide network topology view, configuration data structure and content, and telemetry data structure and attributes. Imagine a vendor-agnostic network topology. The information we need from such topology is a representation of all network devices as nodes, and link to connect the nodes to each other. It can have both node and link attributes such as node identification, port information (e.g. Node A's first port is connected to Node B, second port to Node C and so on) and link speed. We can also have the information to map the node to the current actual network device, for example Node A is currently representing Cisco hardware type X with specific hardware and port configuration, that obviously can be changed when needed. Such information is required for the system to know how to map the model to the actual network device e.g. Node A's first port means interface Gi0/1 on Cisco type X.
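To make the topology model more tangible, here is a small Python sketch of nodes, links and the mapping from an abstract port to a vendor interface name. This is my own illustration and not Google's data model. The classes `Link` and `Topology`, the profile names, and the `eth` naming rule for the second vendor are assumptions; only the "first port means Gi0/1 on Cisco type X" mapping comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    a_node: str
    a_port: int
    b_node: str
    b_port: int
    speed_gbps: int


# Hypothetical hardware profiles: abstract port number -> vendor interface name.
HARDWARE_PORT_NAMES = {
    "cisco-type-x": lambda port: f"Gi0/{port}",
    "other-vendor-y": lambda port: f"eth{port - 1}",
}


class Topology:
    def __init__(self):
        self.hardware = {}  # node name -> hardware profile currently in use
        self.links = []

    def add_node(self, name, hardware_profile):
        self.hardware[name] = hardware_profile

    def add_link(self, link):
        self.links.append(link)

    def neighbors(self, node):
        """Return {local_port: neighbor_node} for one node."""
        result = {}
        for link in self.links:
            if link.a_node == node:
                result[link.a_port] = link.b_node
            elif link.b_node == node:
                result[link.b_port] = link.a_node
        return result

    def interface_name(self, node, port):
        """Map an abstract port to the name used by the current hardware."""
        return HARDWARE_PORT_NAMES[self.hardware[node]](port)
```

Swapping the hardware behind Node A only means calling `add_node("A", ...)` with another profile. The links and neighbor relationships in the model stay untouched.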

Configuration and monitoring information must be described in vendor-agnostic way so it is not bound to specific configuration line or monitoring attribute from the vendor. Any network device configuration will be described as models of interfaces, routing protocols, routing table, routing policy, ACL etc. Each configuration model such as BGP protocol later can be mapped into specific implementation for different vendor's device configuration. And the state of the network like routing table information can be retrieved from the device and populated into the model as well.
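As a sketch of mapping one configuration model to different vendors, the Python below renders a tiny BGP model in two styles. The model layout (`local_as`, `group`, `neighbors`) is an assumption made up for this example, and the output lines are deliberately simplified IOS-style and Junos-style fragments, not complete working configurations.

```python
def render_ios_style(bgp):
    """Render the model as simplified IOS-style configuration lines."""
    lines = [f"router bgp {bgp['local_as']}"]
    for peer in bgp["neighbors"]:
        lines.append(f" neighbor {peer['address']} remote-as {peer['peer_as']}")
    return lines


def render_junos_style(bgp):
    """Render the same model as simplified Junos-style 'set' commands."""
    lines = [f"set routing-options autonomous-system {bgp['local_as']}"]
    for peer in bgp["neighbors"]:
        lines.append(
            f"set protocols bgp group {bgp['group']} "
            f"neighbor {peer['address']} peer-as {peer['peer_as']}"
        )
    return lines


RENDERERS = {"ios": render_ios_style, "junos": render_junos_style}


def render(bgp, vendor):
    """Pick the vendor-specific renderer for a vendor-neutral BGP model."""
    return RENDERERS[vendor](bgp)


# Vendor-neutral description of one BGP session (addresses are examples).
bgp_model = {
    "local_as": 65001,
    "group": "upstream",
    "neighbors": [{"address": "192.0.2.2", "peer_as": 65002}],
}
```

Supporting a third vendor means adding one more renderer to `RENDERERS`. Nothing that produces or validates `bgp_model` has to change.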

Google implements its telemetry system using the publish–subscribe messaging pattern, where senders of messages (publishers) categorize published messages into classes without knowledge of specific receivers (subscribers), and subscribers express interest in one or more classes and only receive messages that are of interest. Using the gRPC protocol, it is possible to have a continuous time-series data stream from the device with incremental updates. The device can also provide asynchronous event-driven reporting that does not require any response from the servers/collectors (think of device logging or SNMP traps). Collectors can of course also run ad-hoc requests to collect data from the devices, which could be synchronous RPC calls.
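The pattern itself is easy to sketch in plain Python. The example below is an in-process stand-in only: it does not use gRPC and is not how Google's collectors are built. The class names `TelemetryBus` and `InterfaceCollector`, the topic names and the message fields are all hypothetical.

```python
from collections import defaultdict


class TelemetryBus:
    """In-process publish-subscribe broker keyed by message class (topic)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher does not know who, if anyone, is listening.
        for callback in self._subscribers.get(topic, []):
            callback(message)


class InterfaceCollector:
    """Subscriber that merges incremental updates into the latest known state."""

    def __init__(self):
        self.latest = {}

    def on_update(self, message):
        self.latest.setdefault(message["interface"], {}).update(message["values"])


bus = TelemetryBus()
collector = InterfaceCollector()
bus.subscribe("interfaces", collector.on_update)

# First a full update, then an incremental one carrying only what changed.
bus.publish("interfaces", {"interface": "port1", "values": {"oper_status": "up", "in_octets": 100}})
bus.publish("interfaces", {"interface": "port1", "values": {"in_octets": 250}})
# Nobody subscribed to this class, so the message is simply dropped.
bus.publish("bgp", {"peer": "192.0.2.2", "state": "established"})
```

The collector ends up with the merged view of `port1` even though the second message carried only one counter, which is the point of incremental updates.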

Once we have all the components in place, what we need now is to connect all the pieces together to get the system up and running. The users, or operators, of the system use an application to describe configuration intent. An example is a web portal that lets operators select one use case: "drain the traffic from link X", let's say because we want to do maintenance on the link or migrate it to another link. The instruction is sent using a declarative API to both the configuration and topology models. Once the requirement is translated into changes to the model, the system analyzes the current configuration to understand what changes are needed to generate the required configuration instance. This configuration is mapped into vendor-specific configuration lines that are pushed to the device using one of several southbound protocols, depending on the device. Telemetry data is used to monitor the configuration changes, and the system provides feedback to the operators when the intent has been implemented successfully.
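The "drain link X" flow can be sketched end to end in a few lines of Python. This is a simplified illustration under my own assumptions, not the system from the presentation: the model is a dictionary of links with a `drained` flag, and "success" means telemetry shows no traffic left on the link. The functions `apply_intent`, `diff_models` and `intent_satisfied` are hypothetical names.

```python
def apply_intent(model, intent):
    """Translate a declarative intent into a new copy of the config model."""
    updated = {link: dict(attrs) for link, attrs in model.items()}
    if intent["action"] == "drain":
        updated[intent["link"]]["drained"] = True
    elif intent["action"] == "undrain":
        updated[intent["link"]]["drained"] = False
    else:
        raise ValueError(f"unsupported action: {intent['action']}")
    return updated


def diff_models(current, desired):
    """Return only the attributes that must change, per link."""
    changes = {}
    for link, attrs in desired.items():
        delta = {k: v for k, v in attrs.items() if current.get(link, {}).get(k) != v}
        if delta:
            changes[link] = delta
    return changes


def intent_satisfied(intent, telemetry_bps, threshold_bps=0):
    """Use telemetry to confirm the outcome, not just the config push."""
    if intent["action"] == "drain":
        return telemetry_bps.get(intent["link"], 0) <= threshold_bps
    return True


model = {"X": {"drained": False, "metric": 10}, "Y": {"drained": False, "metric": 10}}
intent = {"action": "drain", "link": "X"}

desired = apply_intent(model, intent)
changes = diff_models(model, desired)
```

Here `changes` holds only what has to be rendered into vendor configuration and pushed. The operator gets the "done" feedback once `intent_satisfied` returns True for fresh telemetry readings.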

As a closing remark, Intent-Based Networking is considered "one of the most significant breakthroughs in enterprise networking". Cisco's CEO claims Intent-Based Networking will redefine the network for the next 30 years. Gartner predicted that 10% of enterprises will use intent-driven network design and operation tools by 2020, reducing their network outages by 65%.

And after reading this post, I believe you will agree with such statements because they make sense. They make sense not because of the number of technologies involved in the system, but because the system can provide answers to the requirements of the business. And that is just what is needed from any innovation in this space: to solve a real business problem.

Disclaimer: This post represents my own personal view. All the sources of information are available online and accessible to the public. No confidential information belonging to my current employer is disclosed in this post.

Saturday, July 15, 2017

Network Engineer Evolution

About two years ago I made a learning roadmap for network engineers who want to transform their skills towards Software Defined Networking. I presented it at various events including Cisco Live. It was good, but it looks like I didn't provide the full story. So let's discuss it again, and we will start from the very beginning.

Any network engineer who just starts his or her career today will begin in Phase 1: as the User of networking products where the engineer only knows how to configure the product, hopefully by reading the documentation from the vendor's website first. This type of engineer is what I call "Config Monkey" (sorry, monkey!). If you think you are still in this phase, please don't get offended: I started my career here too. There is no innovation at all, only follow the manual to make the products run.

Then we move to Phase 2: Advanced User of networking products. This is the phase where the engineer understands how networking protocols work in detail. He is a domain expert now and can start fine-tuning the protocols to optimize the infrastructure: IGP timers, fast re-route, BGP attributes, etc. The engineer should go back and forth between the protocol standards and how they are implemented in the vendor's products, so all the fine-tuning is based on the 'knobs' provided by the vendor. And by nature, a Phase 2 network engineer possesses the skill to troubleshoot as well.

Phase 3 is when the engineer starts to become a System Integrator. Even though it's similar to Advanced User, the engineer must now deal with different network functions, from wireless access and top-of-rack switches to security devices, firewalls, domain names, caching, network-based storage, content delivery, application load balancers and so on, to provide end-to-end services to end users. He is aware of the design trade-offs of various choices. There is still no innovation yet, however by now the engineer has the skill to design, integrate and fine-tune a complex system all the way to the application layer.

SDN and network virtualization come into the picture in Phase 4: Advanced System Integrator. The system now consists of both physical and virtual components. The overlay network runs above the underlying physical infrastructure. Virtual infrastructure has multiple controllers and managers that need to be integrated with each other. Network services have a life cycle from initiation until depletion, so they must be monitored. Phase 4 engineers talk about APIs when integrating different components. Both physical and virtual networks must run in harmony to provide end-to-end connectivity for the users to access the applications and services.

Once the engineer passes Phase 4, he can decide to take one of two different technical paths. The first is to move towards the business and become a Solutions Architect. An Architect must translate business requirements into technical specifications, and provide integrated solutions to answer the requirements. We can live happily ever after here. I know it because I was in this phase for many years when I worked for Cisco.

The second path is like what Morpheus described as the red pill: stay in Wonderland and learn how deep the rabbit hole goes. We can choose to stay as engineer to go even deeper in the next two phases.

Phase 5: Contributor. A Phase 4 engineer assumes all components will just work when they are integrated, just like playing with Lego. Yes, she still needs to understand how one component consumes the API of another component. But in reality the integration is not, or never, that straightforward. The engineer moves to Phase 5 once she starts developing a few components to make the system work smoothly. It could be as simple as writing an automation script using the SDK provided by the product's vendor. Or creating a new driver for an open source platform to connect to a specific network device. Or customizing an existing module of some software to make the system run. The engineer writes code, understands the software development workflow, and fills in the missing ingredients to build one solid system.

Phase 6 is the phase reserved for the Creator. This is the God mode of Network Engineering. The engineer can look at a current network protocol and decide to invent a new and (hopefully) better one. When building a complex system with multiple products from different vendors, the engineer can assess whether a new software component must be built for a successful integration. A Phase 6 engineer thinks about scalability all the time, about centralized vs. distributed models, and about the workflow from the beginning of the user's request until the service is provided. She generates the ideas required to solve complex and open-ended problems. She thinks agile and runs iterations to optimize the system. The engineer in this phase is the one who translates business intent into automated workflow execution to deliver the service.

So let's look at my T-shape SDN Skill Transformation path and try to relate it to the 6-phase of Network Engineer Evolution above.

Obviously you need to be at least a Phase 3 engineer before looking at this path. To start the journey in Phase 4, you need to learn virtualized infrastructure for network, compute and storage as well as the managers and controllers to manage them. Learn abstraction and modeling. Then learn software architecture and engineering. Get involved in software development. Start writing code or optimize existing code to move to become Contributor in Phase 5 and beyond. Or if you decide to take Solution Architect path instead, switch the mindset and learn business skills.

I won't mention any vendor's product, or any vendor's certification, anymore in the evolution. Understand the expectation of what engineer in each phase has to deliver, skills that must be possessed, then make your own judgement to decide which vendor or which certification program (if needed) you want to use in your learning process.

And please don't mistakenly think I proclaim myself a Phase 6 engineer. I made the decision a while ago to become a Product Manager instead, which gives me opportunities to work with many creators to build the next cool things in networking.

Final thought: it's okay to start as a monkey once. But knowing where we are, and where we want to be, can surely help us plan how to evolve.

Tuesday, July 04, 2017

One Year Ago Today

One year ago today, fourth of July, was my first day at Google Zürich. It’s been a very interesting journey so far, and from the beginning I spent most of my time to focus on three things: switch to Product Management to learn how to build great product, work on scalable Enterprise networking solution from cloud-based SDN to intent-driven automation, and learn data analysis in-depth from data visualization all the way to Machine Learning, to be used in product development.

As you notice, I rarely post new blog since I joined the company last year. And I find it quite difficult to find any active blog from other Googlers too. Just like any tech company, when we joined all of us signed an agreement containing various obligations including the requirement to hold proprietary information and trade secrets in strictest confidence. But I believe there should be some non-confidential things that we can share in our personal blog.

So why can’t we blog?

First, we are very busy here. And not because we have to, but we choose to.

I mean, there are just too many interesting things to do and to learn at Google. If you work for the best company in the world, one that empowers every employee to innovate in everything we do, you surely want to spend your time the best you can. We write a lot, like product requirement documents, design specifications, or execution plans, but then we get busy building the product and getting things done.

Second, most of us feel that what we do is not new.

There are so many talented people at Google with great ideas, executing them every day. So unless we invent something completely new, or improve something to make it 10x better, most Googlers think what we do is not new, it’s common, and we assume everyone must already know it, so it’s not worth sharing. That could be true within Google, but some of the ways we do things here (again, the non-confidential things) could be very useful for people outside.

Third, we are trusted with so much confidential information that we don’t want to unintentionally share it.

Google culture is very open. Every Noogler, or new hire, usually gets access to the Google codebase within the first week in the company. Employees share their salary and bonus in a Google sheet. There is a weekly company-wide all-hands meeting called TGIF where people from top management to various teams present a product Google has been working on, and then take any questions from the audience. Any questions, from old-timers to new hires and even interns. And we are all trusted not to leak the information outside the company.

This has created a culture of trust that makes us believe we are truly part of the family. And as a family member, you don’t want to break the trust by sharing confidential information, even unintentionally, outside the family.

(Read here about the impression of company culture from an intern)

Having said that, I will still try to continue blogging here.
Watch this space.

Thursday, January 19, 2017

2016 Year in Review

Every beginning of the year I usually review what I have done the past one year, make notes, and build the plan for the upcoming year. I made many mistakes in the past, did things I’m not proud of, however I use them as opportunity to learn and try to be better next time.

Early in 2016 I found that my startup company was competing directly against Cisco (which was still my employer at that time). That was quite surprising. I founded that company in 2012, initially as my pet project, the lab for my MBA, where I could practice whatever I learned at business school. My pitch for the startup was simple: we do what Cisco (or Cisco Services) will not do. We built an online learning platform for Cisco certification using a group mentoring system. We ran physical network audits. We did system integration projects to interoperate Cisco products with those of any other vendor.

However, since late 2014 the engineering team in my company has evolved. They grew skills in network programming. The team put more focus on Software Defined Networking (SDN). They built a lab to validate Network Function Virtualization (NFV). And then the team started to develop our own SDN Controller and Network Automation platform.

Then customers started to come. Customers wanted SDN solutions, NFV infrastructure and network automation, but vendor-agnostic ones. They came to my company. They asked the team to bid on the project. That’s when Cisco finally started to notice, because they were bidding too.

Early April I decided to resign from Cisco to run my own company as full time CEO.

In mid-2016 I received an offer from Google to join them in Zürich, Switzerland. Since April I had built the company vision for my startup and laid out a multi-year strategy, and I knew they could be executed under the current leadership team even without me. I also had a personal reason to move my family to Europe. So I agreed to leave Dubai and started working at Google in July.

Even before I joined Google, I already made a plan of what I will learn in the company. Google is the right place to learn so many interesting things, but for 2016 I just wanted to focus on three things:

1. Learn how to build great product

“Behind every great product, there is a great product manager” - Marty Cagan

Google has created 7 great products with more than a billion users using each. And as Ben Horowitz wrote: a good Product Manager is the CEO of the product. A Product Manager combines business, technology, and design in order to discover a product that is valuable, feasible, and usable.

Product Management is above all else a business function, focused on maximising business value from a product. A Product Manager understands the technology stack of the product, and, most importantly, understanding the level of effort involved is crucial to making the right decisions. And the Product Manager is the voice of the user inside the business and must be passionate about the user experience.

2. Continue to learn about SDN, but the scalable ones

Deep down inside I’m still a network engineer. I’ve been focusing on SDN & NFV since 2014 when I was at Cisco. Google has been using software-based solutions in its network infrastructure since before the world called it SDN. However, I’m currently interested in highly scalable SDN solutions using cloud-based platforms.

And I’m very interested in the transformation path for any Enterprise company to evolve towards fully automated network operation. I even built the five levels of Autonomous Network, mimicking the levels in Autonomous Vehicle, and am currently working on the fifth level: intent-based, policy-driven, zero touch networking.

3. Learn Data Analysis to Machine Learning

Google is the best place to learn Data Science. Period. With Google Brain and DeepMind as part of the Alphabet group, this is the only company I know that puts Machine Learning first in every aspect of its products. Currently I'm focusing on learning data analysis, data visualization and predictive analysis using machine learning.

The three things above are still my valid learning plan for 2017.
How about you? What is your learning plan this year?

Build great product.
Cloud based SDN solution.
With data analytics and machine learning.
“Building the network of the future”. Got it?