
Enhancing K8s Networking

Jim Curry
Nov 02, 2020

Today we have the opportunity to get a sneak peek into one of the speaking sessions for KubeCon North America 2020 – Enhancing K8s Networking with SmartNICs.

Q: Okay, so for the first question: please tell us who you are and what your job is, tell us a little about yourself, and what made you such a software guy?

A: My name is Dave Cremins. I'm a Cloud Software Architect working in the Cloud Native Orchestration team in the Network Platforms Group. I essentially look at new solutions for advancing the networking landscape for Kubernetes and for cloud native in general, and I also look at areas like advanced resource management, be it for optimal placement or for better efficiency and utilization of the resources available on a platform. Those are the two main areas I work on from day to day.

I've worked with numerous languages over the years: C#, Java, Node.js, JavaScript, a bit of Scala, Go, Python, Ruby, PHP. So I've worked with numerous languages and numerous stacks. And I think what complements my position at Intel is that I can bridge the gap between high-level and low-level, given some of the work that we conduct here. Especially from an orchestration perspective, the networking aspects are so low-level that we pretty rarely go past Layer 3 in the stack, whereas in previous positions I rarely came down as far as Layer 4.

I like software. I think it's a creative process. I think it's something that allows you to express craftsmanship if done correctly. So I take pride in putting time into building the right solutions for actual problems. Anyone can build software, but I think there's a particular skill involved in ensuring that you build it correctly, so that it's easy to use and scales well. And I'm a firm believer in the Unix philosophy: "Do one thing and do it well."

 

Q: What are the main concerns for deploying cloud native network functions in a Kubernetes environment?

A: When we look at cloud native, three things come to mind - containerization, microservices and orchestration. Those are really the three pillars of cloud native technology. Kubernetes is the de facto orchestration system today, and it does a really good job of abstracting platforms. That's essentially one of its most powerful and appealing features.

Now, network functions inherently have certain characteristics in terms of low latency, high throughput, and deterministic performance. These can be difficult to achieve in a Kubernetes environment, given that a lot of the workloads that land on Kubernetes were web-scale type workloads. They were broken down from a monolithic approach into a microservices approach, deployed as containers, and then managed and orchestrated via Kubernetes.

Now we want to do the same with network functions, but network functions are slightly different. There's no definitive way today of slicing up a network function, or a VNF, and building something that genuinely exhibits cloud native characteristics. That transition is still ongoing, and there are numerous companies and communities out there looking at how to successfully transition from a VNF to a CNF and then have it successfully orchestrated on Kubernetes.

Another concern is that some of the primitives in Kubernetes aren't good enough to ensure that network functions can avail of that deterministic performance. So we need things like alignment of the necessary resources to deliver on the capabilities of the network function - the CPUs, the network resources that need to be leveraged, and the memory it needs as well. At Intel, we've invested in numerous components to ensure that Kubernetes is actually primed for network functions to be deployed there.
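
To make that alignment concrete, here is a minimal sketch of what such a request can look like: a Guaranteed QoS pod asking for whole CPUs, huge pages, and a network device resource in one spec. The names, the image, and the intel.com/intel_sriov_netdevice resource are illustrative assumptions rather than details of a specific deployment.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nf-workload                          # hypothetical name
    spec:
      containers:
      - name: dataplane
        image: example.com/nf:latest             # placeholder image
        resources:
          requests:
            cpu: "4"                             # whole CPUs, eligible for exclusive pinning
            memory: "2Gi"
            hugepages-1Gi: "2Gi"                 # pre-allocated huge pages for the packet path
            intel.com/intel_sriov_netdevice: "1" # network device advertised by a device plugin (illustrative name)
          limits:
            cpu: "4"
            memory: "2Gi"
            hugepages-1Gi: "2Gi"
            intel.com/intel_sriov_netdevice: "1"
        volumeMounts:
        - name: hugepages
          mountPath: /dev/hugepages
      volumes:
      - name: hugepages
        emptyDir:
          medium: HugePages

Because requests equal limits, the pod lands in the Guaranteed QoS class, which is what lets the platform pin CPUs and align the device and memory for it.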

And you know, with network functions and the transition from the appliance model to VNF, and then from VNF to CNF, we're still trying to understand how to do CI/CD systems, how to do ephemeral systems, and how all of these cloud native attributes come together and allow network functions to be successful. It's a combination of the necessary constructs in Kubernetes being upgraded to facilitate network function deployment, the alignment of these resources, the acceleration of those resources so that network functions can avail of the low-latency, high-performance and scalability aspects they strictly require, and then the actual patterns of cloud native themselves.

These are the types of concerns that need to be addressed in order to facilitate a full transition from the VNF model to the cloud native network function model.

 

Q: You have spent time enabling bare metal container deployments. What are, in your view, the advantages of bare metal container deployments for network functions?

A: First of all, let's address some of the trade-offs. There are numerous deployment models out there today, be it virtualized, para-virtualized or bare metal. What it comes down to is your requirements: what do you need to drive your network function? I really focus on the bare metal aspect because bare metal doesn't have any virtualization tax. We don't have any hypervisors in place. We have our host configuration, our OS, and a number of runtimes - our host agents - that manage the systems for us.

By virtue of the fact that we have eliminated a virtualization layer, we automatically have the ability to take advantage of the raw metal itself. This is a very appropriate way of ensuring that network functions once deployed on a bare metal environment can deliver on the necessary performance that's expected of them.

In terms of management, it's easier to manage a bare metal environment in my experience because you don't need to carry the weight of the virtualization layer. You essentially have access to numerous acceleration technologies, be it for encryption, compression, data path acceleration, or networking. There are other acceleration mechanisms in place too, like FPGAs or GPUs and so on. And these are all, let's say, things that can be leveraged to ensure that ultra-high performance can be delivered when we do this.

There are other aspects to bare metal as well. When you have a bare metal system and you have your OS and you have your runtimes on top of that - those components are pluggable, so we don't have to worry about our infrastructural setup in terms of bare metal. We've got a bare metal system with these runtimes that are hot-swappable. So essentially, it allows us to embrace infrastructural change and not worry about the side effects to the applications or the workloads running on top of these runtimes sitting on top of the bare metal itself. That's another big advantage.

And another area is that, as opposed to the standard economies-of-scale approach from the CSPs today, bare metal allows us to do a vertical scale-up because of the performance available on it. A lot of the work that we've done around container bare metal is for telco environments and 5G deployments, and it's for edge deployments as well. There are a number of advantages to bare metal, especially from a performance perspective.

 

Q: Has Intel developed any technology with the community to enable the superiority of bare metal deployments for network functions?

A: We developed a container network interface (CNI) plugin for SR-IOV technology. SR-IOV is a very popular technology leveraged by telcos today to make almost line rate speeds available to containerized workloads. We've built this, and in order to leverage SR-IOV, we actually built Multus. Multus is essentially a meta-plugin, or a proxy-type system, that proxies the requests from Kubelet to provision a pod with networking interfaces. Without it, you cannot have an SR-IOV enabled pod. So we've ensured this.

We've essentially unlocked Kubernetes for telco workloads, given the requirements that they have in terms of separation of traffic and things like that. We can't use the control plane interface for things like telco-type traffic - it's just not possible. In actual fact, we need separation there, and to achieve it we leverage multiple different interfaces on a pod.
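
As a rough sketch of how that looks in practice with Multus, a NetworkAttachmentDefinition describes the secondary SR-IOV network and the pod requests it through an annotation, keeping the default cluster network for control traffic. The names, the subnet, and the resource annotation below are illustrative assumptions, not a specific deployment.

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: sriov-data-net                       # hypothetical network name
      annotations:
        k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
    spec:
      config: '{
        "cniVersion": "0.3.1",
        "type": "sriov",
        "name": "sriov-data-net",
        "ipam": {
          "type": "host-local",
          "subnet": "192.168.50.0/24"
        }
      }'
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: telco-workload                       # hypothetical name
      annotations:
        k8s.v1.cni.cncf.io/networks: sriov-data-net   # Multus attaches this as a second interface
    spec:
      containers:
      - name: app
        image: example.com/app:latest            # placeholder image
        resources:
          requests:
            intel.com/intel_sriov_netdevice: "1"
          limits:
            intel.com/intel_sriov_netdevice: "1"

The pod still gets its ordinary cluster interface for control traffic; the annotated network gives it a dedicated data plane interface backed by a virtual function.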

We've also developed the CPU Manager and Topology Manager. These are offerings within Kubernetes that help negate noisy neighbor problems and ensure that we can align a request for resources on a single socket if it's a multi-socket system. A lot of this stuff runs on the assumption that, under the hood, it's all bare metal. So we're not dealing with vCPUs, we're dealing with actual CPUs and things like that.
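
For a rough idea of how those two features are switched on, the kubelet exposes them through its configuration; the policy names below are the standard ones, while the reserved CPU list is just an illustrative value, not a recommendation.

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    cpuManagerPolicy: static                  # give Guaranteed pods with whole-CPU requests exclusive cores
    topologyManagerPolicy: single-numa-node   # reject placements that cannot align CPUs and devices on one NUMA node
    reservedSystemCPUs: "0,1"                 # keep a couple of cores back for system daemons (illustrative)

With that in place, a Guaranteed pod like the earlier sketch gets pinned to dedicated cores on the same socket as the devices it was allocated.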

We've also built some userspace acceleration systems. Again, they come in the form of CNIs, so that we can accelerate east-west traffic within a single node. We have a userspace CNI that provisions OvS-DPDK for you. We also have that behavior baked directly into things like Kube-OVN, if you want to use Kube-OVN as your SDN controller. We've now primed it to ensure that you can actually deploy OvS-DPDK connections, or OvS-DPDK interfaces, into your pods.

We're looking at some stuff around AF_XDP. We've also done work on the Resource Management Daemon (RMD). We have a bucket load of components that we have built for bare metal deployments, to ensure that they are ready for the cloud native network functions that are run on them. All of this is available on our Intel GitHub.

We've also started to move some of our components, especially our networking components, into the Network Plumbing Working Group, which is a sub-community within the Kubernetes ecosystem. They are tasked with defining specs for multiple attachments to pods, defining new components, and paving the way forward for Kubernetes networking in general and how we can accelerate that for all types of workloads. We've donated a bunch of our components to this particular organization. In summary, we have numerous components for bare metal to ensure that network functions are successful in a bare metal Kubernetes environment.

 

Q: What is your view on current and future acceleration in bare metal environments to advance predictable performance for network functions?

A: If we look at acceleration options today, we have things like FPGAs, GPUs and QAT. I think the most popular type of acceleration is SmartNIC systems, whereby we have a NIC capability combined with essentially an SoC capability. We effectively get almost a separate host that we can offload a lot of our networking concerns to.

OvS is a very popular technology often deployed in the telco landscape. How could we accelerate OvS? Why would we want to accelerate it? The benefit of accelerating OvS is to take some of the fundamental infrastructure requirements necessary for all of these workloads and move them off our platform and onto accelerators that can do the job for us, and do it faster. This alleviates the pressure on the platform so that we can accommodate more workloads. That's a very powerful concept. It's something that I think is absolutely critical for the edge, depending on the different categorizations of edge, be it far edge or closer proximity in terms of your point of presence.

With edge deployments, we don't have the same freedom that we do with data centers or cloud. Physically we're restricted; we're not going to have as much hardware at the edge. So it makes sense to ensure that, where we do have options to accelerate edge deployments, we leverage the accelerators on our platform so that we can actually run more workloads and make better use of our platform resources.

Intel has provided device plugins for Kubernetes that target things like SR-IOV, QAT, FPGAs and GPUs. We are very active in this particular arena. Accelerators and accelerating the system is something that's been driven by the promise of things like 5G - ultra-low latency, superior performance, and incredibly good throughput.
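
As a sketch of how those device plugins surface to a workload, the plugin advertises an extended resource on the node and a pod simply requests it; the scheduler then places the pod on a node that has the device. The resource name and image below are assumptions for illustration - the actual name depends on the plugin and its configuration.

    apiVersion: v1
    kind: Pod
    metadata:
      name: accelerated-workload               # hypothetical name
    spec:
      containers:
      - name: crypto-app
        image: example.com/crypto-app:latest   # placeholder image
        resources:
          requests:
            qat.intel.com/generic: "1"         # illustrative extended resource exposed by a QAT device plugin
          limits:
            qat.intel.com/generic: "1"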

To try and take advantage of that, we want to ensure that we have things in place that allow us to create that proportionate system, so that we can do hardware offloads and utilize acceleration techniques on systems or cards plugged into the platform. Then the platform can worry about the workloads themselves and deliver on the actual business value.

We have enablement paths that are being deployed and being leveraged.

 

Q: That’s a lot to consider.  Could you give us a pithy summary of the above?

A: We've built a lot of stuff for bare metal.  Bare metal is a good deployment target for the edge. From a bare metal perspective, we can look at the scale-up model versus the scale-out model. We've got a lot of innovation that we've built directly for bare metal deployments, like our CPU Manager, our Topology Manager, our SR-IOV CNIs, Multus, the userspace CNIs, the OvS-DPDK enhancements to Kube-OVN. We've got other things like enabling huge pages in Kubernetes. We've got stuff like RMD, we're looking at other networking technologies like AF_XDP. All of this is central to our bare metal deployments.

If you target bare metal for the edge, then a lot of this stuff is very applicable and very suited to edge computing. It's all very complementary to the edge. Because what is the edge really? It's essentially taking the processing capabilities that occur today in cloud environments or enterprise environments and moving them closer to where the actual data is generated. So I think this is something that is very suited and applicable to the edge. And that's one of the things that I will cover in my presentation for KubeCon.

 

Q: What are some of the security boundaries that container orchestration is exploring to improve workload isolation while increasing the sharing of workload accelerator devices?

A: Security boundaries are something we look at. Let's take the SmartNIC, for example. A SmartNIC allows us to do acceleration, but it also allows us to put the Root of Trust directly into almost a separate platform, depending on the type of SmartNIC that has been deployed. This way, we are able to provide the right level of security, with the right policies in place, without ever having to interfere with the tenant's actual workload or the tenants running on the platform. Again, this is another area that we can look to and ensure that it's primed from a security perspective, and it's another characteristic that can be very applicable to edge computing as well.

It really comes down to multi-tenancy: if we do have multiple tenants, how do we provide the right isolation and security boundaries? Instead of looking at the platform level, we can look to an infrastructure component - something like a SmartNIC - and then allow the tenants to run on the platform. You create the boundaries and you move the necessary concerns into their applicable domains.

 

Q: What are some of the techniques that container orchestration is testing to better place workloads that share distributed cluster hardware features, like a graph accelerator, FPGA, or storage, so that workloads get ideal placement and performance?

A: Telemetry Aware Scheduling is a complex component, right? And it's definitely suited more towards advanced use cases, where the cluster is distributed and there are hardware features in place. Maybe you have metrics available from those particular hardware features, and you want to generate or apply a policy for that accelerator based on the metrics you've obtained from it, and have your pod do something different - maybe move somewhere else, maybe claim another resource, or whatever the case would be.

That's exactly what Telemetry Aware Scheduling does. As I said, as onboarding to Kubernetes ramps up, especially across the plethora of workload types that can run on it, you will see new demands for more complex, policy-based scheduling than what's available from native Kubernetes. We've built Telemetry Aware Scheduling to address this need.
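
For a feel of what such a policy can look like, here is a rough sketch of a Telemetry Aware Scheduling policy object. The metric names and thresholds are invented for illustration, and the field layout follows the project's published examples as best I recall, so treat it as a shape rather than a definitive spec.

    apiVersion: telemetry.intel.com/v1alpha1
    kind: TASPolicy
    metadata:
      name: accelerator-health                  # hypothetical policy name
      namespace: default
    spec:
      strategies:
        dontschedule:                           # avoid nodes whose accelerator is already saturated
          rules:
          - metricname: accelerator_utilization      # illustrative metric
            operator: GreaterThan
            target: 80
        deschedule:                             # flag pods for descheduling when a node degrades
          rules:
          - metricname: accelerator_health_score     # illustrative metric
            operator: LessThan
            target: 10
        scheduleonmetric:                       # among the remaining nodes, prefer the lowest utilization
          rules:
          - metricname: accelerator_utilization
            operator: LessThan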

We've also built, or are going to build, something around cluster-level topology management. This will be complementary to Topology Manager. Topology Manager is able to align resources on a single socket for a single node, but what happens when we need to look at the topology of our cluster? How do I ensure that my workload is placed correctly, so that it can leverage the right acceleration on the platform, get access to the resources it depends on, and be fully aligned on a single socket if it's a multi-socket system?

We're looking at a lot of these different components moving forward. One of the great things about working here, and working on this, is that we consistently interact with communities, with customers, and with other groups. We create that necessary feedback loop to ensure that what we build at Intel can be leveraged and solves real-world problems for numerous domains, be it telco, or edge, or whatever the case would be. And that's really good. We've also got some more innovative track items that I can't really discuss right now, and again, these are all to address problems that will become prevalent as onboarding continues to ramp up.

Join Dave @ KubeCon 2020 to learn more

November 17–November 20, 2020

https://kccncna20.sched.com/event/ekCf/enhancing-k8s-networking-with-smartnics-dave-cremins-intel