
Boost K8s Performance with Telemetry Aware Scheduling

Jim Curry
Nov 17, 2020

Today we have the opportunity to learn about maximizing cloud native networking performance with telemetry.

Q1: Please tell us who you are and what your job is.

A: My name is Killian Muldoon, I’m a Software Engineer in Intel’s Network Platform Group focused on Cloud Native Orchestration. Our team works on problems at the intersection of complex hardware and high performance software. My work takes me from networking to resource management and telemetry, but I’m centered on finding ways to express high value hardware capabilities to Kubernetes end users in a practical, consumable way.


Q2: What are the major roadblocks you see when deploying cloud native networking?

A: Fitting core network functions that were almost pure hardware just a few years ago into a cloud native framework throws up some interesting challenges. Core network functions need to be sure they’re running on predictable hardware and software infrastructure even when they’re deploying – using Kubernetes – to a constantly shifting set of servers.

What we see again and again are two problems. The first is from a resource allocation point of view – we don’t get the right hardware mix for our workload, and performance is below expectations as a result. Kubernetes is able to handle most resources on a platform, but more complex resources – like CPU cache, NUMA topology and even power – aren’t part of the resource model.

The second major problem is observability – it’s not easy to see why a workload is underperforming against expectations, and the path to a resolution is invisible as a result. These issues have a large impact on workload and overall cluster performance.


Q3: You talk about increasing performance - what tools do we have to meet those Key Performance Indicators?

A: Out of the box Kubernetes does a lot of resource management, and things like CPU requests and telemetry driven autoscaling are built in. If you’re able to track the right telemetry, the built in tools can keep your key metrics in line.
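The built-in telemetry driven autoscaling mentioned here is the Horizontal Pod Autoscaler. A minimal sketch – the Deployment name, replica bounds and CPU target are all illustrative:

```yaml
# Horizontal Pod Autoscaler sketch: scales a hypothetical "my-app"
# Deployment between 2 and 10 replicas to hold average CPU near 70%.
apiVersion: autoscaling/v2beta2   # API version current as of late 2020
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

With a custom metrics adapter in place, the same resource can scale on telemetry beyond CPU and memory.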

There are a lot of use cases outside that box, but part of the allure of Kubernetes is its many extension and customization points. If your needs are more fine grained and the system needs to be helped along, there are plenty of tools that let you help yourself. There’s no built-in support for exotic hardware, for example, but you can write whatever device plugins you’d like.

You can track performance indicators through a telemetry stack managed by Kubernetes – something like collectd and Node Exporter with Prometheus collecting metrics – and transform them into something useful in Prometheus or with some other toolkit. If your indicator is out of line with objectives, though, the resolution can be very workload dependent. Two actions that are applicable to a lot of workloads are scaling out and using Telemetry Aware Scheduling to direct new workloads to the right platform for them.
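As a rough sketch of the stack described above, a minimal Prometheus scrape configuration pulling node-level metrics from Node Exporter – the hostnames are placeholders:

```yaml
# prometheus.yml fragment: scrape node-level metrics from Node Exporter,
# which listens on port 9100 by default on each host.
scrape_configs:
  - job_name: 'node'
    scrape_interval: 15s
    static_configs:
      - targets: ['node1:9100', 'node2:9100']  # placeholder hostnames
```

A PromQL expression over those metrics – for example `1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))` for per-node CPU utilization – then becomes the indicator you track against your objectives.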


Q4: Tell us please about Telemetry Aware Scheduling, how it addresses the challenge, and how it is implemented?

A: With Telemetry Aware Scheduling (TAS) we’re coming at the performance problems by connecting information from the bottom – platform telemetry – to decisions at the top – the scheduler. It links a specific metric expectation to each workload and helps direct it to the right platform. Each workload can have a customized policy that specifies its needs and preferences in terms of telemetry signals.

TAS is a scheduling extender: it sits alongside the Kubernetes scheduler and gives it hints about which platforms offer the best signal for a given workload. It can block placement on platforms in the danger zone, and it can even reschedule workloads from one platform to another if the data passes some critical threshold – for example, when there isn’t enough bandwidth available or a critical health issue is likely to hit the server.
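A TAS policy is expressed as a custom resource. The sketch below follows the format from the project’s documentation; the metric name and thresholds are placeholders:

```yaml
# TASPolicy sketch: schedule onto nodes where "demo_metric" is lowest,
# refuse placement past 80, and evict workloads if a node passes 100.
apiVersion: telemetry.intel.com/v1alpha1
kind: TASPolicy
metadata:
  name: demo-policy
  namespace: default
spec:
  strategies:
    scheduleonmetric:        # prefer nodes where the metric is lowest
      rules:
      - metricname: demo_metric
        operator: LessThan
    dontschedule:            # block placement past this threshold
      rules:
      - metricname: demo_metric
        operator: GreaterThan
        target: 80
    deschedule:              # flag workloads for rescheduling past this
      rules:
      - metricname: demo_metric
        operator: GreaterThan
        target: 100
```

A workload opts in by referencing the policy name in its pod template labels, so each workload can carry its own expectations.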

TAS works on any telemetry you can think of, with signals coming through Prometheus or some other metrics database. Anything that can be expressed as a numeric signal – including categories – can be used to influence or direct placement with TAS. That lends itself to a wide range of use cases from power management to dealing with cache misses and device allocation.

At KubeCon Amsterdam Tom Golway, Chief Technologist at Hewlett Packard Enterprise’s Advanced R&D group, and I showed a demo that combined a bunch of these signals – including everything from network bandwidth to resource NUMA alignment – into a single “golden” telemetry signal for workload placement.


Q5: What other features does K8s need to further determinism and high performance?

A: With Cloud Native Network Functions you’ve got a complicated system with specific performance needs. From a high level, determinism is more important than pure performance – you can add more hardware to boost throughput, but more servers won’t help if you can’t predict what you’ll get from each new workload.

Kubernetes wasn’t really built with high performance or low latency computing in mind, so there are still a lot of places to improve. One of the big roadblocks right now is NUMA topology management – where the physical location of resources like CPU, memory and network cards can have a big impact on workload performance.

Topology Manager – worked on by my colleague Conor Nolan and others in the community – has been a part of Kubelet since last year. It solves the NUMA issue at the node level, but there’s a gap at the cluster level. The Kubernetes control plane isn’t able to manage the NUMA needs of workloads, and the result is unpredictable performance degradation.
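At the node level, Topology Manager is enabled through the kubelet configuration. One possible setup – the right policy choice depends on the workload:

```yaml
# KubeletConfiguration fragment: align CPU and device allocations to a
# single NUMA node. Valid policies: none, best-effort, restricted,
# single-numa-node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
topologyManagerPolicy: single-numa-node
cpuManagerPolicy: static   # static CPU pinning is needed for NUMA alignment
```

The `single-numa-node` policy gives the strongest determinism, rejecting pods whose resources can’t all be satisfied from one NUMA node.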

There’s a community effort today to bring Topology Aware Scheduling to Kubernetes. That’s the next big step for offering deterministic performance, but there’s still a lot of room to make these technologies accessible and usable.


Q6: How do you see the evolution of these technologies moving forward?

A: I think it’s important to remember that most developers don’t want to think about hardware – they just want their workload to run. But there’s lots of exciting stuff happening on the hardware level – with more forms of compute like QAT and specialized AI chips seeing increased adoption – and that’s going to improve performance of key workloads.

What we need to find in the Kubernetes world is the right abstraction to give developers and operators the tools to understand a platform in terms of its capabilities instead of in terms of its hardware. It’s almost like a communication problem – the hardware has these great capabilities, but they’re buried in Byzantine YAML, so users have trouble getting at them. The right expression will carry the full power of the hardware, make sense in Kubernetes semantics, and still be understandable from an end user perspective.


Learn more:

Telemetry Aware Scheduling:

KubeCon EU 2020 talk:

CNCF Talk: