Accelerate Istio* Dataplane with eBPF Part 1
The term service mesh describes the network of microservices that make up a distributed application and the interactions between them. Istio is by far the most popular service mesh because of its rich feature set. This article proposes a method to optimize the default Istio dataplane and accelerate it with a kernel-level mechanism: eBPF. It is the first part of a three-part series. It introduces the background needed to understand our idea and outlines the key steps toward acceleration.
Extended Berkeley Packet Filter (eBPF) is a highly flexible and efficient virtual machine-like construct in the Linux* kernel that allows it to execute bytecode at various hook points in a safe manner. It is used in a number of Linux kernel subsystems, most prominently networking, tracing, and security.
There are many different Berkeley Packet Filter (BPF) program types available; two of the main types for networking are explained in the subsections below.
BPF_PROG_TYPE_SOCK_OPS (sock_ops for short) programs can access some of the socket's fields (such as IP addresses and ports). A sock_ops program is called multiple times from different places in the network stack code. In addition, it uses the existing BPF cgroups infrastructure, so programs can be attached per cgroup with full inheritance support. We use sock_ops to capture the sockets that meet our requirements and add them to the map accordingly.
BPF_PROG_TYPE_SK_MSG (sock_msg for short) programs can be attached to the sockhash map to capture every message sent by the sockets in the map and determine its destination based on the msg's fields (such as IP addresses and ports).
Helper functions enable BPF programs to consult a core-kernel-defined set of function calls to retrieve data from or push data to the kernel. Available helper functions may differ for each BPF program type. For example, BPF programs attached to sockets are only allowed to call into a subset of helpers, compared to BPF programs attached to the TC layer. One helper function is explained below.
The helper in question, bpf_msg_redirect_hash, is used in programs that implement policies at the socket level. If the message *msg* is allowed to pass (i.e. if the verdict eBPF program returns **SK_PASS**), the helper redirects it to the socket referenced by *map* (of type **BPF_MAP_TYPE_SOCKHASH**) using a hash *key*.
Maps are efficient key/value stores that reside in kernel space. They can be accessed from a BPF program to keep the state among multiple BPF program invocations. They can also be accessed through file descriptors from user space and can be arbitrarily shared with other BPF programs or user space applications.
Sockhash and Sockmap are data structures used to store kernel-opened sockets. Sockmap is currently backed by an array and requires keys to be four bytes. This works well for many use cases, but it becomes limiting in larger ones, where a Sockhash is more appropriate. When the sock_msg program attached to the Sockhash is called to redirect a msg, the 5-tuple lookup key ensures that the peer socket is found quickly.
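To make the difference between the two map types concrete, here is a small user-space Python model. It is not kernel code: a fixed-size list stands in for the array-backed Sockmap, and a dict stands in for the Sockhash. The `ConnKey` fields, the class names, and the socket labels are assumptions made for illustration; the exact key layout used by the authors is not given in this part of the series.

```python
from typing import NamedTuple


class ConnKey(NamedTuple):
    """Hypothetical connection key (assumed layout, for illustration only)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class SockmapModel:
    """Array-backed model: a socket is stored at a small integer index."""

    def __init__(self, max_entries):
        self.slots = [None] * max_entries

    def update(self, index, sock):
        self.slots[index] = sock  # IndexError if index >= max_entries

    def lookup(self, index):
        if 0 <= index < len(self.slots):
            return self.slots[index]
        return None


class SockhashModel:
    """Hash-backed model: a socket is stored under a structured key."""

    def __init__(self):
        self.table = {}

    def update(self, key, sock):
        self.table[key] = sock

    def lookup(self, key):
        return self.table.get(key)


# With the array model, something else must remember which index
# belongs to which connection.
sockmap = SockmapModel(max_entries=4)
sockmap.update(0, "app_socket")

# With the hash model, the connection identity itself is the key.
sockhash = SockhashModel()
key = ConnKey("127.0.0.1", 40000, "127.0.0.1", 15001)
sockhash.update(key, "app_socket")
```

In this sketch the Sockhash lookup needs nothing beyond the connection's own addresses and ports, which is why it suits redirection between many socket pairs.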
Dataplane in Istio
An Istio service mesh is logically split into a data plane and a control plane. The control plane manages and configures the proxies to route traffic. The data plane is composed of a set of intelligent proxies (Envoy*) deployed as sidecars. These proxies mediate and control all network communication between microservices, and this communication path is the focus of our optimization.
When a pod is created, a sidecar container and an init container are injected into the application manifest. Traffic is directed to and from the application services through these sidecars without developers needing to worry about it. The sidecar proxies capture the inbound and outbound traffic of the container by setting up iptables rules within the pod's network namespace.
Enter the application pod's network namespace and list the configured iptables rules, as shown below.
$ nsenter -t 4215 -n iptables -t nat -S
-P PREROUTING ACCEPT
-P INPUT ACCEPT
-P OUTPUT ACCEPT
-P POSTROUTING ACCEPT
-N ISTIO_INBOUND
-N ISTIO_IN_REDIRECT
-N ISTIO_OUTPUT
-N ISTIO_REDIRECT
-A PREROUTING -p tcp -j ISTIO_INBOUND
-A OUTPUT -p tcp -j ISTIO_OUTPUT
-A ISTIO_INBOUND -p tcp -m tcp --dport 80 -j ISTIO_IN_REDIRECT
-A ISTIO_IN_REDIRECT -p tcp -j REDIRECT --to-ports 15001
-A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -j ISTIO_REDIRECT
-A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
-A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
-A ISTIO_OUTPUT -j ISTIO_REDIRECT
-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001
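The rule listing above can be read as a small decision procedure. The following Python sketch models only the nat-table rules shown, for TCP packets, and returns the port a packet would be redirected to, or None when it passes unchanged. The function names and the sample addresses are illustrative assumptions; the UID/GID 1337 and port 15001 come from the listing.

```python
ENVOY_UID = 1337    # from "--uid-owner 1337"
ENVOY_GID = 1337    # from "--gid-owner 1337"
ENVOY_PORT = 15001  # from "REDIRECT --to-ports 15001"
LOOPBACK = "127.0.0.1"


def nat_output(dst_ip, out_iface, uid, gid):
    """Model of OUTPUT -> ISTIO_OUTPUT for a locally generated TCP packet."""
    # -A ISTIO_OUTPUT ! -d 127.0.0.1/32 -o lo -j ISTIO_REDIRECT
    if dst_ip != LOOPBACK and out_iface == "lo":
        return ENVOY_PORT
    # -A ISTIO_OUTPUT -m owner --uid-owner 1337 -j RETURN
    if uid == ENVOY_UID:
        return None
    # -A ISTIO_OUTPUT -m owner --gid-owner 1337 -j RETURN
    if gid == ENVOY_GID:
        return None
    # -A ISTIO_OUTPUT -d 127.0.0.1/32 -j RETURN
    if dst_ip == LOOPBACK:
        return None
    # -A ISTIO_OUTPUT -j ISTIO_REDIRECT
    return ENVOY_PORT


def nat_prerouting(dst_port):
    """Model of PREROUTING -> ISTIO_INBOUND for an incoming TCP packet."""
    # -A ISTIO_INBOUND -p tcp -m tcp --dport 80 -j ISTIO_IN_REDIRECT
    if dst_port == 80:
        return ENVOY_PORT
    return None


# Application (hypothetical UID 1000) calling a remote service: captured.
app_to_remote = nat_output("10.0.0.5", "eth0", uid=1000, gid=1000)
# Envoy itself calling the remote service: left alone, which avoids a loop.
envoy_to_remote = nat_output("10.0.0.5", "eth0", uid=1337, gid=1337)
# Application talking to localhost: left alone.
app_to_localhost = nat_output("127.0.0.1", "lo", uid=1000, gid=1000)
```

The owner-match rules are what keep Envoy's own outgoing traffic from being redirected back into Envoy.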
Traffic between APP and Sidecar in Istio
Pod traffic can be divided into three categories, defined by its direction: Inbound, Outbound and Envoy to Envoy. We will explain these three categories in the subsections below using an example of a request from client service to a server service.
Outbound
When an application sends a request message to a remote service, the message is sent through a socket created by the client application and travels as a socket buffer (SKB) in kernel space. It traverses the network stack until it is intercepted at the netfilter level by the iptables rules, which rewrite its destination. It then completes the rest of its path through the network stack toward the new destination: the socket that Envoy creates on the loopback interface to listen for outbound traffic on a static IP and port (127.0.0.1:15001).
Envoy to Envoy (same host)
After Envoy has processed the request message, it sends the message to the pod that hosts the server service. The SKB is intercepted by the iptables rule set in the server-side network namespace and redirected to the default port (pod_ip:15006) on which Envoy listens to handle inbound traffic.
Inbound
After the server-side Envoy has processed the request message, it uses the loopback address (127.0.0.1) to reach the server service. The SKB goes through the network stack twice before it finally arrives at the server side.
How acceleration works
Socket to socket redirection
eBPF allows us to redirect SKB from socket to socket, which saves the cost of traversing the rest of the network stack. The picture below illustrates the traffic flow after the acceleration.
Distinguish and add socket pairs to hashmap
Sock_ops allows us to distinguish sockets based on connection information such as IP addresses and port numbers. To redirect from socket to socket, we need both sockets of a connection, which we add to the sockhash map for the next step.
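The selection step can be sketched as a user-space Python model. This is not an eBPF program. The handler reacts only to connection-established events, applies a filter, and stores the socket under its connection key. The filter used here (both endpoints on 127.0.0.1), the `ConnKey` layout, the event names, and the socket labels are all assumptions for illustration; the article does not state the authors' actual selection criteria.

```python
from typing import NamedTuple


class ConnKey(NamedTuple):
    """Hypothetical key describing a connection from one socket's viewpoint."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


# Stand-ins for the "connection established" callbacks, one for the
# connecting side and one for the accepting side.
ESTABLISHED_OPS = {"ACTIVE_ESTABLISHED", "PASSIVE_ESTABLISHED"}


def sock_ops_handler(op, key, sock, sockhash):
    """Store the socket in sockhash if it qualifies; return True if stored."""
    if op not in ESTABLISHED_OPS:
        return False
    # Assumed requirement: only loopback-to-loopback connections qualify.
    if key.local_ip != "127.0.0.1" or key.remote_ip != "127.0.0.1":
        return False
    sockhash[key] = sock
    return True


sockhash = {}

# Envoy connects to the application over loopback: both ends are recorded.
envoy_side = ConnKey("127.0.0.1", 50000, "127.0.0.1", 80)
app_side = ConnKey("127.0.0.1", 80, "127.0.0.1", 50000)
sock_ops_handler("ACTIVE_ESTABLISHED", envoy_side, "envoy_sock", sockhash)
sock_ops_handler("PASSIVE_ESTABLISHED", app_side, "app_sock", sockhash)

# A connection to another host is ignored by this filter.
remote = ConnKey("10.0.0.2", 40000, "10.0.0.5", 80)
stored_remote = sock_ops_handler("ACTIVE_ESTABLISHED", remote, "x", sockhash)
```

Because each socket registers itself from its own viewpoint, the two entries of a pair are mirror images of each other, which is what makes the later peer lookup possible.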
Redirection for socket in hashmap
Sock_msg is attached to the sockhash map, capturing every SKB sent by the sockets in the map and determining its destination. To send an SKB to a peer socket, we pass the 4-tuple information (src ip, src port, dest ip, dest port) of the destination as a parameter to the redirection helper function. The helper searches for the peer socket in the sockhash map and redirects the SKB when it finds it. Otherwise, the SKB continues to traverse the network stack as before.
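The redirect-or-fall-through decision described above can be modeled in a few lines of user-space Python. This is a sketch of the logic only, not the kernel helper: `sock_msg_handler`, `peer_key`, the `ConnKey` layout, and the queue structures are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class ConnKey(NamedTuple):
    """Hypothetical 4-tuple key from one socket's point of view."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


def peer_key(key):
    """Key under which the other end of the same connection is stored."""
    return ConnKey(key.remote_ip, key.remote_port,
                   key.local_ip, key.local_port)


def sock_msg_handler(sender_key, payload, sockhash, rx_queues, network_stack):
    """Deliver payload directly to the peer socket if it is in sockhash."""
    peer = sockhash.get(peer_key(sender_key))
    if peer is None:
        # Peer not found: the data takes the normal network stack path.
        network_stack.append(payload)
        return "stack"
    # Peer found: place the data straight on the peer's receive queue.
    rx_queues[peer].append(payload)
    return "redirected"


envoy_side = ConnKey("127.0.0.1", 50000, "127.0.0.1", 80)
app_side = ConnKey("127.0.0.1", 80, "127.0.0.1", 50000)
sockhash = {envoy_side: "envoy_sock", app_side: "app_sock"}
rx_queues = {"envoy_sock": [], "app_sock": []}
network_stack = []

# Envoy sends to the application: the peer is known, so it is redirected.
local_result = sock_msg_handler(envoy_side, b"GET /", sockhash,
                                rx_queues, network_stack)

# A sender whose peer is not in the map falls back to the network stack.
other = ConnKey("10.0.0.2", 40000, "10.0.0.5", 80)
remote_result = sock_msg_handler(other, b"ping", sockhash,
                                 rx_queues, network_stack)
```

In this model the sender only needs to swap the local and remote halves of its own key to find its peer, so no extra bookkeeping is required at send time.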
At the beginning, this article introduced eBPF and its components including Program Type, Helper Function, and Map. Then we introduced several concepts in Istio: control plane and data plane, sidecar mode, and how it grabs inbound and outbound traffic with iptables. In addition, we divided the traffic path from client to server into three categories and explained what happened to SKB in each category. Finally, we explained how to accelerate traffic between app and sidecar.
· Demystifying Istio’s Sidecar Injection Model: https://istio.io/latest/blog/2019/data-plane-setup/
· BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/