Cilium on Rancher

As part of our /mid week, in which we look each year at promising new technologies in the Kubernetes and cloud-native space, we took a deeper look at Cilium and tried out several of its features on multiple Rancher clusters with Cilium as the CNI (Container Network Interface) plugin.

In this blog post, we would like to share some first thoughts and experiences with Cilium, since we believe it has the potential to become the de facto standard CNI on Kubernetes clusters. Although installing and using Cilium on Rancher clusters is not hard, we describe how it is done so that you can set it up on your own Rancher installation within a few minutes.

What is Cilium?

Cilium takes a new approach to Kubernetes networking. In contrast to traditional Kubernetes networking plugins, which make heavy use of iptables, Cilium removes this dependency entirely or at least minimizes its usage. That is mainly because the founders of Cilium (https://isovalent.com/) saw the huge potential of eBPF, an emerging technology inside the Linux kernel.

eBPF allows applications to inject small programs directly into the kernel by attaching them to “attachment points” such as kernel functions (kprobes), userspace functions (uprobes), system calls, sockets and many more. Since eBPF is a huge topic of its own and provides enough material for multiple books, we won’t explain it any further here. If you want to get to know it better, we recommend Thomas Graf’s (Cilium co-founder) talk “BPF & Cilium” and/or the official eBPF documentation.

Cilium leverages eBPF to provide a

  • highly scalable Kubernetes CNI which interconnects nodes via native routing, VXLAN or GENEVE (optionally encrypted with IPsec),
  • kube-proxy (& iptables) load balancer replacement,
  • multi-cluster connectivity solution,
  • microservice-aware network policy enforcement (L3 – L7) and visibility solution.

… to name just a few features. If you want to dig deeper into Cilium’s features, consider watching Thomas Graf’s talk “Cilium – Bringing the BPF Revolution to Kubernetes Networking and Security” and/or visiting the official cilium.io website.

Installing Cilium on Rancher

Installing Cilium as a CNI plugin on a Rancher cluster is quite simple. The following steps are not much different from the ones in the official “Quick Installation” guide. They just need some adjustments on the Rancher side as a prerequisite.

In the cluster creation menu of Rancher’s UI, you need to select Edit as YAML, since the graphical interface only lets you choose from a predefined list of “Network Providers”.

In the YAML, search for the network: dictionary and replace everything below it with plugin: none.

Change:

network:
  mtu: 0
  options:
    flannel_backend_type: vxlan
  plugin: canal

to:

network:
  plugin: none

Hint: This YAML configuration also works the same way if you are setting up your Rancher cluster using the RKE CLI tool.

Now you can click Next and let Rancher create the cluster.

Before adding nodes to the cluster, you should ensure they all have the eBPF filesystem mounted:

ubuntu@mid-cilium-node3:~$ sudo mount | grep /sys/fs/bpf
none on /sys/fs/bpf type bpf (rw,relatime,mode=700)

If no “none on /sys/fs/bpf ...” line is shown, you need to mount the filesystem and persist the configuration using the following commands:

sudo mount bpffs -t bpf /sys/fs/bpf
sudo bash -c 'cat <<EOF >> /etc/fstab
none /sys/fs/bpf bpf rw,relatime 0 0
EOF'
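If you have many nodes to check, the mount test above can be wrapped in a small shell function. The following is only a minimal sketch: the function name bpffs_mounted and its optional file argument are our own illustrative choices, and it reads /proc/mounts instead of parsing the output of mount.

```shell
#!/usr/bin/env bash
# Sketch: report whether a bpf filesystem is mounted on /sys/fs/bpf.
# bpffs_mounted and its optional argument are hypothetical helper names,
# not part of Cilium or Rancher.

bpffs_mounted() {
  # Optional first argument: a mounts table to inspect (default: /proc/mounts).
  local mounts_file="${1:-/proc/mounts}"
  # Fields in /proc/mounts: device mountpoint fstype options dump pass
  awk '$2 == "/sys/fs/bpf" && $3 == "bpf" { found = 1 }
       END { exit (found ? 0 : 1) }' "$mounts_file"
}

# Read-only check of the current node; nothing is mounted or modified here.
if bpffs_mounted; then
  echo "bpffs is mounted on /sys/fs/bpf"
else
  echo "bpffs is NOT mounted on /sys/fs/bpf"
fi
```

The function returns exit code 0 only if a filesystem of type bpf is mounted on /sys/fs/bpf, so it can be used to guard the mount and fstab commands shown above.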

Afterwards, you can add the nodes to the cluster using the Rancher-provided sudo docker run ... rancher/rancher-agent:v2.4.8 ... commands. After a few minutes, the Nodes overview in the Rancher UI will show an error message.

This error is expected: The Kubernetes components are not able to fully communicate with each other because you haven’t deployed the CNI plugin yet. To remedy the situation, first add the official Cilium Helm v3 chart repository to the Rancher catalog (Tools -> Catalogs -> Add Catalog).

Now you are able to deploy Cilium as a Rancher App inside the cluster’s System project.

Specify the Helm chart values according to your needs. All available values can be found in the official values.yaml file in the Cilium GitHub repository. In our setup, we chose to enable IPv6 for dual-stack networking, exposed some Prometheus metric endpoints and enabled Hubble to get some insights into the cluster-internal communication paths.

---
hubble:
  listenAddress: ":4244"

  metrics:
    enabled:
    - dns:query;ignoreAAAA
    - drop
    - tcp
    - flow
    - icmp
    - http

  ui:
    enabled: true
    ingress:
      enabled: true
      hosts:
        - hubble.cilium.<one-of-your-node-ips-here>.xip.puzzle.ch

  relay:
    enabled: true

ipv6:
  enabled: true

prometheus:
  enabled: true

After one to two minutes, all nodes should be shown as active and all Cilium objects should be deployed properly.
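The node state can also be checked from the command line. The sketch below filters the output of kubectl get nodes --no-headers and prints every node whose STATUS column does not start with “Ready”. The function name not_ready_nodes and the sample node names and versions are hypothetical and only serve as an illustration.

```shell
#!/usr/bin/env bash
# Sketch: list nodes whose STATUS column does not start with "Ready".
# Input on stdin: the output of `kubectl get nodes --no-headers`.
# not_ready_nodes is a hypothetical helper name.

not_ready_nodes() {
  # Column 1 is the node name, column 2 the status (e.g. Ready, NotReady).
  awk 'NF >= 2 && $2 !~ /^Ready/ { print $1 }'
}

# Demo with canned input (hypothetical nodes); on a real cluster use:
#   kubectl get nodes --no-headers | not_ready_nodes
printf '%s\n' \
  'node1 Ready controlplane,etcd 10m v1.19.3' \
  'node2 NotReady worker 10m v1.19.3' | not_ready_nodes
```

An empty result means that all nodes report Ready, which is what you should see once Cilium is running on every node.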

Finally, the Cilium app should have automatically created an Ingress object that allows you to access the Hubble web UI via hubble.cilium.<one-of-your-node-ips-here>.xip.puzzle.ch (or whatever URL you’ve specified in your Helm chart values).

Conclusion

Although we have only had a quick look at Cilium so far, we already consider its potential to be massive, especially if your requirements for Kubernetes networking include items like the following:

  • Networking for large Kubernetes clusters: Cilium is able to work completely without iptables, so it scales much better than iptables-based CNI plugins (see “BPF as a revolutionary technology for the container landscape” for details).
  • Low-latency / high-throughput applications: Cilium is able to skip TCP/IP stack traversals inside the kernel in certain use cases (e.g. when a service mesh like Istio is used, which deploys an Envoy proxy sidecar container into each pod).
  • Observability needs: Cilium comes with the Hubble UI which provides helpful insights into all communication paths.

Of course, Cilium also offers further handy features, although some of them are also supported by other CNI plugins such as Canal, Calico, Flannel or Kube-Router:

  • Multi-cluster support using its Cluster Mesh feature.
  • Deny network policies, cluster-wide network policies and application layer (L7) network policies.
  • Options to use native routing or change the tunneling protocol (VXLAN, GENEVE).
  • Encryption of the tunneled traffic between two nodes using IPsec.
  • CNI chaining, which is especially interesting for setups in a public cloud like AWS or Azure.

Last but not least, we would like to mention that Cilium is currently not officially supported by Rancher or Red Hat OpenShift. So if you are using such a Kubernetes distribution and are paying for third-level support, you are probably better off waiting for official Cilium support to arrive. If you would nevertheless like to run Cilium in production and therefore need third-level support, Cilium Enterprise from Isovalent (the company behind Cilium) might be a good option for you. For Red Hat OpenShift, the operator certification is already in progress, and Cilium already provides the integration with OpenShift OKD.

2 Comments

  • Victor Javi, 4 January 2021

    Hello,

    Nice article. What should be used for kube-proxy in rke configuration when doing setup for Cilium?

    Thanks,
    Vic

  • Philip Schmid, 5 January 2021

    Hi Vic,

    In the PoC described above, we simply used Cilium’s default “kubeProxyReplacement” configuration (which is “probe”, see cilium/values.yaml). We therefore did not need to change any default Rancher kube-proxy configuration. RKE/Rancher does not allow you to disable or configure kube-proxy. If you would like to do this, you need to use RKE2. See the “RKE2 Basic Configuration” and “Cilium Installation” sections of this RKE2 + Cilium lab setup: https://github.com/PhilipSchmid/k8s-home-lab

    Regards,
    Philip