
Introduction to a Cloud-Native Storage Orchestrator: Rook*

Tingjie Chen
Oct 19, 2020

Introduction

Cloud-native applications based on Kubernetes* (or K8s) are widely used in production environments, which introduces a challenge: how do you integrate a traditional storage system into a Kubernetes cluster? We propose Rook as a solution.

Rook is an open source cloud-native storage orchestrator that transforms storage software into self-managing, self-scaling, and self-healing storage services. The storage systems that are supported by Rook include Ceph*, EdgeFS, Cassandra*, CockroachDB*, NFS, and YugabyteDB.

 

Figure: Storage systems supported by Rook

This article uses Ceph as an example to introduce the concepts and architecture of Rook, and then walks through deployment and application usage. You can quickly get started with Rook by following the steps in this article.

Rook-Ceph Architecture

About Ceph

Before we explore Rook, let’s review Ceph. If you are familiar with Ceph, you can skip this section.

Ceph is an open-source, highly scalable, distributed storage solution for block storage, shared filesystems, and object storage with years of production deployments. It was born in 2003 as an outcome of Sage Weil’s doctoral dissertation and then released in 2006 under the LGPL license.

Ceph consists of several components:

  • MON (Ceph Monitors) are responsible for forming cluster quorums. All the cluster nodes report to MON and share information about every change in their state.
  • OSD (Ceph Object Store Devices) are responsible for storing objects and providing access to them over the network.
  • MGR (Ceph Manager) provides additional monitoring and interfaces to external management systems.
  • RADOS (Reliable Autonomic Distributed Object Store) is the core of the Ceph cluster. RADOS ensures that stored data always remains consistent through data replication, failure detection, and recovery, among other mechanisms.
  • LibRADOS is the library used to gain access to RADOS. With support for several programming languages, LibRADOS provides a native interface for RADOS as well as a base for other high-level services, such as RBD, RGW, and CephFS.
  • RBD (RADOS Block Device), now known as the Ceph block device, provides persistent block storage that is thin-provisioned, resizable, and striped over multiple OSDs.
  • RGW (RADOS Gateway) is an interface that provides an object storage service. It uses libRGW (the RGW library) and libRADOS to let applications connect to Ceph object storage. RGW provides RESTful APIs that are compatible with Amazon* S3 and OpenStack* Swift.
  • CephFS is the Ceph Filesystem that provides a POSIX-compliant filesystem. CephFS uses the Ceph cluster to store user data.
  • MDS (Ceph Metadata Server) keeps track of file hierarchy and stores metadata only for CephFS.

Operator Pattern

The Rook operator is the core of the Rook framework. An operator is a custom Kubernetes controller that makes use of CRs (Custom Resources) to manage applications and their components.

A Kubernetes controller watches the state of a cluster resource and tries to move the current state closer to the desired state. Each controller is responsible for a specific resource and is implemented as a reconciliation loop: whenever the watched resource is created, updated, or deleted, reconciliation is triggered.

Rook defines several CRs; later in this article we use a PVC-based CephCluster CR as an example. A PVC (PersistentVolumeClaim) is a Kubernetes concept; for details, refer to https://kubernetes.io/docs/concepts/storage/persistent-volumes/.
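To make the idea concrete, here is a minimal sketch of a CephCluster CR in the spirit of the Rook 1.3 example manifests (the image tag and field values are illustrative; the full examples ship with the Rook release):

  apiVersion: ceph.rook.io/v1
  kind: CephCluster
  metadata:
    name: rook-ceph
    namespace: rook-ceph
  spec:
    cephVersion:
      image: ceph/ceph:v14.2.10   # Ceph Nautilus, matching the versions used later in this article
    dataDirHostPath: /var/lib/rook
    mon:
      count: 3
      allowMultiplePerNode: false
    dashboard:
      enabled: true
    storage:
      useAllNodes: true
      useAllDevices: true

The operator watches this CR and reconciles the running cluster (MONs, OSDs, MGR, and so on) toward the declared state.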

Rook for Ceph

The Rook operator is a simple container that has all that is needed to bootstrap and monitor the storage cluster. The operator starts and monitors Ceph daemon pods, such as MONs, OSDs, MGR, and others. It also monitors the daemon to ensure the cluster is healthy. Ceph MONs are started or failed over when necessary. Other adjustments are made as the cluster grows or shrinks.

Just like native Ceph, Rook-Ceph provides block, filesystem, and object storage for applications.

  • Ceph CSI implements the Container Storage Interface (CSI), a standard for exposing arbitrary block and file storage systems to containerized workloads on container orchestration systems like Kubernetes. Ceph CSI is integrated with Rook and enables two scenarios:
    • RBD (block storage): This driver is optimized for RWO pod access where only one pod may access the storage.
    • CephFS (filesystem): This driver allows for RWX with one or more pods accessing the same storage.
  • For object storage, Rook supports the creation of new buckets and access to existing buckets via two custom resources: Object Bucket Claim (OBC) and Object Bucket (OB). Applications can access the objects via RGW.
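As an illustration of the object storage path, an Object Bucket Claim is itself just another custom resource; a minimal sketch follows (the names are illustrative, and the referenced object-store StorageClass must already exist in your cluster):

  apiVersion: objectbucket.io/v1alpha1
  kind: ObjectBucketClaim
  metadata:
    name: ceph-bucket               # illustrative claim name
  spec:
    generateBucketName: ceph-bkt    # prefix for the generated bucket name
    storageClassName: rook-ceph-bucket

When the claim is reconciled, Rook creates the bucket in RGW and exposes the endpoint and S3 credentials to the application through a generated ConfigMap and Secret.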

For Ceph cluster maintainers, there are three methods to configure Ceph:

  • Toolbox + Ceph CLI: The recommended method is to execute Ceph commands in a Rook Toolbox pod.
  • Ceph Dashboard: The native Ceph dashboard is the second method. It has the same priority as configuring via the Ceph CLI.
  • Advanced configuration via the ceph.conf override ConfigMap: some settings cannot easily be modified via the CLI or dashboard. For example, MONs cannot be deleted directly with the Ceph CLI; the only way is to override the ConfigMap.

Programmers can also develop their own plugins that use the Kubernetes client API to access the ConfigMap or the Toolbox pod and manipulate a Rook-Ceph cluster.

Getting Started

This section shows the deployment of a Rook-Ceph cluster and RBD volume provisioning for a Kubernetes application.

Software Configuration

  • Ubuntu*: 18.04 LTS
  • Docker*: 19.03.8
  • Kubernetes: 1.18.6
  • Rook: 1.3.3
  • Ceph: Nautilus 14.2.10
  • Ceph OSD PVC: Local volume

Rook-Ceph deployment

Prepare the Kubernetes cluster

Since Rook is based on the Kubernetes cluster, you need to set up a Kubernetes cluster and configure the network properly. If you have an existing Kubernetes cluster, you can skip this section.

1. Install the docker engine: https://docs.docker.com/engine/install/ubuntu/

2. Install kubeadm, kubelet, and kubectl: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl

sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

3. Pull the docker images:

kubeadm config images pull --kubernetes-version=v1.18.6

4. Bring up the Kubernetes cluster on the master node:

sudo swapoff -a
sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --apiserver-advertise-address <your_host_ip> --node-name master --ignore-preflight-errors=all
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
kubectl taint nodes master node-role.kubernetes.io/master:NoSchedule-

5. Join the Kubernetes nodes:

kubeadm join <master_host_ip>:6443 --token zifmp3.27h736nwdfjli6fi \
    --discovery-token-ca-cert-hash sha256:5c11bfd28f016fd15b656850324de5d4d9a042c9a9e620aba3d1c959b7ac0ad5
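Once all nodes have joined, a quick check confirms that the cluster is healthy before deploying Rook (standard kubectl commands; the output depends on your environment):

kubectl get nodes -o wide        # all nodes should report Ready
kubectl get pods -n kube-system  # CNI and control-plane pods should be Running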

Bring up Rook and the Ceph cluster

Bring up the Rook operator and the Ceph cluster with these daemons: MONs, OSDs, MGR, and the Ceph CSI plugin and driver.

1. Launch the Rook common components with CRDs (Custom Resource Definitions):

kubectl apply -f https://github.com/rook/rook/blob/release-1.3/cluster/examples/kubernetes/ceph/common.yaml

2. Launch the Rook operator with CSI support:

kubectl apply -f https://github.com/rook/rook/blob/release-1.3/cluster/examples/kubernetes/ceph/operator.yaml

3. Launch the Ceph cluster:

kubectl apply -f https://github.com/rook/rook/blob/release-1.3/cluster/examples/kubernetes/ceph/cluster.yaml

Before applying this yaml configuration, you need to adjust some daemon parameters for your own environment.

Settings for Ceph monitors:

  mon:
    count: 3
    allowMultiplePerNode: false

Settings for Ceph OSDs; nodes and raw devices, or device filters, must be assigned:

  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: true
    #deviceFilter:
    config:
      # metadataDevice: "md0" # specify a non-rotational storage so ceph-volume will use it as block db device of bluestore.
      # databaseSizeMB: "1024" # uncomment if the disks are smaller than 100 GB
      # journalSizeMB: "1024"  # uncomment if the disks are 20 GB or smaller
      # osdsPerDevice: "1" # this value can be overridden at the node or device level
      # encryptedDevice: "true" # the default value for this option is "false"
  # Individual nodes and their config can be specified as well, but 'useAllNodes' above must be set to false. Then, only the named
  # nodes below will be used as storage resources. Each node's 'name' field should match their 'kubernetes.io/hostname' label.
  #  nodes:
  #  - name: "172.17.4.201"
  #    devices: # specific devices to use for storage can be specified for each node
  #    - name: "sdb"
  #    - name: "nvme01" # multiple osds can be created on high performance devices
  #      config:
  #        osdsPerDevice: "5"
  #    - name: "/dev/disk/by-id/ata-ST4000DM004-XXXX" # devices can be specified using full udev paths
  #    config: # configuration can be specified at the node level which overrides the cluster level config
  #      storeType: filestore
  #  - name: "172.17.4.301"
  #    deviceFilter: "^sd."

Use PVC to provision OSDs:

kubectl apply -f https://github.com/rook/rook/blob/release-1.3/cluster/examples/kubernetes/ceph/cluster-on-pvc.yaml

  storage:
    storageClassDeviceSets:
    - name: set1
      # The number of OSDs to create from this device set
      count: 3
      # IMPORTANT: If volumes specified by the storageClassName are not portable across nodes
      # this needs to be set to false. For example, if using the local storage provisioner
      # this should be false.
      portable: true
      # Certain storage class in the Cloud are slow
      # Rook can configure the OSD running on PVC to accommodate that by tuning some of the Ceph internal
      # Currently, "gp2" has been identified as such
      tuneDeviceClass: true
      # Since the OSDs could end up on any node, an effort needs to be made to spread the OSDs
      # across nodes as much as possible. Unfortunately, the pod anti-affinity breaks down
      # as soon as you have more than one OSD per node. If you have more OSDs than nodes, K8s may
      # choose to schedule many of them on the same node. What we need is the Pod Topology
      # Spread Constraints.
      # Another approach for a small number of OSDs is to create a separate device set for each
      # zone (or other set of nodes with a common label) so that the OSDs will end up on different
      # nodes. This would require adding nodeAffinity to the placement here.
      placement:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd
                - key: app
                  operator: In
                  values:
                  - rook-ceph-osd-prepare
              topologyKey: kubernetes.io/hostname
        topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - rook-ceph-osd
              - rook-ceph-osd-prepare
      resources:
      #   limits:
      #     cpu: "500m"
      #     memory: "4Gi"
      #   requests:
      #     cpu: "500m"
      #     memory: "4Gi"
      volumeClaimTemplates:
      - metadata:
          name: data
          # if you are looking to give your OSD a different CRUSH device class than the one detected by Ceph
          # annotations:
          #   crushDeviceClass: hybrid
        spec:
          resources:
            requests:
              storage: 64Gi
          # IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
          storageClassName: local-sc
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
      # dedicated block device to store bluestore database (block.db)
      - metadata:
          name: metadata
        spec:
          resources:
            requests:
              # Find the right size https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
              storage: 3Gi
          # IMPORTANT: Change the storage class depending on your environment (e.g. local-storage, gp2)
          storageClassName: local-sc
          volumeMode: Block
          accessModes:
            - ReadWriteOnce
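Since this walkthrough backs the OSD PVCs with local volumes (storageClassName: local-sc above), a matching StorageClass and pre-created local PersistentVolumes are also required. The following is a minimal sketch, assuming a raw device /dev/sdb on a node named node1 (both illustrative); adjust names, device paths, and capacities for your environment:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: local-sc
  provisioner: kubernetes.io/no-provisioner   # local volumes are statically provisioned
  volumeBindingMode: WaitForFirstConsumer
  ---
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: local-osd-node1-sdb                 # illustrative name
  spec:
    capacity:
      storage: 64Gi
    volumeMode: Block                         # raw block device for the OSD
    accessModes:
      - ReadWriteOnce
    persistentVolumeReclaimPolicy: Retain
    storageClassName: local-sc
    local:
      path: /dev/sdb                          # raw device on the node (illustrative)
    nodeAffinity:
      required:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node1                           # illustrative node name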

4. Launch the toolbox, which has Ceph CLI utility support:

kubectl apply -f https://github.com/rook/rook/blob/release-1.3/cluster/examples/kubernetes/ceph/toolbox.yaml

Volume Claim and Application

This phase is for the application: request an RBD volume by claim and mount it in the application.

1. Define a StorageClass for Ceph-CSI:

kubectl apply -f https://github.com/rook/rook/blob/release-1.3/cluster/examples/kubernetes/ceph/csi/rbd/storageclass.yaml

In the StorageClass configuration, the provisioner name MUST be rook-ceph.rbd.csi.ceph.com, and there are pool settings for the RBD volumes.
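As a trimmed-down sketch (pool, secret, and namespace names follow the Rook 1.3 example manifest and may differ in your environment), the StorageClass looks roughly like this:

  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    name: rook-ceph-block
  provisioner: rook-ceph.rbd.csi.ceph.com
  parameters:
    clusterID: rook-ceph            # namespace of the Rook-Ceph cluster
    pool: replicapool               # RBD pool backing the volumes
    imageFormat: "2"
    imageFeatures: layering
    # secrets created by the operator for provisioning and mounting
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    csi.storage.k8s.io/fstype: ext4
  reclaimPolicy: Delete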

2. Launch the CockroachDB application and define the mount point with RBD volumes provided by Ceph CSI:

kubectl apply -f https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/cockroachdb-statefulset.yaml

Use these volume claim settings:

  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
        - "ReadWriteOnce"
      storageClassName: rook-ceph-block
      resources:
        requests:
          storage: 16Gi

After the deployment, check the status of Rook-Ceph and the application pods.
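For example (standard kubectl status checks; pod names and counts depend on your cluster):

  kubectl -n rook-ceph get pods   # operator, MON/OSD/MGR, and CSI pods should be Running
  kubectl get pods                # application pods, e.g. the CockroachDB StatefulSet
  kubectl get pvc                 # claims should be Bound to RBD-backed volumes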

Configure Ceph cluster

As mentioned before, there are three ways to configure a Rook-Ceph cluster:

1. Toolbox and Ceph CLI.

Once the rook-ceph-tools pod is running, connect to it with this command:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

For example, you can create a new replicated pool named rbd2 from inside the toolbox.
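A sketch of the standard Ceph commands (the placement-group count of 32 is illustrative; size it for your cluster):

  ceph osd pool create rbd2 32 32 replicated   # create the replicated pool
  rbd pool init rbd2                           # initialize it for RBD use
  ceph osd pool ls                             # verify the pool exists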

2. Ceph Dashboard.

For the details of the dashboard, refer to this page: https://rook.io/docs/rook/v1.3/ceph-dashboard.html
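For a quick start, and assuming the default names used by the Rook examples (the rook-ceph-mgr-dashboard service and the rook-ceph-dashboard-password secret), you can reach the dashboard with something like:

  # forward the dashboard service to localhost (HTTPS on port 8443 by default)
  kubectl -n rook-ceph port-forward service/rook-ceph-mgr-dashboard 8443:8443
  # retrieve the admin password generated by the operator
  kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
    -o jsonpath="{['data']['password']}" | base64 --decode && echo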

3. Advanced configuration via the ceph.conf override ConfigMap.

Some settings cannot easily be modified via the CLI or dashboard, such as the Ceph monitor count, which is part of the CephCluster CRD and watched by the operator. To remove a dedicated monitor, edit the rook-ceph-mon-endpoints ConfigMap after decreasing mon.count in the CephCluster CRD:

kubectl edit configmap rook-ceph-mon-endpoints -n rook-ceph

                                     

Another case is setting parameters in ceph.conf, which requires updating the rook-config-override ConfigMap:

kubectl edit configmap rook-config-override -n rook-ceph
-------------------------------------
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    osd crush update on start = false
    osd pool default size = 2

                                     

If you have requirements for upgrading or cleanup, refer to the documentation from the Rook community.

Summary

In this article, we introduced the operator concept, the Rook-Ceph framework, and how to quickly deploy PVC-based OSDs for Ceph.

Compared with a native deployment of a Ceph cluster, the advantages of Rook are obvious: it simplifies deployment, management, expansion, and upgrade efforts. Since Rook is deployed in a Kubernetes cluster, Ceph client and storage nodes can be easily scheduled and reused. Compared with dedicated "storage" and "client" nodes, this helps reduce cost in a lightweight cluster.

