The Next Generation of Cloud Storage Fabric: NVMe-oF*
For many people, cloud storage has become indispensable for storing personal photos, sharing files with work colleagues, and other daily tasks.
There are many cloud storage solutions on the market. Some cloud storage providers build their service on private platforms, like Amazon AWS*, Microsoft Azure*, and Google* cloud storage, while other providers use various open source projects, like Ceph*, OpenStack*, and OpenSDS*. One of the most important factors for any cloud storage service is the speed of data access, which depends heavily on the performance of the backend storage devices.
The following picture shows the options currently available in cloud storage backend systems and in typical server systems. In the past, there was a large, well-known performance gap between hard drives and RAM. In the past few years, this gap has narrowed.
In the evolution of storage devices, the first shift was from mechanical hard drives to NAND SSDs, which offer lower latency and higher throughput and are therefore a better storage option. In the last decade, NAND SSDs have undergone rapid development, resulting in advanced capabilities and widespread use.
Hard drives and NAND SSDs are simply storage media, and either one needs a storage communication protocol to move data between the storage device and the host. Currently, PCIe* is one of the fastest interconnects used for storage, and NVM Express* (NVMe*) is built on top of the PCIe bus protocol, making it optimal for PCIe-based solutions. Because NVMe performs highly parallel I/O, it allows SSDs to perform to their full potential. At a high level, NVMe can be viewed as a host controller interface specification that makes communication between the SSD and the host faster.
The diagram below compares the history of HDD and NAND SSD I/O performance, measured in I/O operations per second (IOPS) and latency. Compared to a hard drive, NVMe delivers about 1000 times the IOPS at about one-twentieth of the latency. Compared to a legacy SATA SSD, NVMe delivers more than ten times the IOPS. Higher IOPS and lower latency deliver a better end user experience. In the current cloud environment, NVMe SSDs, such as the PCIe-attached NVMe drive families from Intel, are replacing SATA SSDs and hard drives, and account for a growing share of cloud storage.
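The relationship between the two ratios above can be checked with Little's law, which says that sustained throughput equals the number of requests in flight divided by the average latency per request. The following is a minimal sketch, not a benchmark: the function names are illustrative, and the only inputs taken from the text are the rough 1000x IOPS and 20x latency figures.

```python
def iops_from_latency(latency_s: float, outstanding: int) -> float:
    """Little's law: throughput = requests in flight / average latency."""
    if latency_s <= 0 or outstanding < 1:
        raise ValueError("latency must be positive and outstanding >= 1")
    return outstanding / latency_s


def implied_concurrency_gain(iops_gain: float, latency_gain: float) -> float:
    """How much more I/O must be in flight to explain an IOPS gain
    that is larger than the latency gain alone."""
    if iops_gain <= 0 or latency_gain <= 0:
        raise ValueError("gains must be positive")
    return iops_gain / latency_gain


# Ratios quoted in the text: ~1000x IOPS, ~20x lower latency.
extra_parallelism = implied_concurrency_gain(1000, 20)
print(f"Implied increase in requests in flight: {extra_parallelism:.0f}x")
```

Under these assumptions, lower latency alone accounts for only a factor of 20. The rest of the IOPS gain has to come from keeping many more requests in flight at once, which is what the highly parallel design of NVMe makes possible.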
NVMe* over Fabrics (NVMe-oF*)
As NVMe SSDs have developed, the local computing capability of a single server can no longer exploit the full performance of its SSDs. That is to say, compute capability is becoming the performance bottleneck. To solve this problem, storage is separated from compute, and the SSDs are placed in a storage cluster that can be accessed by many remote compute servers.
The NVMe committee developed the NVMe over Fabrics (NVMe-oF) specification to enable communication between a host and storage over a network, and to address the network bandwidth and latency challenges of moving data between a compute server and remote storage.
About 90% of the NVMe-oF protocol is the same as the local NVMe protocol. The NVMe-oF protocol mainly extends the transport section of the local NVMe protocol to support various networking fabrics, such as RDMA and Fibre Channel. Because the NVMe interface is very efficient and lightweight, the bottleneck is removed.
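To illustrate why so much of the protocol can be shared, the sketch below packs a 64-byte NVMe Read command and then wraps the same bytes in a simplified NVMe-oF command capsule. It is a rough model under stated assumptions, not an implementation of the specification: the helper names are hypothetical, the data pointer is left zeroed (a real NVMe-oF command carries an SGL descriptor there), and all transport-level framing is omitted.

```python
import struct

NVME_CMD_READ = 0x02   # Read opcode in the NVM command set
SQE_SIZE = 64          # size of an NVMe submission queue entry in bytes

# opcode, flags, command ID, namespace ID, 8 reserved bytes,
# metadata pointer, 16-byte data pointer, command dwords 10..15
SQE_FORMAT = "<BBHI8xQ16s6I"
assert struct.calcsize(SQE_FORMAT) == SQE_SIZE


def build_read_sqe(command_id: int, nsid: int, start_lba: int, num_blocks: int) -> bytes:
    """Pack a simplified Read command (hypothetical helper, data pointer zeroed)."""
    if not 1 <= num_blocks <= 0x10000:
        raise ValueError("num_blocks must be between 1 and 65536")
    cdw10 = start_lba & 0xFFFFFFFF           # starting LBA, low 32 bits
    cdw11 = (start_lba >> 32) & 0xFFFFFFFF   # starting LBA, high 32 bits
    cdw12 = num_blocks - 1                   # block count is zero-based
    return struct.pack(SQE_FORMAT, NVME_CMD_READ, 0, command_id, nsid,
                       0, bytes(16), cdw10, cdw11, cdw12, 0, 0, 0)


def fabric_command_capsule(sqe: bytes, in_capsule_data: bytes = b"") -> bytes:
    """Simplified capsule: the unchanged 64-byte command plus optional data."""
    if len(sqe) != SQE_SIZE:
        raise ValueError("submission queue entry must be 64 bytes")
    return sqe + in_capsule_data


sqe = build_read_sqe(command_id=7, nsid=1, start_lba=4096, num_blocks=8)
capsule = fabric_command_capsule(sqe)
print(len(sqe), capsule[:SQE_SIZE] == sqe)
```

Locally, the host would place these 64 bytes in a submission queue in memory and ring a doorbell over PCIe. Over a fabric, the same bytes travel inside a capsule, so in this model only the delivery mechanism differs.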
The following image shows a modern cloud deployment model, which typically includes a set of disaggregated compute nodes as well as some disaggregated NVMe storage nodes. Using the fabric interconnects, we can dynamically compose nodes out of these compute and storage resources.
Without a doubt, it’s impossible to have a remote connection that is as fast as a direct-connected device. The goal of NVMe-oF is to add no more than 10 microseconds of latency to the storage system, compared to a direct-connected NVMe SSD, making the difference between local storage and remote storage very small. The NVMe-oF protocol is so fast because it uses RDMA.
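How much a 10-microsecond budget matters depends on the latency of the local device it is added to. The arithmetic below is a small sketch of that comparison; the 10-microsecond figure comes from the text, while the local device latencies are hypothetical round numbers chosen only for illustration, not measurements.

```python
FABRIC_BUDGET_US = 10.0  # added-latency goal stated in the text


def remote_latency_us(local_latency_us: float, added_us: float = FABRIC_BUDGET_US) -> float:
    """Latency seen by a remote host if the fabric adds `added_us`."""
    if local_latency_us <= 0 or added_us < 0:
        raise ValueError("local latency must be positive, added latency non-negative")
    return local_latency_us + added_us


def relative_overhead(local_latency_us: float, added_us: float = FABRIC_BUDGET_US) -> float:
    """Added fabric latency as a fraction of the local device latency."""
    return (remote_latency_us(local_latency_us, added_us) - local_latency_us) / local_latency_us


# Hypothetical local latencies in microseconds (assumptions, not measurements).
examples = {
    "hard drive (assumed 5000 us)": 5000.0,
    "NAND NVMe SSD (assumed 100 us)": 100.0,
}
for name, local_us in examples.items():
    print(f"{name}: +{relative_overhead(local_us):.1%} when accessed remotely")
```

With these assumed figures, the same 10 microseconds is negligible next to a hard drive but a visible fraction of an NVMe read, which is why the fabric's added latency has to be kept so small.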
Remote Direct Memory Access (RDMA)
So what exactly is RDMA? We must understand this clearly, because it’s a critical factor in improving performance. RDMA is a direct memory access from one computer’s memory to another computer’s memory, without using the CPU or operating system of either computer.
RDMA is a transport protocol that is similar to TCP/UDP. Unlike them, however, it bypasses the kernel stack and offloads the workload to network hardware, meaning that transport-layer processing takes place in the RDMA NIC itself. The end result is that data is transferred directly to or from application memory without involving the normal OS-level network stack. RDMA achieves low latency and low CPU consumption, which is especially useful for cloud storage.
So, the question is: how does RDMA benefit NVMe-oF? NVMe-oF sends native NVMe commands over the RDMA-based networking fabric, which enables RDMA to eliminate most of the processing that occurs with normal network communications. Using RDMA, the added latency is minimal, so that network communications can occur at a speed that is close to native NVMe communications.
Due to their high performance, NVMe SSDs, like Intel's PCIe-attached NVMe drive families, are replacing SATA SSDs and hard drives in cloud storage environments. NVMe-oF allows NVMe storage to be accessed remotely and, by relying on RDMA, can deliver performance close to that of the local NVMe interface.
NVMe-oF support is growing in cloud storage projects, and the NVMe-oF enabling work is primarily based on the Linux* kernel or the Storage Performance Development Kit (SPDK). Several popular open source storage projects support NVMe-oF, including Ceph, OpenStack, and OpenSDS. Learn more by downloading and trying the projects listed in the repository links below.
- Storage Performance Development Kit (SPDK): https://github.com/spdk/spdk
- Ceph: https://github.com/ceph/ceph
- OpenStack: https://github.com/OpenStack/OpenStack
- OpenSDS: https://github.com/opensds/opensds
- Intel® Solid State Drive Data Center Family with Non-Volatile Memory Express* (NVMe*): https://www.intel.com/content/www/us/en/products/docs/memory-storage/solid-state-drives/intel-ssd-dc-family-for-nvme.html
- OpenSDS details: https://www.opensds.io/
- NVMe* and NVMe-oF* Specifications: https://nvmexpress.org/resources/specifications/
- NVMe-oF introduction: https://www.nvmexpress.org/wp-content/uploads/NVMe_Over_Fabrics.pdf
- RDMA introduction: https://en.wikipedia.org/wiki/Remote_direct_memory_access
Qiaowei Ren, Cloud Software Engineer at NST/DSS/SSP, Intel Corporation
Intel and 3D XPoint are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.