Open Visual Cloud Building Blocks

The four core building blocks used to build a visual cloud service are Encode, Decode, Inference, and Render. Each building block represents the underlying technology of one of the processes that make up a visual cloud service pipeline.

Developers can arrange the four basic building blocks into a variety of pipelines for different services. For example, a simple transcode service is realized with the decode + encode core building blocks. Inserting an inference building block (decode + inference + encode) results in a media analytics service, relevant for digital security and surveillance or for user-generated content ad-insertion use cases where intelligent content analysis is required.
Intel is contributing to each core building block with new and existing projects and enhanced performance. After observing that encode is a required building block across all the visual cloud services, Intel released several Scalable Video Technology (SVT) encoder core libraries, along with interoperability with the x264 and x265 encoders, to support the ecosystem's needs. Additionally, the OpenVINO™ Toolkit and the Intel® Rendering Framework make up the inference and render blocks respectively.
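As a rough illustration of how the blocks can be arranged into pipelines, the sketch below chains stages together in Python. The stage functions (`decode`, `infer`, `encode`) and the `build_pipeline` helper are hypothetical stand-ins that only tag strings; they are not part of any Open Visual Cloud API.

```python
from typing import Callable, List

# A stage takes a list of items (packets or frames) and returns a new list.
Stage = Callable[[List[str]], List[str]]


def decode(packets: List[str]) -> List[str]:
    """Hypothetical decode stage: turn each compressed packet into a raw frame."""
    return [f"frame({p})" for p in packets]


def infer(frames: List[str]) -> List[str]:
    """Hypothetical inference stage: attach analysis results to each frame."""
    return [f"annotated({f})" for f in frames]


def encode(frames: List[str]) -> List[str]:
    """Hypothetical encode stage: compress each frame back into a packet."""
    return [f"packet({f})" for f in frames]


def build_pipeline(*stages: Stage) -> Stage:
    """Compose stages so the output of one feeds the next, in order."""
    def run(items: List[str]) -> List[str]:
        for stage in stages:
            items = stage(items)
        return items
    return run


# decode + encode: a simple transcode service
transcode = build_pipeline(decode, encode)
# decode + inference + encode: a media analytics service
analytics = build_pipeline(decode, infer, encode)

print(transcode(["p0"]))  # ['packet(frame(p0))']
print(analytics(["p0"]))  # ['packet(annotated(frame(p0)))']
```

Treating each block as a function with the same input and output shape is one possible design; it lets a new service be described simply by listing its stages in order.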


At its most basic, encode compresses video data to reduce its size. Because the visual cloud runs on video data, Encode is one of the key building blocks developers will use in constructing most visual cloud services and pipelines. As of 2019, AV1 has emerged as a commercially viable new entrant, thanks to SVT-AV1.
There are a variety of individual open source ingredients that make up each Open Visual Cloud Building Block. Some of these ingredients include:
  • Scalable Video Technology (SVT) - Video encoding technology optimized for x86 processors. Supported codecs include HEVC, VP9, and AV1.
  • FFmpeg - FFmpeg is an open source project consisting of a vast software suite of libraries and programs for handling video, audio, and other multimedia files and streams.
  • x265 - x265 is an H.265 / HEVC video encoder application library, designed to encode video or images into an H.265 / HEVC encoded bitstream.
  • x264 - x264 is an open-source software library and a command-line utility developed by VideoLAN for encoding video streams into the H.264/MPEG-4 AVC format.
  • Open WebRTC Toolkit - Open WebRTC Toolkit is an open source real-time media delivery framework, which includes comprehensive media processing functions on video and audio streams.
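As a minimal sketch of how some of these ingredients fit together, the Python snippet below builds an FFmpeg command line that re-encodes a video with a chosen encoder. The helper name `transcode_command` and the file names are made up for illustration, and the sketch assumes the local FFmpeg build was configured with the listed encoder libraries (`libx264`, `libx265`, `libsvtav1`), which is not guaranteed.

```python
import shlex
from typing import List

# Assumed mapping from codec name to FFmpeg encoder name. Each encoder is only
# usable if the local FFmpeg build was compiled with the matching library.
ENCODERS = {
    "h264": "libx264",
    "hevc": "libx265",
    "av1": "libsvtav1",
}


def transcode_command(src: str, dst: str, codec: str) -> List[str]:
    """Build (but do not run) an FFmpeg command that re-encodes the video stream."""
    if codec not in ENCODERS:
        raise ValueError(f"unsupported codec: {codec}")
    return [
        "ffmpeg",
        "-i", src,                 # input file
        "-c:v", ENCODERS[codec],   # video encoder to use
        "-c:a", "copy",            # pass audio through without re-encoding
        dst,
    ]


cmd = transcode_command("input.mp4", "output.mkv", "av1")
print(shlex.join(cmd))
```

To actually run the transcode, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.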


Decode is the process of decompressing encoded video data. Decode goes hand-in-hand with Encode: once video has been encoded, it must be decoded before it can be processed or viewed on a screen. Decode technology can be either hardware-based or software-based. As new HD video codecs reach the market, they will see greater adoption in hardware such as TVs, video cameras, and smartphones. As of early 2019, open source software with decode support includes:

  • VLC Media Player - VLC is a free and open source cross-platform multimedia player and framework that plays most multimedia files.
  • dav1d - dav1d is a new open source, cross-platform AV1 decoder focused on speed and correctness.
  • Mozilla Firefox - Firefox is an open source web browser that released AV1 support in Firefox 65 in January 2019.
  • Android Q - Google* has announced that Android* Q will support AV1 decode.
  • Chrome - Google's Chrome/Chromium 70 web browser is now shipping with AV1 video decoding support.
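To give a sense of what decode produces, the sketch below computes the size of one uncompressed frame and builds an FFmpeg command that decodes a file to raw video. It assumes 8-bit YUV 4:2:0 output with even frame dimensions; the function names and file names are hypothetical.

```python
from typing import List


def raw_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit YUV 4:2:0 frame (assumes even width and height)."""
    luma = width * height                        # one full-resolution Y plane
    chroma = 2 * (width // 2) * (height // 2)    # two quarter-resolution planes (U, V)
    return luma + chroma


def decode_command(src: str, dst: str) -> List[str]:
    """Build (but do not run) an FFmpeg command that decodes to raw YUV 4:2:0."""
    return ["ffmpeg", "-i", src, "-f", "rawvideo", "-pix_fmt", "yuv420p", dst]


frame = raw_frame_bytes(1920, 1080)
print(frame)            # 3110400 bytes per 1080p frame
print(frame * 30 * 8)   # 746496000 bits per second at 30 frames per second
```

The roughly 746 Mbit/s figure for raw 1080p video at 30 frames per second illustrates why decoded video is normally held only in memory for processing or display, and why Encode is needed before storage or transmission.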


An inference building block analyzes video content data. AI applications use inference, based on deep learning neural networks, to perform many tasks, ranging from facial recognition and ad insertion to smart city use cases such as street corner traffic management.
As video becomes more ubiquitous, the need to analyze what is shown in the video becomes more and more important. Using AI models, it is possible to train applications to search for specific patterns within a video (a person, a vehicle, a brand logo, etc.) and act on what they find.
Individual open source ingredients that make up the Inference building block can be found in the OpenVINO™ Toolkit, and include:
  • Deep Learning Deployment Toolkit (DLDT) - A key component of the OpenVINO toolkit, DLDT provides a model optimizer and inference engine that supports Intel CPU and GPU (Intel® Processor Graphics) and heterogeneous plugins.
  • OpenVINO - OpenVINO™, short for Open Visual Inference and Neural network Optimization, is a toolkit that provides developers with improved neural network performance on a variety of hardware (CPU, GPU, FPGA, VPU) and helps them further unlock cost-effective, real-time vision applications.
  • Open Model Zoo - Pre-trained deep learning models and samples for use in OpenVINO.
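The idea of searching for specific patterns and acting on what is found can be sketched as a post-processing step over model output. The example below is framework-independent and does not call OpenVINO; the `Detection` record, the `select_detections` helper, and the sample values are all hypothetical, chosen only to illustrate the filtering step.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Detection:
    """Hypothetical per-frame result produced by an inference stage."""
    frame_index: int
    label: str
    confidence: float


def select_detections(
    detections: Iterable[Detection],
    wanted_labels: Set[str],
    min_confidence: float = 0.5,
) -> List[Detection]:
    """Keep detections whose label is of interest and whose score is high enough."""
    return [
        d for d in detections
        if d.label in wanted_labels and d.confidence >= min_confidence
    ]


# Made-up sample results, for illustration only.
results = [
    Detection(0, "person", 0.91),
    Detection(0, "vehicle", 0.42),
    Detection(1, "brand_logo", 0.77),
    Detection(2, "vehicle", 0.88),
]

for hit in select_detections(results, {"vehicle", "brand_logo"}):
    print(hit.frame_index, hit.label)
```

An application would replace the `print` call with whatever action fits the use case, such as raising an alert or marking a position for ad insertion.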


Video rendering is the process by which a computer processes information from a coded data source and uses that information to produce and display an image. This is usually in the context of creating a video animation or visualizing a large data set.
Individual open source ingredients for the Render building block are found in the Intel® Rendering Framework, which includes: