
How to create a 3D view from a depth camera in WebGL

BY 01 Staff ON Sep 13, 2017

This tutorial demonstrates how you can use a depth camera to create a 3D view of a scene. Each piece of depth and color information is combined into a single point, and the collection of points (called a pointcloud) is then rendered into a picture or animation similar to the one shown below. Note that the camera itself is not moving; only the generated 3D scene is being rotated. The final source code of this article is in the depth camera pointcloud demo repository.

[Screen recording: the generated 3D pointcloud being rotated]

Before you begin, make sure that your system supports your camera by following the installation guide for your OS. This demo currently supports only the Intel® RealSense™ Camera SR300 series (and related cameras like Razer Stargazer* or Creative BlasterX Senz3D*) and the Intel RealSense Camera R200 [1]. Next, check that you are using Chromium* version 58 or higher; otherwise, use the dev channel to install it. To test that the camera works in Chromium, open the demo page with the final version. You should be able to see a 3D view of what your camera is pointing at, similar to the animation above.

Briefly, the code works like this:

  1. Uploads raw depth and color data as textures into WebGL*.

  2. Creates a vertex (GL_POINT) for each pixel in the depth data.

  3. Properly positions each vertex in 3D by using the camera parameters.

  4. Finds out which RGB pixel corresponds to the 3D vertex and assigns this color to the vertex. 

Simple visualization of depth data in WebGL*

In this section, we will create a simplified version of the full demo where we only upload the depth data to WebGL and display them in flat 2D, with the depth shown as the green color of the point.

The article Depth Camera Capture in HTML5 describes how to use the depth camera in Chromium and how to upload it to WebGL, so we won’t be covering that in much detail here. I will be using the files depth-camera.js and gl.js from the demo source code. Additionally, the demo depends on the glMatrix library for matrix operations. To avoid filling the page with 100 lines of boilerplate, I’m just going to quickly describe the steps to set up the basics, and you can look at the source code for more details.

  1. Create the files index.html and script.js.

  2. Download the source files depth-camera.js, gl.js, gl-matrix.js and include them.

  3. In the HTML file, add a WebGL canvas and two <video> elements with the IDs colorStream and depthStream.

  4. In the JS file, get a WebGL2 context and create two textures: colorStreamTexture and depthStreamTexture (called u_color_texture and u_depth_texture in the shaders).

  5. You will also need the WebGL extension called EXT_color_buffer_float to upload the floating point data (depth) into a texture.

  6. Look at the index.html and script.js files from the demo to see how this is done in detail.

Setting up the camera

Next, we can use functions from the depth-camera.js file to set up the depth camera. The following function will get the color and depth streams and assign them to the two <video> elements that we created in the HTML file before. It then returns the parameters of the camera in an object containing items like the camera’s focal length and offset. Later, we’ll describe how to use the returned parameters.                                                                                  

async function setupCamera() {
   var [depth_stream, color_stream] = await DepthCamera.getStreams();
   var video = document.getElementById("colorStream");
   video.srcObject = color_stream;
   var depth_video = document.getElementById("depthStream");
   depth_video.srcObject = depth_stream;
   var parameters = DepthCamera.getCameraCalibration(depth_stream);
   return parameters;
}

The following code snippet shows how to execute code once both the color and depth stream are ready. While we don’t really need to use the color stream in this section of the tutorial, it’s better to have it ready. Now we just need to fill in the highlighted parts, A and B.

var colorStreamElement = document.getElementById("colorStream");
var depthStreamElement = document.getElementById("depthStream");
var colorStreamReady = false;
var depthStreamReady = false;
colorStreamElement.oncanplay = function() { colorStreamReady = true; };
depthStreamElement.oncanplay = function() { depthStreamReady = true; };
var ranOnce = false;
var animate = function() {
   if (colorStreamReady && depthStreamReady) {
       var width = depthStreamElement.videoWidth;
       var height = depthStreamElement.videoHeight;
       if (!ranOnce) {
           // part A: initialization
           ranOnce = true;
       }
       // part B: run this on every frame
   }
   window.requestAnimationFrame(animate);
};
animate();

Uploading the data to the GPU

Normally, when using textures in a fragment shader, WebGL automatically gives us information about the texture in the shaders. Therefore, we don’t need to index into it manually, and we don’t need to know what size the texture is. However, because we are using the texture as a raw data storage that we access in the vertex shader, we have to do these things ourselves. In the initialization part (highlighted as part A in the previous snippet), we want to upload the depth texture size and indices into the texture.

gl.canvas.width = width;
gl.canvas.height = height;
gl.viewport(0, 0, gl.canvas.width, gl.canvas.height);
var shaderDepthTextureSize = gl.getUniformLocation(program, "u_depth_texture_size");
gl.uniform2f(shaderDepthTextureSize, width, height);

var shaderColorTextureSize = gl.getUniformLocation(program, "u_color_texture_size");
gl.uniform2f(shaderColorTextureSize, colorStreamElement.videoWidth, colorStreamElement.videoHeight);
var indices = [];
for (var i = 0; i < width; i++) {
   for (var j = 0; j < height; j++) {
       indices.push(i);
       indices.push(j);
   }
}
var shaderDepthTextureIndex = gl.getAttribLocation(program, "a_depth_texture_index");
gl.enableVertexAttribArray(shaderDepthTextureIndex);
var buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(indices), gl.STATIC_DRAW);
gl.vertexAttribPointer(shaderDepthTextureIndex, 2, gl.FLOAT, false, 0, 0);

The code above first adjusts the canvas to the depth camera resolution and then uploads the depth and color texture sizes as pairs (width, height). The rest of the code calculates the indices, which form a flat array [0, 0, 0, 1, 0, 2, ...] instead of [[0, 0], [0, 1], [0, 2], ...], since we can give WebGL the information about the stride in vertexAttribPointer. Thus, every vertex receives a vec2 that tells it which item in the depth data it is processing. You could also calculate this index directly in the shader by using gl_VertexID. This is all you will need in the initialization section.
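As a sanity check, the flattened layout can be reproduced in plain JavaScript with a tiny made-up texture size:

```javascript
// Build the flattened index array for a tiny 2x3 "depth texture".
// Same loop order as in the snippet above: outer loop over width,
// inner loop over height, pushing both components of each pair.
var width = 2;
var height = 3;
var indices = [];
for (var i = 0; i < width; i++) {
    for (var j = 0; j < height; j++) {
        indices.push(i);
        indices.push(j);
    }
}
// The flat array reads [0,0, 0,1, 0,2, 1,0, 1,1, 1,2];
// vertexAttribPointer with size 2 turns each consecutive pair
// back into one vec2 per vertex.
console.log(indices.join(","));
```

With the real camera resolution the same loop simply produces width × height pairs instead of six.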

Next, we want to actually upload the video frames into WebGL. Copy the following code snippet into part B:

gl.bindTexture(gl.TEXTURE_2D, colorStreamTexture);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, gl.RGBA, gl.UNSIGNED_BYTE, colorStreamElement);
gl.bindTexture(gl.TEXTURE_2D, depthStreamTexture);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.R32F, gl.RED, gl.FLOAT, depthStreamElement);
gl.drawArrays(gl.POINTS, 0, width * height);

The colorStreamTexture and depthStreamTexture items are texture references that we created earlier. Note that while the color data are stored in a typical RGBA texture, the depth data gets stored in a floating-point GL_RED texture, meaning each pixel is represented as a single floating-point number.

The last line with drawArrays is important. It creates width*height vertices, one for every pixel of depth data. Normally we would create triangles out of vertices, but GL_POINT is more useful for our purposes (and it wouldn’t be a pointcloud otherwise). We can proceed to parse each pixel in the vertex shader.

The simplified shaders

Now that we have our data uploaded to the GPU, we can try showing the data in some way before doing the full 3D pointcloud model. Keep in mind that we are creating a point for each piece of depth data; there are no triangles here. We could achieve a simplified version of the final model just by creating a rectangle and mapping the depth texture onto it, but the current method is closer to what the code will look like in the end and helps us understand what’s going on.

In the following vertex shader, u_depth_texture is our depth data, and a_depth_texture_index is the pair describing our position within the data (for example (0, 0) or (1, 2)). The u_depth_texture_size is simply the pair (width, height), and the v_depth is a temporary hack to visualize the data by passing the depth into the fragment shader. I have also declared u_color_texture_size, but we won’t need it yet.

uniform sampler2D u_depth_texture;
attribute vec2 a_depth_texture_index;
uniform vec2 u_depth_texture_size;

uniform vec2 u_color_texture_size;

varying float v_depth;
void main() {
   vec2 depth_texture_coord = a_depth_texture_index / u_depth_texture_size;
   float depth = texture2D(u_depth_texture, depth_texture_coord).r;
   v_depth = depth * 10.0;
   gl_Position = vec4(depth_texture_coord, 0, 1);
}

We first calculate the texture coordinate by normalizing the index (making it range between 0 and 1). The depth data are stored in the texture with the format GL_RED, so we get the red component from that. We don’t yet know in what unit it is. We will find that out later when we get the depth_scale parameter from the camera calibration data.

By putting this depth information into v_depth, we’re passing it to the fragment shader. This is just a temporary way to quickly visualize the data; later the fragment shader will not know anything about the depth. I’m also multiplying it by 10.0 (you can try playing around with this number if your image looks too dark or too bright), because the raw data looks a bit dark otherwise. Then we set the position of the vertex to be the simple 2D coordinate, without taking the depth into consideration. Later we will properly create a 3D model, but for now we just want to get started.
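Note that gl_Position built this way only ranges over [0, 1] on each axis, while WebGL clip space spans [-1, 1], so the points cover only the upper-right quarter of the canvas. A quick JavaScript sketch of the remapping that the MVP matrix will later handle for us (the function name is invented for illustration):

```javascript
// Texture coordinates live in [0, 1]; WebGL clip space spans [-1, 1].
// Mapping coord * 2 - 1 would center the image; the tutorial instead
// fixes this later with the model-view-projection matrix.
function textureCoordToClipSpace(coord) {
    return coord * 2 - 1;
}
// A point in the middle of the texture maps to the middle of the canvas.
console.log(textureCoordToClipSpace(0.5)); // 0
```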

The following piece of code is the fragment shader, which doesn’t do much. It only grabs the v_depth passed from the vertex shader and sets it as the green color of the point. We don’t use the color texture yet.

precision mediump float;
uniform sampler2D u_color_texture;
varying float v_depth;

void main() {
   gl_FragColor = vec4(0, v_depth, 0, 1);
}

Once you have all of this working, your result should be similar to the picture of a coffee cup below. Don’t worry that the image is inverted and shows only in the top right quarter of the canvas; we will fix this soon. You can have a look at the full source code for this part under the tag tutorial-1.


Deprojecting 2D positions into 3D

In computer graphics, the word “projection” usually means taking 3D data and making them into 2D data. Taking a picture with a camera is a typical example of this. Here we do the opposite by trying to reconstruct the 3D view using the 2D information and depth data.

Right now, we only have a flat model of the data, and we want to use the depth information to add another dimension. We can’t simply set the depth as the Z component of the position; we need to know how distant each point is relative to the distance between the other 2D points. For example, if the depth camera could see objects up to 100 meters away, a raw value of 0.5 would mean that the point is 50 meters away. The model would look squished or stretched if we didn’t take this into consideration. Additionally, the depth camera is not perfectly centered with the color camera (depending on how the camera actually works) and usually has a different focal length than the color camera (meaning it can see further to each side).
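In code, this scaling is a single multiplication. A minimal JavaScript sketch, using the hypothetical 100-meter example from above (the function name is invented; real cameras report much smaller depth scales):

```javascript
// Convert a raw depth sample to meters using the camera's depth scale.
// The 100-meter scale matches the hypothetical example in the text;
// a real depth_scale from the calibration data is far smaller.
function rawDepthToMeters(rawDepth, depthScale) {
    return rawDepth * depthScale;
}
console.log(rawDepthToMeters(0.5, 100)); // 50
```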

Of course all of these factors depend on the specific camera you are using. You can get the details for your particular camera by using the parameters returned by the setupCamera function that we described previously but have not used yet. Replace your current call to setupCamera with the following:

setupCamera()
    .then(function(cameraParameters) {
        uploadCameraParameters(gl, program, cameraParameters);
    });

Next, define the uploadCameraParameters function that will give us access to the parameters in the shaders:

function uploadCameraParameters(gl, program, parameters) {
   var shaderVar = gl.getUniformLocation(program, "u_depth_scale");
   gl.uniform1f(shaderVar, parameters.depthScale);
   shaderVar = gl.getUniformLocation(program, "u_depth_focal_length");
   gl.uniform2fv(shaderVar, parameters.depthFocalLength);
   shaderVar = gl.getUniformLocation(program, "u_depth_offset");
   gl.uniform2fv(shaderVar, parameters.depthOffset);
}

Here, we declare them in the vertex shader and also add a function that computes the proper 3D position of the point.

uniform float u_depth_scale;
uniform vec2 u_depth_offset;
uniform vec2 u_depth_focal_length;

vec4 depth_deproject(vec2 index, float depth) {
   vec2 position2d = (index - u_depth_offset) / u_depth_focal_length;
   return vec4(position2d * depth, depth, 1.0);
}

The depth_deproject function expects the depth to be in meters, so we have to convert it in the main function:

void main() {
   vec2 depth_texture_coord = a_depth_texture_index / u_depth_texture_size;
   float depth = texture2D(u_depth_texture, depth_texture_coord).r;
   float depth_scaled = u_depth_scale * depth;
   vec4 position = depth_deproject(a_depth_texture_index, depth_scaled);
}

The position is the approximate 3D space position that we wanted.
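To see the math outside the shader, here is a plain JavaScript version of the same deprojection (the function name and the intrinsic values are invented for illustration; the real values come from the camera calibration):

```javascript
// JavaScript mirror of the depth_deproject shader function.
// index: pixel coordinates in the depth image; depth: distance in meters.
// focalLength and offset stand in for the camera intrinsics.
function depthDeproject(index, depth, focalLength, offset) {
    var x = (index[0] - offset[0]) / focalLength[0];
    var y = (index[1] - offset[1]) / focalLength[1];
    return [x * depth, y * depth, depth];
}
// A pixel exactly at the principal point (the offset) deprojects onto
// the optical axis: x and y are zero, z is the depth itself.
var point = depthDeproject([320, 240], 2.0, [475, 475], [320, 240]);
console.log(point); // [0, 0, 2]
```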

Perspective and rotation

Next we want to fix the positioning of the image and add perspective projection, so we need to upload a model-view-projection (MVP) matrix. You don’t need to calculate it if you have included the gl.js file in your project. Instead you can just call getMvpMatrix and it will do it for you. It will flip the image, center and scale it properly, and add perspective projection. Add the following snippet next to the code where the textures get uploaded so that it gets called on each animation frame:

var shaderMvp = gl.getUniformLocation(program, "u_mvp");
gl.uniformMatrix4fv(shaderMvp, false, getMvpMatrix(width, height));

As a bonus, the MVP matrix also enables mouse rotation if you add the following code somewhere in the beginning of your main function to update the mouse position:

var canvasElement = document.getElementById("webglcanvas");
canvasElement.onmousedown = handleMouseDown;

Now just add it to the vertex shader:

uniform mat4 u_mvp;
void main() {
   gl_Position = u_mvp * position;

The image below shows the same scene as the previous picture, but this time it’s facing the right way. I also slightly rotated it to the side using the mouse rotation to make it more visible now that it’s a 3D model. The full source code for this section of the tutorial is under the tag tutorial-2 and you can take a look at what code to change from the previous simplified version in the file diff.

Combining the color and depth data

Given a point in 3D space, how do we find out what color it is? The depth and color data have different sizes and can be shifted relative to each other. Luckily, the camera calibration parameters include a 4x4 depthToColor matrix that performs this whole transformation, together with the focal length and offset of the color camera, which usually differ from those of the depth camera. Update your uploadCameraParameters function to add them:

function uploadCameraParameters(gl, program, parameters) {
   // ... keep the depth parameter uploads from before ...
   var shaderVar = gl.getUniformLocation(program, "u_color_focal_length");
   gl.uniform2fv(shaderVar, parameters.colorFocalLength);
   shaderVar = gl.getUniformLocation(program, "u_color_offset");
   gl.uniform2fv(shaderVar, parameters.colorOffset);
   shaderVar = gl.getUniformLocation(program, "u_depth_to_color");
   gl.uniformMatrix4fv(shaderVar, false, parameters.depthToColor);
}

Next, add the following code to your vertex shader. The first three variables are the parameters we just uploaded. The last variable, v_color_texture_coord, is the coordinate into the color texture that we will calculate and pass on into the fragment shader.

uniform mat4 u_depth_to_color;
uniform vec2 u_color_offset;
uniform vec2 u_color_focal_length;
varying vec2 v_color_texture_coord;

vec2 color_project(vec4 position3d) {
   vec2 position2d = position3d.xy / position3d.z;
   return position2d * u_color_focal_length + u_color_offset;

The color_project function takes the 3D point and figures out which position on the color texture belongs to it. You can think of it as a reversal of the depth_deproject function, but it will find the position for the color pixel, not the original depth pixel. Not all positions returned from the color_project function will be valid positions in the color texture. The depth camera usually has a wider field of view, and we simply won’t have the color data for some points. We can deal with that in the fragment shader in a moment.
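To make the "reversal" concrete, here is a plain JavaScript sketch of both functions showing that projecting a deprojected pixel returns the original index when the same intrinsics are used (all names and intrinsic values here are invented for illustration):

```javascript
// Mirror of the color_project shader function: take a 3D point and
// find its pixel position in a camera image with the given intrinsics.
function project(position3d, focalLength, offset) {
    var x = position3d[0] / position3d[2];
    var y = position3d[1] / position3d[2];
    return [x * focalLength[0] + offset[0], y * focalLength[1] + offset[1]];
}

// Mirror of the depth_deproject shader function from earlier.
function deproject(index, depth, focalLength, offset) {
    var x = (index[0] - offset[0]) / focalLength[0];
    var y = (index[1] - offset[1]) / focalLength[1];
    return [x * depth, y * depth, depth];
}

// Projecting a deprojected pixel with the same intrinsics round-trips
// back to the original pixel index: the two functions are inverses.
var focal = [512, 512], offset = [310, 230];
var p3d = deproject([100, 50], 2.0, focal, offset);
console.log(project(p3d, focal, offset)); // [100, 50]
```

In the demo itself, of course, the depth intrinsics are used for deprojection and the color intrinsics for projection, with the u_depth_to_color matrix bridging the two coordinate systems in between.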

void main() {
   // ... the code from the previous sections ...
   vec4 color_position = u_depth_to_color * position;
   vec2 color_index = color_project(color_position);
   v_color_texture_coord = color_index / u_color_texture_size;
}

Add the code above at the end of your main function in the vertex shader. First we use the u_depth_to_color matrix to correct for different parameters of the color and depth cameras, and then we find the 2D position on the color texture. We normalize this 2D position and pass it onto the fragment shader as v_color_texture_coord.

precision mediump float;
uniform sampler2D u_color_texture;
varying vec2 v_color_texture_coord;

void main() {
   if (v_color_texture_coord.x <= 1.0
       && v_color_texture_coord.x >= 0.0
       && v_color_texture_coord.y <= 1.0
       && v_color_texture_coord.y >= 0.0) {
       gl_FragColor = texture2D(u_color_texture, v_color_texture_coord);
   } else {
       gl_FragColor = vec4(0, 0, 0, 0);
   }
}
If the color and depth cameras had the same field of view, we could just use the single line with texture2D, which assigns the color from the RGB texture to the current point.

However, since the color camera has a field of view that is smaller than the field of view of the depth camera, there will be points for which we have no color information. This code hides those points by giving them a black color, but you could choose to make them visible by giving them some other distinct color. The image below shows the result of this section. You can get the full source at the tag tutorial-3. This code also includes the distortion handling from the next section.
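The validity test from the fragment shader can be expressed in plain JavaScript like this (the function name is invented):

```javascript
// Mirror of the fragment shader's bounds check: a normalized color
// texture coordinate is usable only if both components lie in [0, 1].
// Coordinates outside that range fall beyond the color camera's
// narrower field of view, so there is no color data for them.
function hasColorData(coord) {
    return coord[0] >= 0 && coord[0] <= 1 &&
           coord[1] >= 0 && coord[1] <= 1;
}
console.log(hasColorData([0.5, 0.5])); // true
console.log(hasColorData([1.2, 0.5])); // false
```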

Adjusting for distortion models

Some cameras use distortion models that change the way the projection/deprojection works. Describing how this works is beyond the scope of this article, but you can read about the distortion models in the librealsense documentation. The source code includes the handling of the distortion.

Information about the distortion is already in the camera parameters, so what remains is only to upload a couple more numbers and adjust our projection and deprojection calculations. Look at the final source code for this tutorial to see what exactly needs to be done.


The pointcloud you have created can be used in multiple ways: for gesture recognition, reconstructing a full 360-degree 3D model of some object, recognizing objects in the foreground and removing the background, or whatever else strikes your fancy. Have a look at the librealsense home page to see how the cameras are being used.



[1] You can add support for more depth cameras by editing the depth-camera.js file and adding the camera calibration data, which you can find by querying them in librealsense. These are currently hard-coded because the Mediacapture-depth API is not stable yet.