# Volumetric modelling of data from Intel® RealSense™ camera

Let’s say you have an Intel® RealSense™ camera and you have already created a pointcloud, which shows a single frame of depth information in 3D. Now you want to combine multiple such frames to form a single solid model of the objects in front of the camera. For this, you will need two things - some information about the motion of the camera, and a way to combine the data into a single model. To get information about the movement, you can either use one of the newer Intel® RealSense™ cameras with a motion sensor, or you can estimate it from the depth data themselves, as shown in the 3D scanner project. This article is about the second part of the problem: how to combine the data, given that you already know how the camera is moving.

If memory and bandwidth were free, we could just store all the pointclouds and render them together. However, this would not only be very inefficient (we would have millions of points after just a few seconds of recording), it would also end up looking very noisy. You need some way to combine and reduce the information into a more compact form. There are many algorithms for creating a simplified triangle mesh from a pointcloud. However, we want to continuously add thousands of frames of data together, in which case a volumetric model is a simpler and faster solution. One disadvantage of this method is that it only allows us to model a small area and not a whole room, but these cameras have a relatively small range, so they are not suited for that anyway.

You can imagine a simplified volumetric model as a 3D grid where we set a *voxel* (volumetric pixel) to 1 if a point lies within it. This would still be very inefficient and noisy, with the addition of looking too much like Minecraft, but it serves as a good mental model. In this algorithm, each voxel is going to be storing the distance from the center of the voxel to the closest point in the pointcloud. This is called a signed distance function (SDF), which will return the distance to the nearest surface. If you give it a position that is right on the surface, it will return 0, and if it’s inside the object, it will return a negative distance.

This article is divided into two main parts. We start at the end, where we learn how to render this kind of model (or any other signed distance function), and only then do we get around to creating the model.

## Rendering the model

Before we construct the model, it is useful to have a way to render it first. You can see an implementation of this in GLSL in the 3D scanner, but it’s pretty easy to implement yourself. The algorithm is known as raymarching, which is similar to raytracing.

We are going to represent our model as a signed distance function. A basic SDF for a sphere of radius 1 looks like this in GLSL:

```glsl
float signedDistance(vec3 p) {
  return length(p) - 1.0;
}
```

If you give this function a point on the sphere, for example (1, 0, 0), it will return 0. With the input of (1.2, 0, 0), it will return 0.2, which is the distance to the surface. You can create a function like this for many kinds of objects. An object represented this way is pretty easy to render, unlike in regular raytracing where you have to calculate the intersection of the ray with surfaces. For each pixel on the screen, cast a ray from the camera through that pixel. In practice, this means calculating a starting position for the head of the ray and a direction. Call the above function to find out how far away you are from the object, and move the head of the ray forward by that amount. Once the distance reaches 0, calculate the color at that point and you’re done.

*Source: GPU Gems 2, chapter 8*

The above image shows how the ray starts at p0 and moves forward until it reaches the surface of some object. The circles around it represent the signed distance, i.e. the distance by which we can safely move forward. The following piece of code shows the core of the algorithm:

```glsl
vec4 raymarch(vec3 position, vec3 viewDirection) {
  for (int i = 0; i < MAX_STEPS; i++) {
    float dist = signedDistance(position);
    if (dist < EPSILON) {
      return calculateColor(position, viewDirection);
    } else {
      position += dist * viewDirection;
    }
  }
  return vec4(0.0, 0.0, 0.0, 1.0); // black
}
```
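To sanity-check the loop outside a shader, the same algorithm can be run on the CPU. Here is a minimal JavaScript version of it against the sphere SDF from above; the step limit and epsilon are arbitrary choices for this sketch, and it returns the hit position rather than a color:

```javascript
// Sphere of radius 1 at the origin, same as the GLSL example.
function signedDistance(p) {
  return Math.hypot(p[0], p[1], p[2]) - 1.0;
}

// March the head of the ray forward by the signed distance until it reaches
// the surface. Returns the hit position, or null if nothing was hit.
function raymarch(position, viewDirection, MAX_STEPS = 64, EPSILON = 1e-4) {
  let p = position.slice();
  for (let i = 0; i < MAX_STEPS; i++) {
    const dist = signedDistance(p);
    if (dist < EPSILON) return p;
    p = [p[0] + dist * viewDirection[0],
         p[1] + dist * viewDirection[1],
         p[2] + dist * viewDirection[2]];
  }
  return null;
}
```

A ray starting at (0, 0, -2) looking along +z lands on the sphere at (0, 0, -1), while a ray that passes more than one unit from the origin never triggers the epsilon test and returns null.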

I haven’t defined the function `calculateColor` to keep this article short. For now, you can just return the color white and implement Phong lighting later. Here is how to use this function:

```glsl
vec3 position = vec3(coord, -2.0);
vec3 camera = vec3(0.0, 0.0, -1.0);
vec3 viewDirection = normalize(camera - position);
outColor = raymarch(position, viewDirection);
```

The `coord` variable is the position of the pixel on the screen in the range of (-0.5, -0.5) to (0.5, 0.5). To simplify things, I gave the starting position and camera a fixed location in space, but normally you’d use a view matrix to move the scene.
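Mapping a pixel index to that range is a one-liner. A sketch in JavaScript, assuming square pixels and no aspect-ratio correction (the half-pixel offset samples the center of each pixel):

```javascript
// Map pixel (x, y) in a width-by-height framebuffer to (-0.5, -0.5)..(0.5, 0.5).
function pixelToCoord(x, y, width, height) {
  return [(x + 0.5) / width - 0.5,
          (y + 0.5) / height - 0.5];
}
```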

The image below shows a simple sphere SDF rendered by the code above. There are many things you can do with raymarching, for example combine multiple objects, create complex scenes, and create effects that would be difficult to achieve with traditional triangle rasterization. For the purpose of rendering our volumetric model, we are pretty much done. The only modification needed will be to look up the SDF function from a 3D grid instead of calculating it on the spot.
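That lookup can be sketched on the CPU as well. A minimal nearest-voxel version, assuming the grid is a flat `Float32Array` of N×N×N values covering a unit cube centered at the origin; both the layout and the extent are assumptions of this sketch, and a real implementation would interpolate between neighboring voxels:

```javascript
// Nearest-voxel SDF lookup in a flat N*N*N Float32Array.
// The cube is assumed to span (-0.5, -0.5, -0.5)..(0.5, 0.5, 0.5).
function gridSdf(grid, N, p) {
  // Map each coordinate to a voxel index, clamped to the grid bounds.
  const clamp = v => Math.min(N - 1, Math.max(0, Math.floor((v + 0.5) * N)));
  const ix = clamp(p[0]), iy = clamp(p[1]), iz = clamp(p[2]);
  return grid[ix + N * (iy + N * iz)];
}
```

In GLSL you would instead store the grid in a 3D texture and let the sampler do the filtering for you.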

## Creating the model

This method is described in detail in the paper A Volumetric Method for Building Complex Models from Range Images, but the basic principle can be shown without the mathematical details. Imagine the pointcloud in 3D space. Now imagine a cube that you put around these points, so that most of them are inside of the cube. The cube is divided into a 3D grid, and each *voxel* (volumetric pixel, or grid element) will store the distance to the nearest point of the pointcloud, with negative values if they are inside the object. The image below shows this in 2D. The dots are the pointcloud and the grid is our volumetric model. The zeroes are where we estimate the surface of the real world object should be.

The problem is that finding the closest point to the center of each voxel would require searching through all the points of the pointcloud. We can do this much faster by projecting the position of the center of the voxel, which gives us a 2D coordinate on the projection plane. Now imagine our raw depth data are sitting on this projection plane. We deproject the depth data at that coordinate (see the article on pointcloud creation), which gives us a point that is along this line. Of course, this doesn’t really find the closest point, this just finds a point along this line, but it’s much faster and works well enough. Once we have this point, we calculate the distance between it and the center of the voxel, which is what we store into it.
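The project-then-deproject step can be sketched with a simple pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy` below are placeholders for this sketch; in a real application you would read them from the camera, and the pointcloud article covers deprojection in detail:

```javascript
// Pinhole projection: 3D point in camera space -> 2D coordinate on the plane.
function project(fx, fy, cx, cy, p) {
  return [fx * p[0] / p[2] + cx,
          fy * p[1] / p[2] + cy];
}

// Deprojection: 2D coordinate plus a depth value -> 3D point along that ray.
function deproject(fx, fy, cx, cy, coord, depth) {
  return [(coord[0] - cx) / fx * depth,
          (coord[1] - cy) / fy * depth,
          depth];
}

// The per-voxel value from the text: with the camera at the origin, compare
// the distance to the deprojected point with the distance to the voxel center.
function voxelSdf(point, voxelCenter) {
  const len = v => Math.hypot(v[0], v[1], v[2]);
  return len(point) - len(voxelCenter);
}
```

Projecting a point and deprojecting it again with the same depth round-trips back to the original point, which is a quick way to test the pair.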

To add another frame of depth data into the model, we do the same as for the first frame, but blend the new values with the previous ones using a running weighted average. The basic function looks like this:

```glsl
vec2 coord = project(voxelCenter);
vec3 point = deproject(depthData, coord);
vec3 camera = vec3(0.0, 0.0, 0.0);
float sdf = distance(point, camera) - distance(voxelCenter, camera);
float newWeight = oldWeight + 1.0;
float newSdf = (oldSdf * oldWeight + sdf * 1.0) / newWeight;
```

The way you calculate `voxelCenter` depends on how big the cube is and where exactly you want to place it. A good choice is to use a cube with one corner at (-0.5, -0.5, 0.0) and the other at (0.5, 0.5, 0.5), but you can also scale it or move it a bit further away from the origin so that it doesn’t capture some useless data close to the camera. The important part here is to transform the `voxelCenter` by the movement matrix that you got either by using the motion sensor or by estimating it:

```glsl
voxelCenter = (detectedMovement * vec4(voxelCenter, 1.0)).xyz;
```

Eventually, as you turn the camera around the object, the grid will contain zeroes where the original surface was and negative values only on the inside of the object.

Each voxel of the grid will store both the sdf and the weight. A grid of 256 items per side works reasonably well; larger ones consume a lot of memory. The voxels should be initialized with the distance equal to the size of one voxel (1/256) and a weight of 0. We also don’t bother storing distances higher than some threshold, for example 3 voxels away, which you can express using the weights (the code above just uses a weight of 1.0 to keep things short).
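Putting the weighted average and the truncation threshold together, a per-voxel update might look like the following CPU sketch. The voxel record layout and the rule of skipping updates far behind the surface are assumptions of this sketch:

```javascript
// Running weighted-average update for one voxel, truncating the SDF to
// +/- 3 voxels as suggested in the text.
function updateVoxel(voxel, sdf, voxelSize) {
  const trunc = 3 * voxelSize;
  if (sdf < -trunc) return voxel;        // far behind the surface: leave it alone
  const clamped = Math.min(sdf, trunc);  // cap distances in front of the surface
  const newWeight = voxel.weight + 1;
  voxel.sdf = (voxel.sdf * voxel.weight + clamped) / newWeight;
  voxel.weight = newWeight;
  return voxel;
}
```

With a weight of 0, the first update simply overwrites the initial value; later updates converge toward the average of the observed distances.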

I recommend using some mocked-up depth data with which to test the model. You can use some captured data from the depth camera, or you can use the depth data generator in the file testdata.js in the 3D scanning project. Calling `createFakeData(100, 100, mat4.create())` will create fake data that contain a sphere and a box next to each other, as shown below. You can give it a different matrix to manipulate the position of the camera. Since you know the position the camera was in when generating the data, you can calculate the movement between the two frames and use it to combine them into a single model. The following images show the two frames of generated depth data between which the camera moved a little.
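Given two known camera poses, the movement between the frames is one pose composed with the inverse of the other. A sketch using column-major 4×4 arrays (the gl-matrix convention the project uses), assuming each pose is a rigid camera-to-world transform:

```javascript
// Multiply two column-major 4x4 matrices: out = a * b.
function mul4(a, b) {
  const out = new Array(16).fill(0);
  for (let c = 0; c < 4; c++)
    for (let r = 0; r < 4; r++)
      for (let k = 0; k < 4; k++)
        out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
  return out;
}

// Invert a rigid transform (rotation + translation): R -> R^T, t -> -R^T t.
function invertRigid(m) {
  const out = new Array(16).fill(0);
  for (let c = 0; c < 3; c++)
    for (let r = 0; r < 3; r++)
      out[c * 4 + r] = m[r * 4 + c];
  for (let r = 0; r < 3; r++)
    out[12 + r] = -(out[r] * m[12] + out[4 + r] * m[13] + out[8 + r] * m[14]);
  out[15] = 1;
  return out;
}

// Movement that maps frame-1 camera space into frame-2 camera space.
function detectedMovement(pose1, pose2) {
  return mul4(invertRigid(pose2), pose1);
}
```

With gl-matrix itself you would use `mat4.invert` and `mat4.multiply` instead of these helpers; they are written out here only to keep the sketch self-contained.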

To render the grid with the raymarcher, all you have to do now is to look up which voxel (part of the grid) corresponds to the current position in space, and read the distance from it. The following images show the volumetric model created from the fake depth data shown above. The first model doesn’t have any information about how the camera moved between the two frames. The second one was given this information and was therefore able to reconstruct a smooth model. As more data are added, the model becomes more precise.

## Where to go from here

This article describes just a sketch of the basic algorithm. It can be taken further - you could add color information into the model, interact with the model, or add some way to export it into a program like the Blender* toolset.

I recommend checking out the 3D scanning project, which uses the same algorithm, as well as some of our other demos with the Intel® RealSense™ cameras if you’re on the lookout for ideas.

For more information, check out these articles:

- How to create a 3D view from a depth camera in WebGL
- Depth camera capture in HTML5
- Tutorial: Typing in the air using depth camera, Chrome*, Javascript*, and WebGL* transform feedback
- Background removal with Intel® RealSense™ depth camera, WebRTC*, and WebGL*