
Bridging the gap

By John Harrison on Sep 28, 2020

Rigging Tools is software that integrates 3D keypoints with anything that consumes them; it is a sort of "middleware" that helps bring the two halves of motion capture (mo-cap) together. The distance between these two halves is what I refer to as the "gap."

Mo-cap has been around for a long time, so how can there be a gap? To answer this question, let's take a look at mo-cap today and tomorrow.

Mo-cap today

Let's say you want to capture human motion for a movie, a video game, or anything else. How would you do it? Most likely you would use a commercial marker-based solution such as Vicon, or maybe a sensor-based solution such as Xsens. If you're on a budget and have a small space, you could use a Kinect or RealSense markerless solution with a software plugin. If you're the hacker type, you could track down an old binary of the now-obsolete and proprietary NiTE SDK from PrimeSense and roll your own solution. These are all valid options, each with benefits and limitations, and if we stop here it's safe to say the problem of mo-cap has been solved for all use cases.

Mo-cap tomorrow

Except it hasn't. Traditional motion capture falls apart when faced with:

  • new detection methods
  • cloud processing and distribution
  • novel use cases

I'll explain each of these problems in more detail and highlight where I see gaps.

New detection methods

Deep neural networks can identify keypoints from videos freakishly well. Add a few more time-synchronized views, triangulate and cluster the keypoints, and you now have 3D keypoints. These multi-camera arrays + deep learning tools can detect keypoints in massive open spaces such as outdoor arenas or busy streets - all without the need for markers. Traditional mo-cap solutions cannot do this; however, 3D keypoints are not the same as kinematically-correct, animatable rigs. This distinction between 3D keypoints and rigs represents a gap.
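To make the triangulation step concrete, here is a minimal sketch (Python with NumPy) of recovering one 3D keypoint from the same 2D keypoint detected in two calibrated views, using the direct linear transform. The camera matrices and pixel coordinates are invented for illustration and simply stand in for whatever a real multi-camera array would provide.

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two views via the direct linear
    transform (DLT). P1, P2 are 3x4 camera projection matrices;
    x1, x2 are (u, v) pixel coordinates of the same keypoint."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The 3D point is the right singular vector with the smallest
    # singular value, converted back from homogeneous coordinates.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Toy example: two cameras offset along the x-axis (values are
# illustrative only, not from any real capture setup).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                # camera at the origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])    # second camera, translated along x
point_3d = triangulate_dlt(P1, P2, x1=(0.25, 0.10), x2=(0.05, 0.10))
print(point_3d)   # roughly [1.25, 0.5, 5.0]
```

Repeat this for every detected keypoint and every frame, and the result is the stream of 3D keypoints described above - useful, but still not a rig.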

Now consider a moving path instead of a fixed space. Drones with depth cameras and auto registration can follow a subject anywhere the drone can go, providing mo-cap that is tied not to a single location but to a single subject. This approach also scales well: add drones for better accuracy or to cover multiple subjects. This type of keypoint detector could follow actors through a live parkour chase scene - very cool! However, it also doesn't produce kinematically-correct, animatable rigs.

You know that face detection feature on your phone or tablet? It probably uses a depth camera, meaning many of us have mini mo-cap devices and don't realize it. These depth cameras, as well as external ones such as RealSense, have the power to provide high-accuracy near-field keypoint detection at a low cost, creating a potentially unlimited source of markerless mo-cap data. However, in every case I've seen so far, these don't produce kinematically-correct animatable rigs.

These new keypoint detectors have amazing potential but present a gap: they need the ability to create kinematically-correct, animatable rigs.
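A small sketch may help show why raw 3D keypoints fall short of a rig. Triangulated keypoints let "bone" lengths drift from frame to frame, while a kinematically-correct rig fixes bone lengths and expresses motion as joint poses. The coordinates below are made up for illustration; they are not output from any particular detector.

```python
import numpy as np

# Illustrative 3D keypoints (metres) for one arm over three frames;
# the values are invented, not from a real capture.
shoulder = np.array([[0.00, 1.40, 0.00],
                     [0.01, 1.41, 0.02],
                     [0.02, 1.39, 0.05]])
elbow    = np.array([[0.00, 1.10, 0.00],
                     [0.03, 1.12, 0.06],
                     [0.05, 1.08, 0.11]])

# Raw keypoints: the "upper arm" length drifts frame to frame,
# which no real skeleton does.
raw_lengths = np.linalg.norm(elbow - shoulder, axis=1)
print(raw_lengths)                  # roughly [0.300, 0.293, 0.317] -- not constant

# A rig fixes the bone length once and describes motion as a joint
# pose; here the pose is reduced to the unit direction of the bone.
bone_length = raw_lengths.mean()
directions = (elbow - shoulder) / raw_lengths[:, None]
rigged_elbow = shoulder + bone_length * directions
print(np.linalg.norm(rigged_elbow - shoulder, axis=1))   # constant by construction
```

Turning noisy keypoints into a skeleton with consistent bone lengths and valid joint rotations is exactly the work the detectors themselves don't do.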

Cloud processing and distribution

Existing mo-cap systems route data in proprietary ways over USB cables, Ethernet cables, Bluetooth, WiFi, and even serial cables (yes, even today). The future of mo-cap needs to upload data to the cloud for rendering, processing, archiving, and distributing. This will require stream-agnostic, well-defined, and technically appropriate data formats capable of being handled by the smorgasbord of different services in the cloud. Basically, it's the opposite of what's currently implemented.
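As a rough illustration of what "stream-agnostic and well-defined" might mean in practice, here is a hypothetical per-frame keypoint message serialized as JSON. The schema and field names are assumptions made up for this example, not the actual Rigging Tools rig format; the point is that a small, self-describing payload can travel over any transport a cloud service understands.

```python
import json
import time

# Hypothetical schema for illustration only -- not the actual Rigging
# Tools rig format. The idea is a small, self-describing message that
# any cloud service can parse, archive, or forward.
frame = {
    "subject_id": "person-042",            # made-up identifier
    "timestamp_us": int(time.time() * 1e6),
    "units": "metres",
    "keypoints": {
        "shoulder_l": [0.00, 1.40, 0.00],
        "elbow_l":    [0.00, 1.10, 0.00],
        "wrist_l":    [0.00, 0.85, 0.00],
    },
}

payload = json.dumps(frame).encode("utf-8")
# `payload` can now travel over HTTP, a message queue, or a plain
# socket -- the transport no longer dictates the format.
print(len(payload), "bytes")
```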

Even setting aside the likelihood that most mo-cap systems will require a major overhaul to support a cloud-based workflow, the lack of a cloud-friendly data format is a gap in its own right.

Novel use cases

Traditional users of mo-cap data are animators and game developers, but novel use cases are emerging that widen the audience. Autonomous vehicles can use mo-cap data to understand pedestrian behavior and intent. Servers can mine mo-cap data for new kinds of derived metrics. IoT is spreading like wildfire and could use mo-cap data for human-device interaction and social avatar sharing.
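As a toy example of the kind of derived metric a server might mine, the sketch below estimates average walking speed from a pelvis keypoint track. The positions, frame rate, and joint name are invented for illustration.

```python
import numpy as np

# Toy derived metric: average walking speed estimated from a pelvis
# keypoint track. Positions (metres) and frame rate are made up.
fps = 30.0
pelvis_xyz = np.array([[0.00, 0.95, 0.00],
                       [0.04, 0.96, 0.00],
                       [0.09, 0.95, 0.01],
                       [0.13, 0.94, 0.01]])

step_lengths = np.linalg.norm(np.diff(pelvis_xyz, axis=0), axis=1)
avg_speed = step_lengths.sum() / (len(step_lengths) / fps)
print(f"average speed: {avg_speed:.2f} m/s")   # about 1.3 m/s for this toy track
```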

These use cases don't align with traditional uses of mo-cap data and certainly won't route through animation software on a user's PC, so they present yet another gap.

Addressing the gap

Traditional mo-cap systems are not equipped to deal with new detection methods, cloud processing and distribution, and novel use cases; these collectively create the gap to which I'm referring.

One way to understand how Rigging Tools fills the gap is by observing where it sits in an end-to-end solution.

Imagine you are a team lead who has been asked to create a video. This video must take motion capture of anonymous people on a busy sidewalk and render them as aliens on Planet X. You are given multiple camera angles of the street (each connected to the cloud), a cloud budget for processing, an engineering team, and an animation team. Ignoring any privacy concerns, here is how you might approach a solution:

Engineering requirements

  • Markerless because random people on the street aren't going to wear markers
  • Server-based cloud solution since cameras connect to the cloud directly
  • Find and implement a SOTA 3D keypoint detector
  • Convert 3D keypoints to animatable rigs <-- use kp2rig from Rigging Tools
  • Share rigs with the animation team <-- rig files from Rigging Tools are easily shared

Animation requirements

  • Import animatable rigs into animation software <-- use rig2blender from Rigging Tools
  • Assign alien models to each rig
  • Create a visually-rich video with groovy background music
  • Export the final result as a video


...and get promoted because this is groundbreaking stuff.

Notice how Rigging Tools fits in this solution. It doesn't create 3D keypoints nor does it render the final experience; rather, it links the engineering team's output with the animation team's input. It creates animatable and sharable rigs, operates in the cloud, and supports the target use case without forcing its own requirements on the architecture.

Designed to grow

What if you want to use a different keypoint detector? What if you're building a proprietary product for profit? What if live streaming is needed?

Rigging Tools adapts as needed through its extensible design, no-frills open-source licensing, and support for time-segmented rigs for streaming.
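To illustrate the idea behind time-segmented rigs (this is a sketch of the concept, not the actual Rigging Tools format), the snippet below slices per-frame rig poses into fixed-length segments so a consumer can start playing one segment while the next is still being produced. The segment length and dictionary layout are assumptions.

```python
# Minimal sketch of time-segmented rig data for streaming. The
# two-second segment length and the dictionary layout are assumptions
# for illustration, not the actual Rigging Tools rig format.

def segment_frames(frames, fps=30.0, segment_seconds=2.0):
    """Group a list of per-frame rig poses into fixed-length segments."""
    per_segment = int(fps * segment_seconds)
    for start in range(0, len(frames), per_segment):
        yield {
            "start_time_s": start / fps,
            "frames": frames[start:start + per_segment],
        }

# 300 placeholder frames (~10 seconds at 30 fps) become five
# 2-second segments that could be uploaded or streamed one by one.
frames = [{"frame": i} for i in range(300)]
for seg in segment_frames(frames):
    print(seg["start_time_s"], len(seg["frames"]))
```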

Rigging Tools also defines its own rig format. Existing formats are available for representing rigs (see the list here) but most are text-based, proprietary (or have proprietary origins), and not well-suited for streaming over the Internet. In light of this, the future of Rigging Tools may include more output formats. For now, Rigging Tools imports rigs into animation software and game engines by using the rig2x plugins.

Conclusion

Rigging Tools is an important component in future mo-cap solutions, bridging the gap between 3D keypoints and anything that consumes them. It can adapt to new developments and standards without trying to do more than it claims. It's a small piece of a bigger puzzle that, when put together with other pieces, has the potential to bring tomorrow's mo-cap into today's reality.