VR inside-out vs outside-in tracking

I sometimes find myself discussing with other VR enthusiasts whether inside-out positional tracking is better or worse than outside-in tracking. Here are my thoughts on the topic.

Just to recap: I’m talking about the technologies that provide positional tracking for virtual reality, the so-called “room-scale” VR. Since the DK2 days I’ve realized how important positional tracking is for immersion in VR: if you move in the real world and your virtual counterpart doesn’t, it feels very weird and also causes discomfort. All the most advanced VR headsets (Oculus Rift, Vive, PSVR) offer this functionality, while the cheaper mobile ones (Daydream, Cardboard, Gear VR) don’t, which is why some people say that mobile VR is not true VR.

But how can a device offer that? There are two strategies:

  • Outside-in: the device ships with external cameras that track its spatial position. Explaining in detail how this works would take quite long, so you can have a look at my post on SteamVR tracking to get an idea. Basically, you install some external cameras (two or more) in your room and then configure the system so that it knows the position of each camera relative to the others and relative to a world reference system (you do this implicitly during the setup process of the VR system). After this initialization, the external cameras track some reference points on the headset and on the controllers (if any). Since the positions of these points on the device are well known, and the positions of the cameras are known thanks to the initialization, some mathematical magic makes it possible to reconstruct the position of the headset and the controllers at each frame. Outside-in tracking usually works with infrared light;
Oculus Rift CV1: notice all its feature points highlighted by an IR camera. You can even see them with some webcams while the Rift is in use. These are the points that the Constellation cameras track to reconstruct the Rift’s position in space (Image by iFixit)
  • Inside-out: the device ships without any external cameras. Instead, it has a number of cameras (usually RGB) on its front side. There are usually two of them, but HoloLens, for instance, uses four (plus one depth camera). These cameras don’t require any real calibration by the user, since the HMD manufacturer mounted them at precise positions, so their relative position is already known. The only thing to do during setup is define a world reference system in which to place the boundaries of the VR play area (this is unnecessary in AR; in fact, HoloLens doesn’t require it). The cameras scan the images of the surrounding world, looking for special feature points. Notice that while in outside-in tracking the feature points on the devices are bright spots that are easy to find, here the system has to find, in the images of the world around us, points that are stable enough to be tracked. Given the positions of the same feature points as seen by both cameras, it is possible to reconstruct their 3D positions and so infer a rough 3D model of the world around the user. By observing how these points move over time, it is possible to infer how the headset is moving, and so offer positional tracking through some SLAM (Simultaneous Localization And Mapping) mathematical magic.
Acer Mixed Reality headset prototype. Notice the two tracking cameras on the left and right sides of the device? Well, these are the cameras that track its position (image by Windows Central)
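To make the “SLAM mathematical magic” of inside-out tracking a little less magic, here is a minimal Python sketch of two of its building blocks. It assumes an idealized, rectified stereo pair (two parallel cameras with the same focal length, separated horizontally by a known baseline) and feature points that have already been detected and matched between the two images, which is the hard part in practice. The function names, parameters and numbers are hypothetical, chosen only for illustration: real systems like HoloLens or the Windows Mixed Reality headsets use far more sophisticated (and proprietary) pipelines. The first function triangulates a matched feature into a 3D point; the second uses the Kabsch algorithm to find the rigid motion that best aligns the same 3D points across two successive frames.

```python
import numpy as np


def triangulate_rectified(u_left, v_left, u_right, focal_px, baseline_m, cx, cy):
    """Recover a 3D point (left-camera frame, meters) from one feature matched
    in a rectified stereo pair. Hypothetical helper: assumes both cameras share
    focal length and principal point (cx, cy) and differ only by a horizontal
    offset of baseline_m."""
    disparity = u_left - u_right  # pixel shift of the same feature between images
    if disparity <= 0:
        raise ValueError("a visible point must have positive disparity")
    z = focal_px * baseline_m / disparity  # nearer points shift more between cameras
    x = (u_left - cx) * z / focal_px
    y = (v_left - cy) * z / focal_px
    return np.array([x, y, z])


def estimate_rigid_motion(points_before, points_after):
    """Kabsch algorithm: find rotation r and translation t such that
    points_after[i] is approximately r @ points_before[i] + t (N x 3 arrays)."""
    centroid_b = points_before.mean(axis=0)
    centroid_a = points_after.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    h = (points_before - centroid_b).T @ (points_after - centroid_a)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = centroid_a - r @ centroid_b
    return r, t


if __name__ == "__main__":
    # Hypothetical camera: 450 px focal length, 6.4 cm baseline, 640x480 images.
    point = triangulate_rectified(400.0, 240.0, 370.0, focal_px=450.0,
                                  baseline_m=0.064, cx=320.0, cy=240.0)
    print(point)  # 30 px disparity -> depth of 450 * 0.064 / 30 = 0.96 m

    # Simulate the headset moving: the static world points, as seen by the
    # cameras, rotate 10 degrees about the vertical axis and shift.
    rng = np.random.default_rng(0)
    before = rng.uniform(-1.0, 1.0, size=(8, 3))
    angle = np.radians(10.0)
    r_true = np.array([[np.cos(angle), 0.0, np.sin(angle)],
                       [0.0, 1.0, 0.0],
                       [-np.sin(angle), 0.0, np.cos(angle)]])
    t_true = np.array([0.05, 0.0, -0.10])
    after = before @ r_true.T + t_true
    r_est, t_est = estimate_rigid_motion(before, after)
    print(np.allclose(r_est, r_true), np.allclose(t_est, t_true))  # True True
```

Since the world is assumed static, the headset’s own motion is the inverse of the transform that moves the points, i.e. rotation `R.T` and translation `-R.T @ t`. The same alignment also gives a simplified picture of the outside-in case: aligning the triangulated positions of the headset’s LEDs with their known layout on the device yields the headset pose. Real outside-in systems such as Constellation work from 2D image points and use different solvers, so take this only as an illustration.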

What are the advantages of the two solutions?

Outside-in has the following advantages:

  • Ultra-precise and ultra-fast tracking: at the moment, inside-out tracking is still not as precise and fast as outside-in tracking;
  • Less need for powerful hardware: analyzing a few bright reference points is easier than extracting feature points from RGB images, so less computational power is needed. Inside-out solutions need extra processing dedicated to spatial analysis: on HoloLens this is done by a dedicated processor called the HPU (Holographic Processing Unit), which sits in the device alongside the CPU and the GPU;
  • Better controller tracking: controllers can be tracked anywhere in the play area, even when users hold them behind their back. With inside-out tracking this is not possible: the tracking cameras are on the front of the headset, so they can’t track anything behind them. This is an issue I’ve already discussed when talking about the Microsoft Mixed Reality headsets;
  • External object tracking: for the reasons above, with external cameras it is easy to track any object present in the play area. For instance, with the Vive you can buy Vive Trackers and track your full body as well as game props like guns and bottles. With inside-out you can’t do this, because you can only track what the headset sees, and because analyzing raw RGB images makes it harder to detect and track external objects;

    A rifle tracked in virtual reality thanks to a Vive Tracker attached to it. This is the kind of setup that is very useful in arcades (Image by Polygon)
  • It works in the dark: since it relies on emitted IR light, Vive tracking works even in the dark. This holds for the Oculus Rift too: some people report that its tracking works even better in the dark. Good luck doing that with inside-out tracking and its RGB cameras (see for instance these considerations on the Acer Mixed Reality headset).
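The controller limitation in the list above is mostly a matter of geometry: a camera can only track what falls inside its field of view. Here is a minimal Python sketch of that check, assuming an idealized headset whose tracking cameras all face forward along the -z axis (x to the right, y up) with a combined horizontal field of view of 160°. That number and the function name are hypothetical, not the specs of any real headset, and the sketch ignores the vertical field of view and occlusions.

```python
import math


def visible_to_front_cameras(controller_xyz, horizontal_fov_deg=160.0):
    """Return True if a point given in the headset frame falls inside the
    horizontal field of view of the forward-facing tracking cameras.

    Assumed convention for this sketch: x right, y up, cameras look along -z.
    Vertical field of view and occlusion are deliberately ignored.
    """
    x, _, z = controller_xyz
    if z >= 0:  # at or behind the plane of the cameras: never visible
        return False
    # Angle between the forward direction and the point, in the horizontal plane
    angle_from_forward = math.degrees(math.atan2(abs(x), -z))
    return angle_from_forward <= horizontal_fov_deg / 2.0


if __name__ == "__main__":
    # Hands held out in front of the user: inside the cameras' view.
    print(visible_to_front_cameras((0.2, -0.3, -0.5)))   # True
    # Reaching over the shoulder for a weapon on your back: out of view.
    print(visible_to_front_cameras((0.15, 0.1, 0.25)))   # False
```

With outside-in tracking, the same kind of test runs from the point of view of each external camera placed around the room, so a controller behind the user’s back stays tracked as long as at least one of those cameras can see it.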

Inside-out, instead, has these advantages:

  • It works out of the box: no complex calibration to perform, no cameras to install, no difficult setup. For us enthusiasts, installing some Constellation cameras is no problem, but for the average user it is pretty difficult;
  • Less hardware needed: no external cameras, no mounts. If you have to go to an exhibition, you can carry just the headset and nothing more (it’s super handy!);
  • Works everywhere: you are not restricted to a room-scale area. Theoretically, you can wear the headset and walk through your whole house; Microsoft calls this “world-scale”. With the Vive you’re confined to a single VR play area; here you’re free;
  • Mixed reality: the tracking cameras could theoretically also be used to analyze what the user is seeing and to build mixed reality experiences.

Which one is better? Well, as always, it depends on the use case. For augmented reality, inside-out is of course the only choice that makes sense, since AR requires that the user can move freely anywhere just by wearing a headset. For VR, inside-out is surely better for the average user, since it is far more user-friendly and doesn’t require a difficult setup. It has some problems tracking controllers, but for most VR experiences, tracking the hands when they’re out of sight isn’t necessary. Professional users may prefer outside-in tracking for its better performance: most VR enthusiasts love praising the Vive for having the best tracking technology out there. Gamers may prefer outside-in as well, since it can track the controllers in every condition, which is fundamental in fighting and action games (like grabbing the shotgun from behind your back in Robo Recall). Arcades are all-in on outside-in, since it can track all the players and game props inside the play area, a flexibility that inside-out tracking can’t offer.

So, in the short term, I envision a future where ordinary users will use inside-out for its simplicity, while enterprise users and professionals will use outside-in for its performance, with the two technologies evolving side by side. As for the far future, I think that inside-out technology will improve thanks to more cameras and better algorithms, and in the end, everyone will use it. Outside-in will still be used in arcades and in some enterprise niches because of its ability to track all the objects in the scene.

And you? What is your opinion about these two technologies? Do you have anything to add? Let me know in the comments!

Skarredghost

AR/VR developer, startupper, zombie killer. Sometimes I pretend I can blog, but actually I've no idea what I'm doing. I tried to change the world with my startup Immotionar, offering super-awesome full body virtual reality, but now the dream is over. But I still haven't woken up...