We need camera access to unleash the full potential of Mixed Reality
These days I’m carrying out some experiments with XR and other technologies. I have some wonderful ideas for Mixed Reality applications I would like to prototype, but most of them are impossible to build at the moment because of a decision that almost all VR/MR headset manufacturers have made: preventing developers from accessing camera data.
My early start with MR
As you may know, I got started with passthrough mixed reality in 2019, long before the Quest enabled the use of passthrough. I was using the Vive Focus Plus, and I hacked one of its SDK samples to transform it into a mixed-reality device. In the weeks that followed, Max Ariani (my partner in crime at NTW) and I experimented a lot with this tech, and we managed to do some cool stuff, like:
- Making objects “disappear”, in an attempt at (a very rough form of) diminished reality
- Applying a Predator-like filter to the environment
- Detecting a QR code to perform a login
- Detecting and tracking an ArUco marker to make a 3D object appear on it (see the sketch after this list)
- … and many other things, including our fitness game HitMotion: Reloaded and the musical app Beat Reality, made with Enea Le Fons
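To give you an idea of how little code some of these experiments need once you actually have the frames, here is a minimal marker-detection sketch in Python with OpenCV. Our original work was done at the Unity level, so this is just an illustration of the principle: it assumes opencv-contrib-python 4.7 or later, with a webcam standing in for the headset camera.

```python
# Minimal ArUco marker detection sketch (illustrative, not our original Unity code).
# Requires: pip install opencv-contrib-python  (version >= 4.7)
import cv2

# The 4x4_50 dictionary is an arbitrary choice for this example
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

cap = cv2.VideoCapture(0)  # webcam as a stand-in for the inaccessible headset camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _rejected = detector.detectMarkers(gray)
    if ids is not None:
        # In a real MR app, you would anchor a 3D object to these corners
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow("aruco", frame)
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```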
The tools we had were very limited: the Vive Focus had just a Snapdragon 835 processor, the image was black-and-white and low-resolution, we had to do everything at the Unity software level, and we had no environment understanding. Besides, at that time AI already existed, but it was not growing as fast as it is today. Notwithstanding all this, we managed to run a lot of crazy tests, and we dreamt of the moment when powerful standalone headsets would support high-quality mixed reality, so that we could bring these experiments to the next level.
Quest and privacy
The times we hoped for have arrived: the Quest 3 is a much more powerful machine than the Vive Focus, it has color passthrough with quite good definition, and AI is now flourishing. But, paradoxically, I can now do far fewer experiments than before.
The reason is that Meta is playing it extra safe and preventing developers from accessing the camera feed seen by the user in MR applications, both as input (reading the image) and as output (writing on the image). It is doing that for privacy reasons: if a malicious developer made a cute game that, behind the curtains, activated the cameras and streamed whatever they saw to their servers, that would be an enormous privacy violation. Evil developers could easily spy on our homes.
Meta has had a lot of privacy scandals, so to avoid a new one, or even just press complaints about a potential privacy issue, it has disabled camera access for developers. This camera lock can not be circumvented in any way: as I explained in this post, when you develop an application in Unity for the Quest, the application “flags” the part of the screen to be painted with the passthrough view, and it is then the operating system that performs this “painting” operation. For the application, the background of the app is pure black; it is only the OS that knows what data to put there. So unless you crack the Quest firmware and its SDK, you literally have no way to get the passthrough from inside your application.
After Meta started raising this privacy concern, all the other vendors slowly followed suit, and as far as I know, camera access is now also blocked on Pico and Vive headsets. It is only accessible on some enterprise headsets.
Why is this a limitation for mixed reality?
You may wonder why access to camera images is so important. The reason is that mixed reality shines when it can bridge the real and the virtual world. But if your application has no understanding of the real world, how can this bridge be created? As a developer, you have no idea where the user is, what they are doing, or what they have in front of them. The only things you can do are show the camera feed, apply some lame filters, and detect planes and walls. It’s something, but in my opinion, it is not enough to make a whole MR ecosystem flourish.
We now live in an era where there are AI systems for everything, and one of the reasons why MR and AI are a match made in heaven is that AI can understand the context you are in (where you are, what you are doing, etc…) and provide you with assistance in mixed reality. For instance, one classic example of our MR future is a virtual assistant that gives you suggestions related to what you are doing. Another example could be an educational experience that trains the user in doing something (e.g. operating a machine) and verifies that they are performing those actions correctly.
To do that, we should feed the camera stream into some AI system (running locally or in the cloud), but we can not, because the operating systems of the headsets prevent us from doing so. So all the vibrant work that the AI community is doing can not be applied to MR headsets.
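To make the missing piece concrete, the loop below is roughly what headset operating systems currently forbid: grab a camera frame and ship it to an AI service for analysis. This is a minimal sketch: the endpoint URL and the response schema are hypothetical placeholders, and a webcam once again stands in for the passthrough feed.

```python
# Hypothetical sketch: send a camera frame to an AI service for scene understanding.
# On the Quest this loop is impossible today: the app never sees the passthrough pixels.
import cv2
import requests  # pip install requests

INFERENCE_URL = "https://example.com/v1/analyze"  # hypothetical endpoint

cap = cv2.VideoCapture(0)  # webcam as a stand-in for the passthrough feed
ok, frame = cap.read()
cap.release()
if ok:
    # JPEG-encode the frame to keep the upload small
    ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
    response = requests.post(
        INFERENCE_URL,
        files={"image": ("frame.jpg", jpeg.tobytes(), "image/jpeg")},
        timeout=10,
    )
    # The response schema is invented for this example
    print(response.json().get("description", "no description returned"))
```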
Camera access would also let us run computer vision algorithms. The easiest idea to grasp is detecting QR codes and markers, which would enable many interesting applications (e.g. providing an easy login without a keyboard). We could also potentially run Vuforia on the Quest, and considering that Vuforia can track 3D objects, we could put a mixed-reality overlay on objects without needing to use any tracker.
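A keyboard-less QR login, for example, would take only a few lines once the frame is available. Here is a rough sketch using OpenCV’s built-in QR detector; decoding a one-time login token from the code is my own illustrative assumption:

```python
# Sketch of a QR-based login: decode a token from the camera frames.
# The user would frame a QR code shown on their phone or PC instead of typing.
import cv2

detector = cv2.QRCodeDetector()
cap = cv2.VideoCapture(0)  # again, a stand-in for the inaccessible passthrough feed
token = None
while token is None:
    ok, frame = cap.read()
    if not ok:
        break
    data, _points, _ = detector.detectAndDecode(frame)
    if data:
        token = data  # e.g. a one-time login token (an assumption for this example)
    cv2.imshow("login", frame)
    if cv2.waitKey(1) == 27:  # press Esc to give up
        break
cap.release()
cv2.destroyAllWindows()
print("Login token:", token)
```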
The ability to write on the image would be cool, too: right now we can only apply a colored edge filter and a color-mapping operation, but it would be great to unlock the possibility of adding filters of any kind to the image. Creators would love this opportunity.
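For reference, effects like the Predator filter we hacked together in 2019 take just a few lines once you are allowed to write on the frame. A sketch of the kind of color mapping and edge filtering I mean, applied to a placeholder image:

```python
# Sketch of "writing on the image": a Predator-style color mapping plus an edge overlay,
# similar in spirit to the filters we built on the Vive Focus in 2019.
import cv2

frame = cv2.imread("room.jpg")  # placeholder for a passthrough frame

# Predator-like thermal look: map pixel intensities to a false-color palette
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
thermal = cv2.applyColorMap(gray, cv2.COLORMAP_JET)

# Colored edge filter: draw Canny edges on top of the color-mapped image
edges = cv2.Canny(gray, 100, 200)
thermal[edges > 0] = (0, 255, 0)  # paint the detected edges green

cv2.imwrite("predator.jpg", thermal)
```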
Giving these powers to the community would unlock huge experimentation in mixed reality, letting everyone exploit its full potential. I’m pretty sure that people would come up with some amazing prototypes showing things that we didn’t even think about. Some very creative devs have already managed to create something cool with the limited tools we have now (think about Laser Dance or Starship Home), so imagine what they could do by using the full power of AI and computer vision.
We could unlock a new type of creativity and enthusiasm in our space, and make the whole technology evolve faster. If you remember that some of the most successful VR games (e.g. Beat Saber and Gorilla Tag) came from small and unknown indie studios, you realize how important it is to let everyone in the community experiment with new paradigms.
How to preserve privacy then?
I hope I have convinced you of how important it is for us creators and developers to have access to as much data as we can about the experience the user is having. But at the same time, there are still concerns about the privacy risks of this operation: as I’ve said before, a malicious developer could harvest this data against your will. So, how do we empower the developers without hurting the user?
Of course, since I’m not a security expert, I do not have a definitive answer for you. But I have some ideas to inspire the decision-makers on this matter:
- Most VR headsets are based on Android, an operating system that already cares a lot about these problems. We have cameras on our phones, and we take our phones even into private places where we currently do not take our headsets (e.g. the toilet). But on phones I can access the camera feed, so it’s a bit strange that I can not do the same on a headset. It would be ideal to copy the strategies that Android already employs on phones, where a popup asks you if you want to grant a permission to the app that you have just opened. If you do not trust the app creator, you can simply not grant the permission. Meta already does that for some features (e.g. for spatial anchors), so it could do the same for passthrough
- In general, as Alvin Graylin said during my interview with him, it’s important to give users the tools to choose. Asking users whether they want to give an app camera access is a powerful feature. Another good idea could be asking the user WHERE they want to give camera access: since the Quest can detect which room you are in, the user may decide to consent to camera access in their VR room, but not in their bedroom, for instance
- Meta (or any other vendor… I talk about Meta because it has the most popular device) could use some AI magic to hide sensitive details from the images: for instance, the AI could detect whether there are faces or naked bodies in the frames, and those would appear censored in the images provided to the application (see the sketch after this list). This would come at an additional computational cost, though
- Meta could start by giving us developers the opportunity to develop “plugins” that use the camera images. For instance, the Meta SDK could allow the registration of a function that takes an image and returns a set of strings. This way I would never manipulate the image directly (so I could not copy or stream it), because it is the OS that runs my algorithm over the image without giving me direct access, but I could still get the results of the data analysis that I wanted to perform (the sketch after this list combines this idea with the previous one)
- Alternatively, Meta could wire its SDK to many of its AI and computer vision services, so that we would at least have a wide set of tools to use for tests and prototypes
- Since Meta reviews every application that goes to its Store, every developer submitting an application requiring the camera feed could undergo heavy scrutiny, with checks on what data the app transmits and to which servers, the history of the company, etc… This would make life harder for the malicious developers that want to get onto the Meta Quest Store (or any other store)
- Meta could allow camera access only as a developer feature, available only in developer builds that can be distributed via SideQuest. While this is not ideal, it would at least let us developers start experimenting with it and share our work with other techie peers. A user sideloading an application is most probably a skilled user, with enough technical expertise to decide whether they are willing to take the risk or not
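To show how the censoring idea and the plugin idea could work together, here is a toy sketch in which the “OS side” blurs faces before invoking a developer-registered analyzer that can only return strings, never pixels. Every name here is invented, and no current headset SDK offers anything like this; real enforcement would need sandboxing, not just an API contract.

```python
# Toy sketch of the "plugin" model: the OS owns the frame, the developer gets only strings.
# Everything here is hypothetical; no headset SDK currently exposes such an API.
import cv2
import numpy as np

# "OS side": blur faces before the developer's code ever sees the frame
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def os_run_plugin(frame: np.ndarray, plugin) -> list[str]:
    """Censor sensitive regions, run the registered plugin, return only its strings."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        roi = frame[y:y + h, x:x + w]
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    # In a real OS, sandboxing would prevent the plugin from storing or streaming the frame
    return plugin(frame)

# "Developer side": a registered analyzer that returns only text
def my_plugin(frame: np.ndarray) -> list[str]:
    return [f"average brightness: {frame.mean():.0f}"]

frame = cv2.imread("room.jpg")  # placeholder for a passthrough frame
print(os_run_plugin(frame, my_plugin))
```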
These are just suggestions. My friends at XRSI probably have much better ideas for mitigating the privacy issues raised by opening up camera access. I care a lot about values like privacy and safety, so I’m all in for empowering developers in a responsible way. And I hope this article will help trigger a dialogue among all the parties involved (I will share it with both the XRSI people and people from headset manufacturers and see what happens), because in my opinion it is crucial that we speak about this topic.
What to do if you need camera access now
What if you need camera access today? What if you want to experiment with AI and MR and you don’t want to wait for Meta/Pico/HTC to provide access to the camera feed? Well, there are some (not ideal) ways that let you at least run some experiments:
- Use a headset that provides the access you want: some enterprise headsets give you access to the images the user sees. There are not many of them, but they exist. For instance, according to its documentation, the Lynx R-1 will allow the retrieval of the camera images
- Use a PC headset: on PC, things are much more open than on Android, and it’s usually easier to “find a way”
- Use additional hardware: if you use a Leap Motion controller, you should be able to grab the feed of its cameras, according to its docs. And recently, Leap Motion has become compatible with standalone headsets like the Pico ones. Of course, you must be careful to calibrate the position of the Leap Motion cameras relative to the headset’s cameras
- The poor man’s version of the point above is to stick a phone in front of your headset and stream the images from the phone to the headset via Wi-Fi. If you want to go the hard-tech way, you can connect a USB camera to your HMD and try to retrieve the camera feed by starting from this opensource project and heavily modifying it, hoping that Meta lets you do this operation
- You can also run ADB on a computer that is on the same network as your headset and have it stream the screen content of your headset to the computer (the ADB commands listed in this old post still apply), where you can grab the frames, analyze them, and then return the results to the headset application via Wi-Fi. This solution is complicated, adds latency, and requires a big part of the application to show the camera feed (because you stream the screen content, not the camera feed directly), but it could be a starting point for some experiments (see the sketch below).
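For that last option, here is a rough sketch of the PC side: it pulls the headset’s screen over ADB by piping screenrecord into FFmpeg, and decodes raw frames for analysis. It assumes adb and ffmpeg are on your PATH and that your headset supports screenrecord at a forced size; sending the analysis results back to the headset over Wi-Fi is left out.

```python
# Rough sketch: stream the headset screen over ADB, decode with FFmpeg, analyze on PC.
# Assumes adb and ffmpeg are installed; screenrecord stops after 3 minutes by default.
import subprocess
import numpy as np
import cv2

W, H = 1280, 720  # force a known size so we can parse the raw frames

# adb streams H.264 to stdout; ffmpeg decodes it into raw BGR frames on its own stdout
pipeline = (
    f"adb exec-out screenrecord --size {W}x{H} --output-format=h264 - "
    f"| ffmpeg -loglevel quiet -i - -f rawvideo -pix_fmt bgr24 -"
)
proc = subprocess.Popen(pipeline, shell=True, stdout=subprocess.PIPE)

frame_bytes = W * H * 3
while True:
    raw = proc.stdout.read(frame_bytes)
    if len(raw) < frame_bytes:
        break
    frame = np.frombuffer(raw, np.uint8).reshape(H, W, 3)
    # ...analyze the frame here, then return the results to the headset over Wi-Fi...
    cv2.imshow("headset screen", frame)
    if cv2.waitKey(1) == 27:  # press Esc to quit
        break
proc.terminate()
cv2.destroyAllWindows()
```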
UPDATE (2024.03.26): Leland Hedges, the Head of Enterprise Business at PICO XR EMEA, answered this post on LinkedIn, saying that Pico allows access to the camera stream data on a case-by-case basis on its Pico 4 Enterprise headset. Reach out to him on LinkedIn (or ask me for an introduction) in case you are interested in this possibility
As I’ve said, I hope that this post will trigger a debate in our community about accessing camera data from MR applications. So please let me know your thoughts in the comments of this post or on my social media channels. Let’s try to push our ecosystem forward together, as always.
(Header image by Meta)
Disclaimer: this blog contains advertisement and affiliate links to sustain itself. If you click on an affiliate link, I'll be very happy because I'll earn a small commission on your purchase. You can find my boring full disclosure here.