What is mediated reality and how I experimented with it on the Vive Focus
Less is more. So, while everyone is trying to do augmented reality, I’m sitting here trying to do the opposite: Diminished Reality (and Mediated Reality in general).
Spend one minute and watch the two videos below, where I play the magician and show you the results of my crazy experiments. Notice that there is no post-processing in the parts seen from inside the headset.
Ok, now, if my magician’s tricks caught your interest, let me explain better what I am talking about and how I experimented with these concepts 🙂
Mediated Reality
I discovered the term Mediated Reality thanks to the AR/VR influencer and investor Eduardo Siman. He sent me an interesting paper by Mann, Furness et al. that talks about how Milgram’s Reality-Virtuality Continuum is a good way to specify how you are mixing real and virtual elements, but lacks a component indicating how you are actually modifying your reality.
Let me give you an example: if I show you your real world, but your real black sofa becomes red through your glasses, can we really talk about augmented reality? I am not augmenting, I am changing. And what if I remove your sofa from the room completely? I mean, you see your room as if the sofa weren’t there. In this case we have not added anything virtual, we have just removed a real element, so it is not a mix of virtual and real. How would you define this kind of reality? Even in my app Beat Reality, when I show the real world as pulsing edges, I am showing a modified version of reality rather than actually augmenting it with virtual elements.
Mann proposes a new way to classify realities that runs along two axes: on one axis (the X-axis) you have the Reality-Virtuality continuum, which indicates how much real and virtual content there is in your experience, and on the other axis (the Y-axis) you define how you are modifying the reality that you are presenting. They thus propose the so-called “Mediated Reality Continuum”.
VR (Virtual Reality) replaces the real world with a simulated experience (virtual world). AR (Augmented Reality) allows a virtual world to be experienced while also experiencing the real world at the same time. Mixed Reality provides blends that interpolate between real and virtual worlds in various proportions, along a “Virtuality” axis, and extrapolate to an “X-axis” defined by Wyckoff’s “XR” (eXtended reality), and Sony’s X-Reality™.
Mediated Reality goes a step further by mixing/blending and also modifying reality. This modifying of reality introduces a second axis called “Mediality”. Mediated Reality is useful as a seeing aid (e.g. modifying reality to make it easier to understand), and for psychology experiments like Stratton’s 1896 upside-down eyeglasses experiment.
From the paper “All Reality: Virtual, Augmented, Mixed (X), Mediated (X,Y), and Multimediated Reality” by Steve Mann, Tom Furness, Yu Yuan, Jay Iorio, and Zixin Wang
The term “mediated” comes from the fact that there is something in the middle that modifies the perceived reality. And here is an image explaining the mediated reality continuum, taken from the paper:
So, in the above example of the sofa removed from your room, we are actually showing a modified real reality (X = 0, Y = ~0.25). Since it is a mediated reality where we are removing an object, some people love to call it Diminished Reality.
Diminished Reality
Diminished Reality is something really difficult to achieve. Adding objects is already hard, because they have to appear fixed in the real world while the user moves, but overlaying a virtual image onto a frame of the real world is something quite “easy”. But how the heck can you remove something that exists??
The problem can be split into two parts: identifying the object and then hiding it.
How to identify the object
To hide an object, you first of all have to identify where that object is, so that you can remove its pixels and substitute them with something else.
This is the classical “segmentation” problem that all computer vision people already know well. You have to separate out the areas of the image that interest you. This can be done in various ways. For instance, if you know that your sofa is the only red object in the room, you can use color information and select only the pixels that are red. If you have a particularly fast classifier, you can use some AI magic to automatically detect whether there is a sofa in the current frame and where it is.
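Just to make the color-based idea concrete, here is a minimal sketch of it using OpenCV (the file names and threshold values are illustrative assumptions, not something taken from a real project):

```python
import cv2
import numpy as np

# Load a camera frame (placeholder file name) and convert it to HSV,
# where thresholding by hue is more robust than working on raw RGB values
frame = cv2.imread("room_frame.jpg")
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Red wraps around the hue axis, so we combine two hue ranges
mask1 = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))
mask2 = cv2.inRange(hsv, np.array([170, 120, 70]), np.array([180, 255, 255]))
sofa_mask = cv2.bitwise_or(mask1, mask2)

# sofa_mask is now a binary image: 255 where a pixel looks red (our sofa), 0 elsewhere
cv2.imwrite("sofa_mask.png", sofa_mask)
```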
Usually a mix of various features (color, edges, movement, corner features, etc…) can be used together with some classifier to do this task. Of course, the desired speed of execution is crucial in choosing the proper algorithm: if you want to do some object-removal post-processing on a video, you can use a very heavy but effective algorithm, but if you are in a headset running at 90Hz, you have to sacrifice accuracy for speed.
Segmenting an object in real time, especially in a cluttered environment, is really difficult. You also have to consider that in the real world there are lots of possible complications: maybe there are some objects on the sofa, maybe your sofa is red but with the current lighting it appears more orange, and so the detector fails to find it properly.
How to hide the object
Once you have found the object, you have to hide it. To hide it, you have to substitute its pixels with what the user would see if the object weren’t there.
I mean, if you want to remove a real sofa, you have to show the user the wall and the floor that the sofa is occluding, and show them in real time and with credible illumination. But if the sofa is occupying that space… you have no information from the image about what is behind the sofa, because you can’t see it. So, how can you show it?
Well, also thanks to Eduardo, I can tell you that there are various roads that can be taken to show what is behind the object, so as to make it appear invisible:
- Have a model of the environment. If you can scan your room before putting the furniture there, you know what is actually behind the sofa: you know what the floor and the wall look like, and so you can show the pixels of the model instead of the pixels of the sofa. If the model could be updated in real time, this would be far better, for instance to take into account changes in the environment (e.g. changes in the illumination of the room);
- Have some other complementary data. If you have other cameras and sensors installed in the room, you can use their data to fill the holes. For instance, if there is a camera on a Roomba that is cleaning the room, you could use its data to understand what the floor under the sofa looks like;
- Use some (AI) extrapolation. Looking at the world around the object to be removed, the system can understand how to fill the holes of what can’t be seen. For instance, if the system detects that the floor is made of a black-and-white checker pattern, it can easily guess that the pattern continues under the sofa and use this knowledge to reconstruct the floor properly (see the little sketch after this list).
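To give an idea of what the extrapolation road can look like in code, here is a minimal sketch using OpenCV’s classical Telea inpainting as a much simpler stand-in for the smarter AI fill described above (this is not what I did in my own experiments, and the file names are placeholders):

```python
import cv2

# frame: the current camera image; mask: 255 where the object to be removed is
frame = cv2.imread("room_frame.jpg")
mask = cv2.imread("sofa_mask.png", cv2.IMREAD_GRAYSCALE)

# Telea inpainting propagates colors from the border of the hole inward.
# It works decently on simple backgrounds (plain walls, regular floors),
# but it cannot invent complex structure the camera has never seen.
diminished = cv2.inpaint(frame, mask, 5, cv2.INPAINT_TELEA)
cv2.imwrite("room_without_sofa.jpg", diminished)
```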
A mix of all the above would be optimal, but in any case you have no guarantee that you will succeed: in the case of the extrapolation, for instance, it may be that the floor below the sofa actually features a different pattern, and so your Diminished Reality looks wrong. Since we are working with assumptions, it is easy to get things wrong.
Diminished Reality is very complicated, and Eddie told me that very few researchers in the world are working on it (I hope to be able to interview one of them…). If you think about the applications, well: one example may be the IKEA app. What if you want to see, in your real house, how a new sofa would fit in your room in place of the old one? You could remove the real one and add the new virtual one, all in mediated reality.
And then come on, being invisible is the dream of all of us! 🙂
My experiments with Mediated Reality
One day I was letting my mind wander while I was in the bathroom (the best place to find inspiration) and suddenly a paper shared by someone on the web came to my mind. In that article, researchers made a ball disappear in a video shot against a completely white wall. So, I asked myself: “why can’t I try to make something like that with my Vive Focus? I have already given the Focus augmented reality superpowers, so why can’t I also give it diminished reality superpowers?”
And so I started experimenting. I am not paid to do R&D, so I wanted it to be just a fun project done by myself in a few days, just to see what diminished reality looks like from inside a real headset.
To segment the elements to be removed, I chose the easiest road: segmentation by color. It is not the best approach ever, since I am not actually distinguishing the objects to be removed (with this strategy, a black PC mouse and a black shoe are the same for the system), but again, for a fast fun toy project, it can be ok.
The problem is that the Focus has black-and-white cameras, so segmenting by color is pretty depressing (you can’t, for instance, remove the hands because they are pink, since you don’t know what is pink). So I chose to remove all black-ish pixels, since the walls in my office are whitish and this could help with the experiments.
So I prepared a Unity shader that took the frames from the cameras of the Focus: if a pixel wasn’t black-ish, it was shown to the eyes of the user; if it was black-ish, the shader showed something else instead, that is, what was “behind the object” (doing everything in a shader made everything run pretty fast).
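The real thing runs as an HLSL shader inside Unity, but the per-pixel logic is trivial; here is a minimal NumPy sketch of the same compositing, assuming a luminance threshold that is purely illustrative (not the value I actually used):

```python
import numpy as np

def diminish(camera_frame: np.ndarray, background: np.ndarray, threshold: int = 60) -> np.ndarray:
    """Replace the black-ish pixels of the camera frame with the stored background.

    camera_frame: HxWx3 uint8 image coming from the headset cameras
    background:   HxWx3 uint8 image of what is "behind the object"
    threshold:    luminance under which a pixel is considered black-ish (assumption)
    """
    luminance = camera_frame.mean(axis=2)      # per-pixel brightness
    blackish = luminance < threshold           # True where the object (probably) is
    output = camera_frame.copy()
    output[blackish] = background[blackish]    # substitute only the masked pixels
    return output
```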
The results of this were not bad, even if sometimes there was a clear halo around the removed object. I mean, if there was a black object shown on a white background, the pixels on the edge of the object were gray-ish rather than black-ish, and so were not detected by my algorithm. I used some tricks to reduce this effect, but never managed to remove it in all conditions. For sure morphological operators and edge detectors may help in future research.
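One example of what such a morphological trick could look like, sketched with OpenCV (the kernel size and file name are just assumptions): dilating the detection mask so that it swallows those gray-ish border pixels together with the object itself.

```python
import cv2
import numpy as np

# object_mask.png: 255 where the black-ish object was detected (placeholder file name)
mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)

# Grow the mask by a few pixels so that the gray-ish border of the object,
# which fails the black-ish test, gets replaced together with the object itself
kernel = np.ones((5, 5), np.uint8)
dilated_mask = cv2.dilate(mask, kernel, iterations=2)
```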
That done, I had to decide what to put in the pixels that I had removed. The Focus does not have the SRWorks framework, so there is no native way to obtain a 3D model of the room I was in. Furthermore, I had no time and no will to mess around with photogrammetry, environment reconstruction, and AI, so I went for an easy approach in this case, too.
The application had a preliminary stage where you could construct a very, very rough model of the room just by taking automatic pictures. I mean, while you were moving, the system automatically shot images from all the various orientations and positions. When the program then went into Diminished Reality mode, it just took from the database the picture shot from the position and orientation most similar to the current ones, and used its pixels to reconstruct the missing part of the current frame, the one occupied by the object that had to be hidden.
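A sketch of that nearest-pose lookup is below; the pose representation (position vector plus unit quaternion) and the distance weighting are illustrative assumptions, not the actual code of my prototype:

```python
import numpy as np

# Each database entry: (position as 3-vector, orientation as unit quaternion, image)
database = []  # filled during the preliminary capture stage

def closest_snapshot(position, orientation, w_rot=0.5):
    """Return the stored image whose capture pose is closest to the current headset pose.

    The score mixes positional distance with an orientation distance derived from the
    quaternion dot product; the weight w_rot is an arbitrary illustrative value.
    """
    best_image, best_score = None, float("inf")
    for db_pos, db_rot, image in database:
        pos_dist = np.linalg.norm(np.asarray(position) - np.asarray(db_pos))
        rot_dist = 1.0 - abs(float(np.dot(orientation, db_rot)))  # 0 when orientations match
        score = pos_dist + w_rot * rot_dist
        if score < best_score:
            best_score, best_image = score, image
    return best_image
```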
The most expert among you are surely thinking that it is quite a naive approach, and I agree. First of all, this solution creates a huge database of images in the memory of a mobile device; then, experimenting with it, I noticed that it also has the problem that an image shot even from a similar position and orientation appears completely different from the current one. Similar is not enough. That’s why in the above videos I mostly kept my head fixed… had I moved it, the illusion wouldn’t have been that good.
Another problem was caused by illumination. The Focus’s cameras have continuous white-balancing, so the colors in the model images were always slightly different from the ones in the live frames, making the canceled objects look as if they were made of glass, so still slightly visible.
A better solution would require creating a 3D model of the room, or at least using some magic to warp the saved images to the current position and rotation of the headset (this may be something worth investigating).
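One possible flavor of that warping “magic”, sketched with OpenCV feature matching and a planar homography; this assumes a roughly planar background (e.g. a wall) and is not something I actually implemented:

```python
import cv2
import numpy as np

def warp_to_current_view(stored, current):
    """Roughly re-project a stored snapshot toward the current camera view using a
    planar homography estimated from ORB feature matches. Only a rough sketch:
    it holds for approximately planar backgrounds, not for general 3D scenes."""
    orb = cv2.ORB_create(1000)
    kp1, des1 = orb.detectAndCompute(stored, None)
    kp2, des2 = orb.detectAndCompute(current, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:50]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = current.shape[:2]
    return cv2.warpPerspective(stored, H, (w, h))
```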
Regarding the above video of the stairs, it was made in a similar way, but experimenting with what happens when the model used to fill the holes is not the one of the environment around the user, but a different one. What you obtain is an incredibly trippy effect of portals opening.
The results I obtained are not optimal at all, but I really had fun experimenting with mediated reality anyway. And even with such a rough solution, I can assure you that diminished reality was something magical. It was really fantastic holding my black 8th Wall hat in my hands and actually seeing that in front of me it was almost non-existent. It was a real magic trick. And experiencing it in 3D, in front of my eyes… wow!
So now I have a Vive Focus that can do realities spanning the whole X and Y axes of the Mediated Reality continuum! After AR, my lovely Focus can also do DR. And it is interesting to notice that this is something currently only possible with camera-passthrough VR headsets… doing something like that with HoloLens or Magic Leap is impossible. This is something to keep in mind when we develop the XR headsets of the future.
And I also hope that more people will experiment with Diminished Reality, because it is a really interesting field… so I hope that some of you will be inspired by this post and will do a better job than I did! Come on XR community, let’s push the technology forward!!
P.S. would you mind subscribing to my mediated reality newsletter? 😉