Jeremy Bailenson talks about 1,000,000 Walmart employees training with VR, presence, and the VR “empathy machine”
Today I’m very proud to host on my blog what is probably one of the best interviews I have ever had in my six years of VR blogging. I promised to celebrate this blog’s birthday with a few cool interviews, and this is one of them. I was able to speak for almost one hour with Jeremy Bailenson, the Stanford professor who is a legend in the VR field for his studies on interactions, psychology, and training in Virtual Reality. His lab was even visited by Zuckerberg before he decided to acquire Oculus.
I spoke with him about many topics, like why VR is so good for training (he told me that his company Strivr has trained more than ONE MILLION Walmart employees in VR), whether it is true that VR is the “ultimate empathy machine”, how important graphical fidelity is for presence, and of course also about his encounter with Zuck. The interview was initially supposed to last 30 minutes, but he shared so many interesting insights that we went on until the end of the time slot I had booked with him. I thank him so much for all the time he dedicated to me.
I loved speaking with him. And I also loved his attitude: notwithstanding the fact that he’s literally a legend, he accepted the interview request immediately and was very friendly with me the whole time. He’s a great man, and I came out of this interview as a big fan of his.
But now, enough talking! Here you can find the video of the full interview I had with him, with the slightly edited transcript below. I really suggest you watch (or read) it all. Enjoy 🙂
Hello Jeremy! I’m super excited about this interview… I’ve been a bit of a fan of yours for a few years, so I’m super happy to be here with you today. For the very few people who still don’t know you, can you tell us a bit about your story?
Jeremy: Yes. Let me begin by thanking you, for hosting this. Thank you for all the work you’ve been doing in the field. I enjoy reading your writings and keep up the great work, Tony. So thank you for having me. In the late ’90s, I was getting my Ph.D. in cognitive science. I was running experiments on humans, building low-level AI models to try to understand how the brain forms categories, how it makes decisions, how it reasons. In 1999 I was having a tough fifth year of grad school. I didn’t know what I was going to do with my life. I read the novel Neuromancer, which I’m sure many of your readers have read.
I was trying to build AI and model the brain, and what Gibson does so brilliantly in that novel is create an illusion of intelligence. You’ve got these avatars and these virtual worlds, and something just really grabbed me about a future in VR. What I did was apply for a few different jobs that were in the VR space. This is 1999. I was lucky enough to get a job at UC Santa Barbara in a combination social psychology lab and computer science hub, where we were using VR to run experiments, to simulate the social world. We were using VR not as a thing to study in and of itself, but as a tool to run more rigorous social psych studies.
If you wanted to understand conformity, you could have five avatars around a poker table, and you could see if somebody conforms to the others in terms of how they bet, or if you’re trying to study interpersonal distance and nonverbal communication, it’s a nice way to study that because of how rigorously you can track people’s gestures. For four years, I stayed at UCSB and I learned how to program VR. We were using something called Visa, which was a very low-level library language, very different from what Unity is right now. Or Unreal if that’s your language. I learned how to do the coding. I learned how to build hardware because back in the late ’90s, we weren’t using beautiful inside-out tracking systems.
We would have to set up our own camera systems and do all sorts of calibration and repair on the cameras. There was a lot of hardware work. At the same time, I shifted from questions about how the brain works to larger questions about society: social interaction, how people learn, how they train, and how they communicate. In 2003, I was lucky enough to see a job advertised at Stanford in the department of communication. I was in the psychology and computer science departments at UCSB. Stanford’s department of communication is a place where we study media, and VR. We all think about VR as a medium today, but in 1999 or 2003, not so much.
I applied for this job in the communication department at Stanford and was lucky enough to get it, because the forward-thinking people on that search committee saw that VR could be a medium. In 2003, we set up the Virtual Human Interaction Lab, where we study how people use VR, how it affects the mind, how you can build applications… and we’ve been here now for 20 years doing it.
I am always fascinated when I speak with someone who was there before the Oculus Kickstarter campaign. You were really there, building hardware and using very low-frame-rate, low-resolution headsets. How different was it from today?
In 1999, to set up VR… I like to make a comparison to an MRI machine, a magnetic resonance imaging machine. To have an MRI, you need a dedicated room. To use VR, you also needed a very special room. There couldn’t be stray lights, and there couldn’t be reflections off the walls. You needed a dedicated engineer to run it. It wasn’t as if there were these amazing pieces of hardware that were robust; you had to have somebody on call to tinker with it. It cost seven figures, not quite what an MRI machine costs, but… with the first National Science Foundation grant we got at UCSB, in order to render the scenes, we needed Silicon Graphics workstations at $100,000 each.
That was what we needed just to render the scenes. It was very expensive back then and very clunky. We were doing work where we were putting people in systems very different from what we see today. Although, Tony, as you know from having been doing VR for quite some time now, sometimes you don’t need a super fancy system to have high presence. The illusion we could inspire even with these older systems was quite substantial.
I want to understand, from the point of view of interaction, psychology, human interactions, etc.… how different was it to use those old systems compared to the new ones? Is there any difference, or is the effect more or less the same?
I would say the two major differences are: number one, scale. If you’re running a psychology experiment and you want to know the answer, meaning the real answer: how does your particular type of avatar affect your well-being? Or how do resolution and latency affect simulator sickness? If you want to answer these basic usability questions, you need to have a large sample. What research looked like back then was bringing people one at a time to the lab, setting up the system, and spending an hour and a half on a single participant.
Now we just convinced Stanford to buy us 180 Oculus Quest 2 headsets, and Pico Neo 3s for those who didn’t want an Oculus device. We’ve got this set of a couple of hundred headsets.
We can just move them around and collect lots and lots of data, so we can start having the large sample sizes that this type of work deserves… because, Tony, it’s not just having a lot of people, you want to have different types of people… you want to span different ages, different socioeconomic statuses, and different amounts of time spent using computers. Now that our samples are starting to get large, we’re running studies; we just published our first study that had about 1,000 people in it, which is a very large sample size for VR. One of the big differences is scale.
The second difference is: because of the advancements of these “social VR platforms”, the ability to network participants in VR has become trivial. In my lab, we published networked VR studies as early as 2001, where we networked three people together in VR and saw how the collaboration went and how avatars changed them. That was hard coding work: getting packets to arrive at the same time and making sure the latency wasn’t so horrible that synchrony broke down. Now, it is so easy to just take whatever your favorite platform is, put people together, and study that.
It’s an incredible thing as a social scientist, to be able to just drop eight people together in a virtual room and to study all sorts of things. I’m happy to talk more about what we’re looking at, but in general, in my mind, the two biggest changes are: headsets are cheap, so you can run people at scale, get diverse samples, and you can now study people together in VR. Before it was really hard to do that.
That’s very interesting. I’m intrigued by the fact that you haven’t mentioned that the sense of presence, the emotions, and the effects on the mind are different. That part has not changed as much as far as I understand.
I spend a lot of time looking at the research on presence, on realism, and how realism affects presence. We published a meta-analysis in 2016. A meta-analysis is a paper that combines every paper that’s been published to find out the exact relationship between immersion, which is what psychologists call the hardware and software affordances that produce objective features, and presence, which is the psychological experience that VR is real. What Jim Cummings in his landmark paper showed is that, when you look at the effect size, the effect size is medium. I believe R squared is 0.33 for immersion and presence, or somewhere in that area (I might be off by a single point). What that means is that when you increase immersion, it does increase presence, but not at an extremely high level, which I believe matches anecdotally what we’ve seen over the years.
In 1999, when I did my job interview at UCSB and Jack Loomis put me on a plank, just like your readers do when they try Richie’s Plank or whatever their favorite plank or fear-of-heights demo in VR is, I was terrified when I interviewed. I didn’t want to take that step.
Now, the resolution was, I think, 640 by 480 in each eye. We were running at 24 Hertz. The latency, I think, was about a quarter of a second. This was not a super high-res experience, but the brain is very forgiving of low resolution. If you think about it, if you’re walking around at dusk and can’t see details that well, you can still feel like you’re there and you can still navigate. In 1999, my low-res plank experience was enough to make me commit to a change in my entire field of academia. Remember, I was in CogSci, I was building these models.
Back then there were a lot of jobs in cognitive science, you could go do usability research at tech companies. Interfaces were flourishing at that time. I was so inspired by the pit, the virtual plank that Jack Loomis showed me, that even the low-res one made me change fields because it was powerful.
Wow. That’s very interesting. It also makes sense, because many VR games now are low-poly, but people still enjoy them. Of course, good graphics are nice to have, but not a must-have.
There’s a joke I tell at cocktail parties, and it’s not really a joke because it’s not funny at all, but if you’re going to ask me, what are the five most important features that cause presence in VR? My answer for you is going to be tracking, tracking, tracking, tracking, tracking. I’ve nothing against resolution… it’s always nice to see a high-res image, and, of course, field of view is important, but if you’re going to inspire presence, the thing that’s most important is low latency, high frame rate, and high accuracy of tracking (which is really important). Back then, even in ’99, despite the fact that we couldn’t get the latency very low, we always worked hard on accuracy.
We did our best on frame rate. We would often choose to render in mono instead of stereo to double our frame rate, because tracking was much more important than stereoscopic vision. Now, that being said, when I see an incredibly detailed, high-res, beautiful, high-field-of-view, PC-driven VR experience, my jaw drops. It’s incredible, but you don’t need it in order to have high presence.
I want to return a bit to your story and all your work in the field. I want to ask you: what makes VR special for your job? Why VR and not just a flat screen? Why have you fallen in love with this technology for your work and your research? Why are you still here after many years of difficulty?
Yes. I love that question. It’s not one I get asked that often. When I entered the field in ’99, the VR field, most people who were doing research were computer scientists running social-science-type studies. They were running usability studies. I would say the bulk of the field was trying to make VR mimic the real world as closely as possible. In other words, how can we make VR feel like the real world as much as possible?
Remember that I come to VR inspired by William Gibson’s novel, Neuromancer, which is just a mind-bending journey through what’s not possible in the real world, in a future that’s a very strange place. When I arrive, I say, “Okay, everyone else is trying to make VR mimic the real world. What I want to study is what can you do in VR that you can never do in the real world? What are the things that are impossible to do in the real world, that are easy to do in VR, that actually are useful for people and produce positive experiences?” That’s basically a frame for almost all the work we’ve ever done. Even as early on as 1999, one of our first studies we ever did was something called Augmented Gaze.
Imagine that you are a teacher and there’s a classroom. As a teacher, you’re trying to be as effective as possible. We built an algorithm that was laughably simple. It might have been 10 lines of code in our Python codebase; we were using a language called Visa, which is a library built on Python. All that the code did was this simple thing: you’re networking three people together (the teacher and two students). Instead of sending the same packets to each of the students about the yaw of the teacher’s head, you’re simultaneously sending different packets about her nonverbal tracking data, tailored for each student.
Imagine that you’re a student in a class of two people, it’s like a tutor session, and you’re receiving the gaze of the teacher 95% of the time or 80% of the time. You feel as if the teacher is looking at you more than the other person. Both people are having that experience simultaneously. You’ve just scaled up the ability of the teacher to maintain eye contact with students.
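To make the idea concrete, here is a minimal sketch of that packet-tailoring trick in Python. Everything here, the function names, the yaw-only packet, the positions, is invented for illustration; the original study used a Python-based VR library and handled richer nonverbal tracking data than a single yaw angle.

```python
import math
import random

def yaw_toward(src, dst):
    """Yaw (degrees) that points from position src to dst in the horizontal plane."""
    dx, dz = dst[0] - src[0], dst[1] - src[1]
    return math.degrees(math.atan2(dx, dz))

def tailored_packets(teacher_pos, true_yaw, students, gaze_fraction=0.8, rng=random):
    """Build one head-tracking packet per student.

    With probability `gaze_fraction`, a student's packet reports a yaw aimed
    straight at that student; otherwise the teacher's true yaw passes through.
    Because each student receives their own stream, every student can feel
    looked at most of the time, simultaneously.
    """
    packets = {}
    for name, pos in students.items():
        if rng.random() < gaze_fraction:
            packets[name] = yaw_toward(teacher_pos, pos)  # augmented gaze
        else:
            packets[name] = true_yaw                      # honest tracking data
    return packets

# Two students at different positions each get their own packet stream.
# gaze_fraction=1.0 makes the output deterministic for this demo.
students = {"alice": (-1.0, 2.0), "bob": (1.0, 2.0)}
pkts = tailored_packets((0.0, 0.0), true_yaw=0.0, students=students, gaze_fraction=1.0)
```

With `gaze_fraction=1.0`, Alice's packet reports a yaw turned toward her left-hand seat and Bob's a yaw turned toward his right-hand seat, even though the teacher's true yaw is 0° for both: the "simple tweak to an outgoing packet" the interview describes.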
What our early research showed is that students return the gaze of the teacher more, and they pay more attention to the teacher. They feel more connected to the teacher. Depending on the study we’ve run, and we’ve run many, you either learn more or you agree more with what they’re saying, and you certainly pay more attention. Just a simple tweak to an outgoing packet of tracking data causes real social consequences. In general, when you look at our work over the years, the bulk of it asks: “What are the things you can do in VR that you couldn’t do in the real world?” In recent years we’ve honed our thinking on this, and I’ve come up with the DICE acronym, D-I-C-E, which is: save VR for things that, if you did them in the real world, would be Dangerous, Impossible, Counterproductive, or Expensive.
Dangerous: training firefighters, one of the culminations of lots of early work in VR. Impossible: a lot of our work is on perspective-taking, where you go in front of a virtual mirror, change your skin color, and become a different race. Or you change your gender or age or body shape, and then walk a mile in another person’s shoes. You can’t do that in the real world. Counterproductive: one of our more famous studies, which has now been replicated in a number of labs, asks someone to cut down a tree. We find people who don’t recycle paper. They go into VR and physically cut down a tree. They get haptic feedback while using the chainsaw, and they viscerally understand the link between deforestation and using non-recycled paper or not recycling. Why is this counterproductive? If I were to teach you about deforestation by forcing you to go into the forest and cut down real trees, that would be a ridiculous way to do it. VR gives you the best of both worlds. The final letter, the E in DICE, is Expensive. Expensive, for example, is when we train our football players in VR, or when we worked with the German national soccer team and used VR to train the soccer players… that was a really fun project.
Why would you use VR for that? If you’re trying to teach a goalie spacing, it’s expensive to have lots of players on the field all the time. If the players are trying to give mental reps to a goalie… you don’t want to force all your players to be on the field for so long. That is a long-winded answer to your question: I love VR because it allows you to do things you can’t do in the real world, and we’ve come up with a framework for thinking about when VR is useful.
This is why, I guess, you started this experience with Strivr, which is one of the leading companies for training in virtual reality. How has your experience there been? Have you really seen the practical effects of your research there?
Yes. The story of Strivr begins in 2013. Derek Belch was my graduate student, and he was also a former football player: he was a field goal kicker on the Stanford American football team. When he came back to get his master’s degree, he wanted to use VR to train athletes. His thesis was working with the Stanford football team to train quarterbacks. What we were training quarterbacks to do, for those who aren’t familiar with American football, is this: before the play starts, everybody’s stationary on the offense, and the defense is running around a little bit. The quarterback is looking around back and forth, and he has about three seconds to look at what the defense is doing and make a decision. He can keep the original play, or he can change it to a different play that he’s got in a queue. They typically have three plays, and he can say “kill, kill, kill”, which means going to the next play in the queue, or he can let it roll. Let it roll means he’s going to keep the original play. He has a couple of other decisions he can make, but it’s not teaching them how to move. It’s leveraging head tracking in VR, allowing them to look at patterns and practice their decision-making skills.
We have had a lot of success in sports. Derek graduates on January 1st, 2015, and then shocks everybody by signing six National Football League teams to multiyear six-figure contracts in a couple of months. Then Strivr, the company, is born. Strivr begins as sports training in VR. He proceeds to work with lots and lots of different teams across the world: Olympic skiers, basketball, football, of course, baseball, cricket, every sport that you can think of… but then something strange happens. One of the teams we work with is the Arkansas Razorbacks. In Arkansas, there is the largest company in the world: Walmart’s headquarters are there. Brock, the head of training at Walmart, decided to come to the Arkansas Razorbacks’ football office.
He was a football fan, and for anybody who’s a sports fan, getting to see their team’s plays in VR and being the quarterback is a super fun experience. Brock puts on the goggles, looks around, and says, “Huh, what you’re training here, which is that somebody should look around, recognize a visual pattern, make a decision based on what they see, and then tell others about that decision… that training is very similar to what every Walmart associate does every single day.” What began there is a journey from 2016 until today, where we have been working with Walmart to train their employees at scale.
When I say at scale: one of the things we know from psychology is that humans have a very hard time understanding extremely large numbers. Tony, we have trained over 1 million Walmart employees in VR. People don’t understand that the moment where VR actually scales is with Strivr. Yes, there are a lot of gamers using Quest to play all the wonderful games that are there. However, when “normal people”, people who are not VR enthusiasts, learn to use VR, it starts with Walmart.
What do we train at Walmart? There’s a day in the United States called the Holiday Rush. It’s the day after Thanksgiving, the most crowded shopping day in the stores. It’s an utter cacophony. There are people everywhere. It’s an intensely stressful day for employees, because it’s hard for them to interact with everyone who’s trying to shop on a day when there are all these sales. Because of the high turnover at Walmart, many employees have never experienced this day physically.
The first demo that we built for training was a simulator, for which we used 360 videos, and employees got to experience the Holiday Rush and practice where they’re going to look, how they’re going to talk to different customers, and where they’re going to stand. That was the first demo. One of the more recent ones, which is an intense one but I do want to talk about it, is this: in the United States, we’ve got problems with active shooters. We’ve got people who just go into stores and murder people with weapons. It’s a terrible, horrific thing.
One of the things that we’ve been building for lots of companies (Verizon is another one, but I’ll talk about Walmart) is an active shooter demo. In the active shooter demo, you put on the goggles and it’s very intense. A gunman comes in and sticks a gun at you and, all of a sudden, the scene pauses and you’ve got to make a decision. Those decisions can range from where you tell your coworkers to go, to how you look at the gunman: do you look him in the eye? It’s a 30-minute experience that teaches you, when a gunman comes in, how to make those decisions, and then lets you practice them.
You take the goggles off and you just say: I can’t believe that anyone in the history of the world has ever trained for what I just did using PowerPoint… because that’s how they were training before. I’ve just had the experience. In 2019 in El Paso, Texas, there was a horrific event: a gunman just walked into a Walmart and killed many people. However, a lot of the associates who were working on the floor that day had done our VR training. There’s an article in Fortune magazine where the CEO of Walmart, who was interviewed, says, “Lives were saved that day because so many of our associates made proper decisions, and they made those decisions quickly because they’d been there before in VR.”
I’m spending a lot of time talking about Walmart. I think that when the history of VR is written, where VR really enters the normal world, meaning not just people like you and me, Tony, who think about this all day long, is with Walmart. I’ll close on this by saying: if you walk into any Walmart in the United States (there are 4,700 stores) and you walk back to the locker room, you’ll find three or four VR stations there. They’re sitting there, and the associates just cycle through them and train.
Not every associate does it every day; they tend to train about once a week, but these things are getting used. It’s been an incredible journey, and it’s strange to think that where VR really gets its footing is corporate training.
I remember the story about the shooting very well because I read about it in some VR magazines. I like playing games a lot; I have worked a lot in entertainment, and I still do. It’s all very cool. But when you see VR saving lives, that’s where the real magic happens. I think it’s amazing.
You’d asked about the applied work. In all the things that I do, note that I had a ridiculously small role in this demo; Strivr is a 150-person company right now. But the fact that I even get to be associated with a company that might have saved a couple of lives is about as much as I could hope for as a VR guy. I continue to focus on use cases where you can have a tangible impact, and sometimes that’s in a situation like that. One thing that Strivr recently did: the first FinTech company we worked with was Bank of America. Why is that important?
If you’re working with banks (when I say “working”, I mean that we literally installed VR in every single one of the 4,300 Bank of America branches in the United States; they all have VR, and it’s being used), in order to do that, you’ve got to satisfy intensely rigorous protocols for the safety and privacy of data. We’re now in FinTech, and no one in VR had solved that before. Now that we’re in BofA, we’re working with a number of other banks on different types of applications. These are more around fraud detection and customer service. One of the best public data points about Strivr is that we work with Fidelity, and we use VR to increase the empathy and social connection of people who work in call centers toward the people they’re talking to.
Imagine you work in a call center, and your job all day long is to talk to Fidelity customers. What are you talking about? Well, you’re giving them advice on what percentage of their income they should take out of their paycheck and put into retirement savings. Over time, those small amounts, 30 years from now, with compound interest, are going to make their lives very different. Our VR demos are designed this way: we beam the workers into somebody’s living room. They look around and they see the bills piling up, and they see the person on crutches who can’t really walk around. It’s designed to give the call center workers a sense of what the lives of their prototypical customers feel like.
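As a rough illustration of why those small contributions matter, the standard future-value formula for a recurring deposit shows how compounding adds up. The $200 monthly contribution and the 7% annual return below are my own hypothetical numbers, not figures from the interview:

```python
def future_value(monthly_contribution: float, annual_rate: float, years: int) -> float:
    """Future value of a fixed monthly contribution, compounded monthly:
    FV = P * ((1 + r)^n - 1) / r, with r the monthly rate and n the deposit count."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # total number of monthly deposits
    return monthly_contribution * ((1 + r) ** n - 1) / r

# Hypothetical example: $200/month at a 7% annual return over 30 years.
fv = future_value(200, 0.07, 30)
print(f"${fv:,.0f}")  # roughly a quarter of a million dollars
```

Only $72,000 of that final balance is the deposits themselves; the rest is compound growth, which is the point the Fidelity demo tries to make tangible for call center workers.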
The data point that’s exceptional, and Fidelity has published this themselves, is that after they started using VR, customer service ratings (meaning, when customers agree to do the survey at the end of the call and say how the call went) increased across the board by 10 percentage points. Again, that’s not saving a life, but a call where you have to make decisions about your income is a hard conversation. If that goes a little smoother, it’s just a nice use case.
I remember when the Bank of America news was released… around the same time, Accenture also announced that it was going to use VR for onboarding its new employees. Lots of enterprise use cases are getting huge. Since you mentioned this topic: how difficult is it to train so many people? How difficult is it to train 1 million people?
It is so difficult. I can’t even begin to describe how different it is from training a couple dozen people. Where to begin? Let’s take Walmart, for example: you’ve got 17,000 VR systems in the field. They all need to work. First of all, forget things like charging: as you know, these headsets need maintenance, they need to be cleaned. They need to have content pushed and pulled, but the content needs to be pushed and pulled a) in places that often don’t have great Wi-Fi, and b) not over a normal Wi-Fi network, because this is employee data and it’s private. We’ve got to make sure that the way it’s captured, stored, and delivered doesn’t just use normal Wi-Fi. That infrastructure alone, to be able to push and pull content and data without letting it travel over normal Wi-Fi, is just utterly incredible. The next thing is safety. Think about this: I just told you that in every single one of Walmart’s locker rooms, we’ve got three or four VR training stations. These locker rooms aren’t huge, spacious places. Anyone who’s seen the hashtag #VRfail knows how careful you have to be with physical safety in VR. That is one of the reasons why Strivr tends to focus, not exclusively, on 3DoF as opposed to 18DoF, meaning head rotation only, as opposed to pitch, yaw, and roll plus X, Y, and Z for the head and both hands.
Of course, we do 18DoF, but we don’t always do it, and one of the reasons is that we’ve got four people working together in a locker room in the back, and we don’t want them hitting walls, we don’t want them smashing into each other. Yes, of course, the guardian is good, but I like to joke that sometimes the guardian is just good enough to be dangerous, right? We learn to trust it, and so there’s that issue with safety.
Then there’s accessibility: not everybody can wear a headset, and not everybody can use two hand trackers, so how do you allow for healthy use of VR without overusing it? How do you come up with alternatives for people who can’t use it? Customer service, again: think about it, there are 17,000 VR systems from Walmart alone, and another 5,000 for BofA, so you’ve got to have a whole customer service line. And when I say a customer service line, you’re getting calls 24/7, because we’re also working with a huge food chain in Australia called Wally’s.
When people talk about Strivr: we have approximately 150 employees, and people are often surprised to find out how many of them are not VR coders, because so much of the work is just about the infrastructure to make sure that VR always works. Let me harp on that point, the “VR always works” point, because Strivr began by working with athletes. I spent a lot of time with professional sports teams, quarterbacks, and other athletes. They don’t have to use VR. It’s not in their contract, and if it is not incredibly smooth, they’re not going to use it, because they don’t have to. All it takes is one glitch and that quarterback is not going to put it back on. Now, the Walmart employees, the Verizon employees, the FedEx employees, all these big companies that we’re working with: we want it to be smooth for them because, actually, they don’t have to do it either, but they do have to train, and this is one of the ways they can train. We just want to make their lives better, not worse.
Work is a long shift for all of us, but especially so if you’re working in retail. We want to make VR something they look forward to, and, therefore, it’s just always got to work, always got to work. Having a standard like that is different from anything I’ve ever done in the lab, because in the lab we tend to push the boundaries on the tech side of VR… the #VRfail (sorry, I don’t want to use that hashtag)… we have tech problems in one of every four sessions. Something goes wrong, and you can’t have that when everything’s got to work all the time.
I want to switch now to your job as a professor. You made headlines a few months ago because you started a class in VR. When I read just the title, I wasn’t sure why a classroom in VR was needed: if you just have to speak, for instance, you can also do that via Zoom. So why did you make this decision? What advantages did it give you?
Yes, so thanks for that question. In March 2020, every professor at Stanford had to figure out, for the following year, what classes they were going to teach and when they were going to teach them. I volunteered to teach in the summer of 2021. I pushed all my classes to the summer… professors never want to teach in the summer, right? It’s summer, that’s when we want to relax. I did it because I was hoping that the hardware and the platforms would mature, so that I could teach my landmark class called Virtual People.
I’ve been teaching it since 2003. It’s a class about VR: about the history of VR, about the psychology of VR, about engineering (how one does tracking, rendering, and display), and finally about the applications of VR. It’s got four components, and I was hoping to try to teach it in VR. Fast forward 15 months: we did a ridiculous amount of work on hardware, on platforms, on content, on curriculum. We’ll talk about that in a moment, because that was the bulk of your question, which is: what’s the point of teaching in VR?
We did an incredible amount of work, so that we believe we crafted a class that was going to work in VR and was going to justify the use of the medium. Now remember, Tony, this is a class about VR. If it wasn’t about VR, in my opinion, it wouldn’t have justified using the hardware. In other words, I can’t imagine a biologist or an art history teacher moving their entire class to VR. I can imagine them doing the field trip, where you get to look around the Statue of David, or you are moving molecules around in a safe way and producing explosions in chemistry, so I can imagine it as an every-once-in-a-while thing. For us, we decided to make the medium the main part of the class.
To give you a sense, it was a 10-week class, and each of the 10 weeks was a different topic. We had a week on climate change, we had a week on empathy, we had a week on avatars, we had a week on education, and each of those weeks had a cadence. Sunday night they would do readings, often from my books, but enough readings so that they had a really good sense of that week’s topic. They would read three articles on climate change and VR, and they would write reaction papers to ensure that they’d done those readings. Then on Monday, we’d really make sure that everyone understood all that content, because it was important… there’s something in education called the Flipped Classroom. The Flipped Classroom says that for years, students have come to the physical lecture room, professors blabber at them, they take notes, and then they leave. It’s the biggest waste of resources you could ever think of, because that could be recorded and they could just watch that recording. Why don’t we do something cool together, something active, something constructivist, in the words of education? So, Sunday night they did the readings, and Monday we did what’s called a Student Panel, where we’d randomly pick 12 students and they’d have to ask questions about that week’s topic.
If it was about avatars, they’d ask a question about avatar realism; they’d go really deep into one of the articles. We did that on Zoom, because I’m a firm believer that we should not be in VR for talking heads. I don’t want you wearing goggles to see lip-flapping animation guided by an audio stream; we don’t need VR for that, so that was on Zoom. By the time Monday was done, we all just knew that week’s topic, we knew it. Wednesday, we would take what I call a VR journey, and the VR journey would be related to that week’s topic. For example, in the medical week we did medical VR, and for our VR journey, we went to Altspace and we did Evolve VR, which is a meditation and paced breathing class taught by the Reverend Jeremy Nickel, and Caitlin Krause was there as well. Those two instructors, my students, and I got together in VR, and for those of our viewers who haven’t tried this, you all sit on your own little mats, and the Reverend Jeremy starts guiding you through meditation. All of a sudden, your mats float up, and then you’re in outer space, looking down at the Earth for the overview effect, and it was just incredible.
We spent a lot of time during these 15 months vetting different experiences for each week’s VR journey day, which was Wednesday. There’s something about having this avatar layer. I’m a guy that’s pretty hyper, and I should meditate, but I don’t, even though I know I should. I was able to do paced breathing in VR because there were other people around and there was social motivation to do what the teacher was saying, but they weren’t video people. They weren’t staring at me, I didn’t feel judged by them, they were all space avatars without many features, and that was the best of both worlds: there was social feedback, but no embarrassment.
Another way of saying this… if you were to ask me, “Jeremy, in your physical classroom, would you ever do meditation and paced breathing?” I’d say, “No way, you’re crazy, I would never do that. That’d be mortifying,” so that’s one example of a Wednesday journey. Another example is we would descend upon a blank place in Engage, and we would build something together. Imagine dozens and dozens and dozens of people, in fact over 100, together, building a city… for those that are not used to Engage, you can build stuff there… they’ve got a nice library of 3D models and you’re building stuff together. So that was Wednesday: it was a VR journey, and it was about actively doing things together.
Then on Thursday and Friday, we’d have small group discussions. Why is this important? Because they were short, about 30 minutes, since I’ve got a 30-minute rule with VR: we don’t want to keep people in for too long and get them sick. We would have five or six students together, and why this is important… if you have a Zoom grid and there are nine people in it, and the person in the middle looks all the way to the left, we think that she’s staring at the person next to her, because that’s what we see. Of course she’s not: she’s not looking at the computer anymore, there’s something in her real world that she’s looking at.
The problem with Zoom is that you don’t get any of the spatial cues that make conversation special. As another example, if I lean forward toward you right now on Zoom, that doesn’t actually have meaning. Maybe I’m just trying to read a message on a shared screen more carefully. But in VR, when you move up toward someone, that interpersonal distance is important as a cue that shows intimacy, and other people see it. In VR, when person A points to person B, person C sees person A pointing to person B, and all the spatial cues work.
The purpose of the small group discussions was to really leverage the “groupiness” and to make people feel socially present together, and that’s why we did that in VR. To sum up: Sunday, readings. Monday, Zoom to make sure everybody gets that topic. Wednesday, some wild journey. And then Thursday and Friday, small group discussions where we reflected upon the week and enjoyed the social presence VR offers in small groups.
Now, I just want to ask you some random questions about VR, given your big experience in the field. One is… there is a bit of debate lately, because a few years ago there was this sentence by Chris Milk, “VR is the ultimate empathy machine”. It’s repeated everywhere, but in the last months I started reading some articles saying that it’s not so true. That you have an immediate effect, but it decreases over time. It’s a bit like a journey in which you see, for instance, that if you belong to another ethnicity, you might face discrimination, but then when the experience ends, your life is like before, and so it’s not that effective. From your experience, where is the truth: is VR an empathy machine, or not?
The first time we did a VR empathy study, we had a virtual mirror where you changed identities. The first study we ever published was in 2005; we ran it in 2003, and we had young people become older people. We looked at ageism and showed a reduction in bias against older people when 18-year-olds saw themselves in a virtual mirror as 60- or 70-year-olds. Obviously, when Chris Milk gave that amazing TED talk in, I think, 2015, and the phrase “the ultimate empathy machine” came out, there was a bit of group hyperbole, of which I am guilty too, around what we can use this for. I have been more careful with my own language as of late.
I think I was always trying to be careful; I’m incredibly careful now. I believe our position has always been fairly consistent, which is that VR doesn’t solve racism, doesn’t solve sexism, doesn’t solve prejudice. However, it does give you a tool that changes the way people tend to think about other groups. If you go to our lab’s website, we have a publications tab; if you search by empathy and diversity, you will find 30 academic papers published in the area. What those papers generally show is that when you compare VR to another medium, which could be reading, watching a movie, or active imagination role-playing, VR tends to produce more pro-social behavior change. It’s not every study, it’s not every single time.
To your point, not enough of our work looks longitudinally. Some of it does, though: Fernanda Herrera has published an amazing paper about one of our free downloads, called Becoming Homeless; anyone can download it. What she looked at with Becoming Homeless was people’s behavior over time. So, you did Becoming Homeless in VR, where you learn about the situational causes of being homeless. For example, you lose your job and you’re trying to actively sell things in your apartment, and then you get evicted. You’re trying to live in your car, but that doesn’t work because it’s illegal, and then you’re trying to get some sleep on a bus, and there’s a man harassing you.
It’s a pretty intense 6 DoF VR experience. Fernanda looked at behaviors two months later and showed that, compared to watching a video, reading statistics, or doing mental role-playing, VR still had a longer-lasting effect two months later. There’s a piece that some of our viewers might be familiar with called 1,000 Cut Journey. 1,000 Cut Journey is the genius of Courtney Cogburn, a Columbia professor who studies race and health. You become Michael Sterling, a black male, and you experience racial discrimination as a seven-year-old, as a 15-year-old, and as a 30-year-old. You learn about systemic racism over the course of one’s lifetime.
We haven’t published this work yet, but we’re close. Courtney Cogburn has run a longitudinal study where she makes it part of the curriculum for all incoming students in the Master’s program at Columbia. She’s got scaled use and is looking at data over time. I agree, we don’t have enough data over time. I think the important take-home message is this: there’s a question that people used to ask, which I used to answer with a straight face and don’t answer anymore. People say, “Does VR cause empathy?” I say, “Well, you would never ask that about the medium of video or the medium of audio; it depends on what you do with it.” It seems so obvious in retrospect. The answer is that VR can cause pro-social behavior through perspective-taking. It’s not going to do it every time, and we’re still learning about it.
You mentioned some statistics, and I see some of them all the time. When people talk about VR, they mention, for instance, that it increases the retention of learning. Sometimes I wonder how much the novelty effect of VR impacts these numbers. Because, of course, if I try something new, my brain is all excited and so remembers things better. How much does this novelty effect count?
I’m super excited to share with all of us a new paper that we’ve put out; I’m happy to send you the link. It’s the first-ever long-term, large-scale study of VR. When you think about VR over time, there are two competing hypotheses. One is that there’s a novelty and it wears off: it was cool the first time, but at time 10… meh. The other is that VR is a really hard medium, and it takes time to get it, and the more you do VR, the better you’re going to get at it. The more you use VR over time, the better you can actually leverage the medium to really embody the presence that’s in there.
Because of the class that we taught in our Stanford internal metaverse, our long-term class, we had 263 Stanford students spend over 200,000 shared minutes together over a 10-week period, so we can look at the data over time. We went into this not having a strong prediction, because no one had actually studied this at scale over time, whether novelty was more important or learning the medium was more important. The answer, quite robustly, is that the more people use VR, the better they get at it and the more they connect with the other people they’re with. When you look at people at time eight, they’re very different than even at time two, in terms of how they’re moving as a group, how they’re talking as a group, and what they think of the other people in their group. Even when you look at low-level measures, for example visual realism, how real the scene looked, people’s scores increase at time 10 compared to time eight compared to time five; it’s a nice linear pattern. We just released this paper, and I’m happy to share a link with you that shows this exact data over time. It’s preliminary work; the field needs so much more understanding of all these effects.
To critique my own work here: “one and done” studies, where you just bring people in once, put them in VR, and then draw claims from it, are not good enough anymore. We need to start looking at these effects over time.
Now there is all this hype about the metaverse, the M-word. What I want to ask you, from your experience, is: are we, as humans, ready for it? I mean, going around all day with part of reality being real and part of it being virtual… we are not used to it. What’s your vision for that? Are we ready for it?
First off, for listeners that haven’t read Tony’s article on how to write a metaverse article, I encourage you all to read it. It was incredibly funny and brilliant, so congrats on that article.
Thank you!
Look, no one has studied long-term use of VR; there’s just not been any research. There’s an incredible article from 2014 from two German scholars, Frank Steinicke and Gerd Bruder, where Gerd watched Frank while Frank spent 24 hours in VR, and you can look at his signals over time. That’s the N-equals-one study of overuse. Of course, that’s a nice, amazing early study, but we just don’t know what happens when people spend hours and hours and hours per day in VR. Despite the hype of the metaverse, my lab continues to have a 30-minute rule.
A lot of our listeners may say, “What? No, I can do VRChat for five hours!”, but you are self-selected VR enthusiasts… when you randomly take 263 Stanford students, what we found with people that are not VR folks is that they can’t be in VR the way that the people who’ve chosen to be there can. It’s a different category when you go to scale. On to your last question… sorry to be long-winded there.
Before the last one… you have helped save some lives, you have shaped VR, but you also played a part in Zuckerberg’s decision to acquire Oculus. Very briefly… how was your experience of demoing VR to him and seeing what happened later?
Mark visited and we spent a couple of hours together. He did a lot of the tour… for those that don’t know, we’ve got something called Virtual Becomes Reality. It’s a free download on Steam that has a lot of our lab demos packaged in one place. He did about half the demos that you’ll see in there. He grew a third arm and learned how to navigate with it. He swam with sharks and did marine science VR. He walked the plank, of course, and flew like Superman… all the things you can do in VBR.
He came with a lot of questions. We did VR for an hour and we talked afterward about a lot of the technical specs: about latency, about frameworks, about image persistence, things of that nature. We talked about use cases and applications. I talked very similarly to how I did with you about the DICE model and things of that nature. Then, lo and behold, a week later we found out that he had bought Oculus for what we now know to be $3 billion. This was part of his fact-finding mission; at the time, we didn’t know that. We host a lot of people in the lab. It could be third-grade classes on a field trip, it could be heads of state or government officials, it could be pretty much anyone, and we tend to try to find a tour for them, so it’s not uncommon for us to do a tour. What was unique about that one is that we just didn’t know we were part of a vetting process for a moment that was going to change the history of VR.
Okay, so this is the end of this interview. If you have a last message that you want to share with the viewers and the readers, feel free to do it.
I’ve really enjoyed this hour and keep up the great work, Tony, and I look forward to our next conversation.
Thanks, Jeremy, and thanks, everyone, for reading. Have a good day!
(Header image by Strivr)
Disclaimer: this blog contains advertisement and affiliate links to sustain itself. If you click on an affiliate link, I'll be very happy because I'll earn a small commission on your purchase. You can find my boring full disclosure here.