We all know that these days the metaverse, the infamous M-word, is one of the hot topics in the tech world. We also know that this dream is very far away: estimates talk about 5 to 15 years from now. There are many problems to be solved, both technological and social, before we can create the “next generation of the Internet”. One of these problems is networking, that is, how to let everyone in the world live in the same shared 3D virtual space. Well, a startup, RP1, claims that we don’t have to wait 10 years to see it solved, because it has already solved it today. I’ve tested its prototype and I’m here to tell you everything about it!
The network problem
RP1 clearly takes its name from Ready Player One, so let’s imagine the scenario of the novel by Ernest Cline, which many people describe as one possible evolution of the metaverse. In Ready Player One, the whole population of the world is inside the Oasis. The Oasis is a fully digital world where everyone can meet and interact with everyone else, in one-vs-one or many-vs-many experiences. In the current virtual worlds, though, this is not possible because the networking architectures cannot support it.
VRChat supports up to 40 (80 if you go through invites) people in the same room. Engage around 50. Virbela up to 500. Horizon Worlds around 20. When I say in the same room, I mean in the same “instance of the world”. That is, if you have created an event in VRChat, you can have 4000 attendees, but they will be split among 100 clone instances of your event, each with 40 people inside. The 40 people in the same instance can see, interact, and speak with each other. But they cannot see the people in the other instances, who live as if they were in a parallel universe. This is, of course, a problem, because we want city-scale virtual worlds, and that is hard if you can see a maximum of 50 people at a time.
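To make the arithmetic of instancing concrete, here is a minimal Python sketch. The function name `instances_needed` is my own and purely illustrative; it only reproduces the calculation above and says nothing about how VRChat or any other platform actually allocates instances.

```python
def instances_needed(attendees: int, capacity: int) -> int:
    """Return how many clone instances are needed for an event.

    Each instance holds at most `capacity` people, so the result is
    the attendee count divided by the capacity, rounded up.
    """
    if attendees < 0 or capacity <= 0:
        raise ValueError("attendees must be >= 0 and capacity must be > 0")
    # Integer ceiling division, avoiding floating-point rounding.
    return (attendees + capacity - 1) // capacity


if __name__ == "__main__":
    # The example from the text: 4000 attendees, 40 people per instance.
    print(instances_needed(4000, 40))  # 100
```

With these numbers, each attendee shares an instance with only 39 of the other 3999 people at the event.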
Some open worlds (e.g. MMOs) solve this problem using “sharding”, which means that you actually have an open space that supports thousands of people, but that open space is virtually subdivided into regions (the “shards”) and you can only see the people inside your own shard. This means that you can see and interact with 50-100 people at most. Everything works seamlessly, so you have an open space full of people and you transition fluidly between the different shards, but you can’t actually interact with more than a certain number of people at a time. So it is not possible, for instance, to simulate a virtual stadium with 10,000 people in it, with you seeing all the people around you.
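The idea of region-based sharding described above can be sketched in a few lines of Python. This is a deliberately simplified illustration under my own assumptions: square regions on a fixed grid, a hypothetical `SHARD_SIZE` of 100 meters, and made-up player names. Real MMO servers use more sophisticated schemes.

```python
from collections import defaultdict

SHARD_SIZE = 100.0  # hypothetical side of a square region, in meters


def shard_of(position, size=SHARD_SIZE):
    """Map a 2D position to the grid cell (shard) that contains it."""
    x, y = position
    return (int(x // size), int(y // size))


def build_shards(players):
    """Group player names by the shard their position falls into."""
    shards = defaultdict(set)
    for name, position in players.items():
        shards[shard_of(position)].add(name)
    return shards


def visible_to(name, players):
    """Return the players that `name` can see: only those in the same shard."""
    shards = build_shards(players)
    return shards[shard_of(players[name])] - {name}


if __name__ == "__main__":
    players = {"alice": (10.0, 10.0), "bob": (90.0, 20.0), "carol": (110.0, 20.0)}
    # Bob and Carol are only 20 m apart, but they sit in different shards.
    print(visible_to("bob", players))
```

In this toy example Bob sees Alice, who is 80 meters away, but not Carol, who is much closer but on the other side of a shard boundary. That is exactly the limitation of sharding: the world looks continuous, but the set of people you can interact with is capped by the region you are in.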
This is where RP1 comes in: on a random day, while I was organizing a digital event, Sean Mann, one of the co-founders of the company, told me that his startup was able to put 4000 people on the same server and in the same shard. I told him “no way this can be true”, so he invited me to his demo to prove that he was telling the truth.
RP1 Solution
RP1 is a startup co-founded by a few technology professionals, with the purpose of innovating networking inside virtual worlds. Among the co-founders, I have been able to speak with Sean Mann (Chief Executive Officer) and Dean Abramson (Chief Architect). The team at RP1 has years of experience in developing network solutions for online poker games, and since these games attract many players, they have developed new and innovative ways to handle many concurrent players. At a certain point, they had the idea that they could take what they had built over 10 years for multiplayer poker games and adapt it to create the backbone of the metaverse. That is how RP1 was started.
So far, RP1 has developed a solution that lets a single server handle 4000 people in the same shard, with all these people able to move, speak, and have detailed full-body tracking, which includes facial and finger tracking. The server they use is even six years old. They created a test world that is 1 square kilometer in size to prove their technology works. In this world, you can move around and see all the other people, and also hear their spatialized audio: the closer you are to another player, the louder you hear their audio; the farther away you are, the less you hear them, like in real life. Since so many players are supported at the same time, you can hear the audio of the dozens of people who are around you, exactly like in the physical world.
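To give an idea of what “louder when closer, softer when farther” means in practice, here is a minimal Python sketch of a distance-based gain. RP1 did not describe its attenuation curve to me, so this uses a common inverse-distance rolloff as a stand-in; the parameter names `ref_distance` and `rolloff` and their default values are my own assumptions.

```python
import math


def distance_gain(listener, speaker, ref_distance=1.0, rolloff=1.0):
    """Return a volume multiplier in (0, 1] for a speaker heard by a listener.

    Inverse-distance rolloff (an assumption, not RP1's actual curve):
    full volume within `ref_distance`, then the gain falls off as the
    distance grows. `rolloff` controls how quickly it falls.
    """
    distance = math.dist(listener, speaker)
    # Inside the reference distance the voice is not amplified further.
    distance = max(distance, ref_distance)
    return ref_distance / (ref_distance + rolloff * (distance - ref_distance))


if __name__ == "__main__":
    me = (0.0, 0.0)
    for meters in (1.0, 2.0, 10.0, 100.0):
        print(meters, round(distance_gain(me, (meters, 0.0)), 3))
```

A client would multiply each voice stream by its gain before mixing, so the dozens of nearby voices naturally add up to a crowd murmur while a person standing next to you stays clearly audible.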
4000 people in the same shard means that you truly have 4000 people in the same instance, and you can move around and interact with them like in real life. And RP1 is not stopping there: they plan to expand the current network architecture so that the next demo features 100,000 people in 20 square kilometers, along with the ability to deploy gaming modules. The company says it has a plan to expand things so much that, in the end, the servers can support billions of people in the same shard. This means having all the people of the world in the same virtual space, which sounds very close to how we imagine the metaverse. Of course, Sean told me that to do this, they need time and money (they’re looking for funds). With the right funds, they think they can already put many people in the same world one year from now, so we don’t need to wait 10 years as we were thinking.
Since the audio works so well and the system will be able to support many people easily, the company envisions that it could soon put even 10,000 people in the same stadium, with you being able to hear crowd noise that is not pre-recorded, but generated live from the actual voices of all the other people, properly spatialized. And apart from the audio, you could also see the movements of a few thousand people around you. This means that if someone scores a goal, you would hear the real crowd scream “Goal” and celebrate messily, as if the crowd were with you in the real world. To achieve that, RP1 engineers would need to make some improvements to the current architecture, but they see it as totally doable. And I think that if they managed to do it, it would be incredibly cool, because these kinds of big events are fun precisely because you live them with many people around you, and every person amplifies the emotions of all the others, so being able to feel them like in real life would make them much more exciting.
With RP1, the limit seems to be the graphical computational power of the devices more than the network. That is, on Quest, you can’t possibly render 4000 full-fidelity avatars, so even if the network supports them, the end hardware can’t show them to you. This is another achievement, because until now the network was the bottleneck, not the device. If the network problem is solved, we just have to wait for hardware performance to improve to take advantage of this interesting solution.
I asked Sean whether other companies offer the same thing, and he said that at the moment only Improbable can offer similar performance: a few days ago, an Improbable-based world ran a test with more than 4000 real people inside. But he says that Improbable requires a much more complex server architecture, while RP1 can keep 4000 people in the same shard with just one old server. Plus, the Improbable test was not done in VR and did not have all the detailed body and facial tracking that RP1 offers.
He said that many people (me included) don’t believe him when he says he can handle 4000 users in the same shard, and he is doing demos with many big companies and a few journalists to show what he actually has in his hands.
All of this is made even more exciting by the use of WebXR: while the network architecture is in itself application agnostic, RP1 is building its world completely on WebXR, so it is possible to access it simply through a link, without needing to download any kind of application. This is how RP1 sees the metaverse: an evolution of the internet, where every 2D website may become a room or a building inside the RP1 universe, quickly accessible by just using a link.
He added that he is proud that RP1 is also solving the delivery of non-pre-compiled experiences, since the future metaverse will be too big to download. Current game engines like Unity and Unreal are not designed for this. “Our system is fully dynamic which is why you had no load times walking around the world. Our goal is to deliver a platform for Creators, Designers, and Developers to be able to deploy more immersed content as easy as you can build websites” he told me. RP1 will be agnostic of the building technology (web and native apps will be supported) and of the device (PC, mobile, AR/VR headsets).
I also liked Sean’s spirit: when talking about his product, he showed a very open attitude and the willingness to use his solution to create an open ecosystem that can be healthy for the community. I was happy to see that because I think it’s important to keep the future metaverse more open, and hopefully healthier, than the current Internet.
Hands-on with the demo
Of course, these great claims must be proven, and that’s why RP1 actually has a working demo of its solution, which I have been able to try.
Using my Quest 2, I was able to go to a certain website with the Oculus browser and then magically I found myself in a pretty crude 3D world. The RP1 prototype is clearly a tech demo, so it features terrible graphics. As a techie, I can easily abstract the visual appearance of an application from its technical workings, so for me, this was absolutely not a problem. But I know that many other people are not the same, and that’s why I have been asked not to share videos and photos about the demo here in this article (apart from the above screenshot): many people seeing the crappy graphics would think that the whole application is subpar. But here what is relevant is not the frontend, but the backend.
Sean greeted me and everything was exactly as he described it: I could see his hands with the fingers moving, and I could see his mouth moving while he was speaking. There was a lot of noise around us, and the noise was created by the voices of the 4000 users that were on the server with us. Since, of course, Sean can’t ask 4000 people to jump in every time his company wants to do a test or demo, the 4000 users were not real people, but bots.
The bots were designed to simulate real users: they all connected from different machines to different ports of the server, they connected and disconnected at random intervals (to simulate people joining and leaving the server), they moved around the world in a random way, and they spoke using the audio from YouTube videos.
I had 4000 fake people around me, and I could hear all the ones that were closest to me. There were different settings for the audio spatialization: I could choose to turn down the people in the background in order to focus on who was closer to me, or I could keep a more realistic audio that let me hear the crowd chattering. With the standard audio setting, I could hear Sean louder, because he was close to me, but I could still hear all the other people around me forming a constant crowd noise. This was great on the technical side, but it was also quite annoying, to be honest. I don’t know if the reason was the mechanical voice of the bots, but this background noise was not pleasant to hear, plus it was also louder than it should have been. It didn’t resemble the realistic chatter that would have been there in real life with that number of people around me. I guess that some parameters still have to be tuned on that side. It was interesting anyway that the crowd noise was made from the real audio of the people in the environment: this is what we will need to make open worlds even more realistic.
It was also cool that I could move around the city and hear the voices of all the people I was passing become louder and then softer as I moved away from them. Sean told me that the audio solution alone is a great innovation for virtual worlds. And I agree that it was impressive that I could navigate around the city with no seams in my experience, while the audio kept changing appropriately as I moved close to different people. The audio was also spatialized, but honestly speaking, this is not something that amazes me, since spatialization on VR headsets has been available for many years.
Keep in mind that also the movements were all synchronized, so I could see the legs, hands, fingers, and faces of all the other users moving in real time. The framerate was decent, sometimes higher (50-70fps), sometimes lower (30-45fps), depending on what I was doing.
Since visuals are a bottleneck, the system lets you choose how many people you want to see and hear. In the beginning, I had a setting that let me see the 60 closest people, but then, by changing a value in a menu, I was able to see around 200. The system still worked, but the Quest started flickering, with the framerate dropping to 19fps. This confirms what the company stated before, namely that the bottleneck has become the graphical visualization and not the network anymore. RP1 tries to improve on this side by offering various LODs (levels of detail) for the avatars around you: you see the avatars that are close to you in greater detail, and the detail degrades with distance until, at a very large distance, you just see the players as cylinders. The network is also optimized, with the server sending you information only about the avatars that you see.
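The two mechanisms described above, showing only the N closest avatars and degrading their detail with distance, can be sketched like this in Python. It is only an illustration under my own assumptions: the distance thresholds, the LOD names, and the function names are hypothetical, and RP1’s real selection logic (which runs on the server) was not shown to me.

```python
import heapq
import math

# Hypothetical distance limits in meters, checked in order; anything
# farther than the last limit is drawn as a plain cylinder.
LOD_THRESHOLDS = [(10.0, "full"), (50.0, "reduced")]


def lod_for(distance):
    """Pick a level of detail for an avatar at the given distance."""
    for limit, level in LOD_THRESHOLDS:
        if distance <= limit:
            return level
    return "cylinder"


def select_visible(me, others, max_visible=60):
    """Return (name, lod) for the `max_visible` avatars closest to `me`.

    `others` maps avatar names to 2D positions. The result is ordered
    from nearest to farthest.
    """
    nearest = heapq.nsmallest(
        max_visible, others.items(), key=lambda item: math.dist(me, item[1])
    )
    return [(name, lod_for(math.dist(me, position))) for name, position in nearest]


if __name__ == "__main__":
    crowd = {"a": (1.0, 0.0), "b": (30.0, 0.0), "c": (200.0, 0.0), "d": (500.0, 0.0)}
    print(select_visible((0.0, 0.0), crowd, max_visible=3))
```

Raising `max_visible` from 60 to 200, as I did in the menu, does not change the network side much in a design like this; it mostly increases the number of avatars the headset has to draw, which matches the framerate drop I observed.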
Sean guided me to different spaces. He took me to a shop, which was in an “exclusion zone”, meaning that it softened the audio of the crowd outside in order to simulate a closed space. He told me that he envisions spaces like that one becoming the future “websites” of the metaverse. He made me watch a video, and we could both see it in a synchronized way. Then he made me play a game in which I had to fly through some hoops on a track in the shortest time possible. There was a global leaderboard, and I managed to get 8th place (damn, I could have done better). The leaderboard was synchronized with all the other players, and it worked well, but it took ages (i.e. several seconds) to synchronize and show my score.
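The “exclusion zone” behavior can be illustrated with a small Python sketch. I do not know how RP1 implements it, so everything here is an assumption: the zone is modeled as an axis-aligned rectangle, the `damping` value is made up, and I chose to damp any voice that has to cross the zone boundary, in either direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular exclusion zone (hypothetical model of the shop)."""

    min_x: float
    min_y: float
    max_x: float
    max_y: float
    damping: float = 0.1  # made-up multiplier for voices crossing the boundary

    def contains(self, position):
        x, y = position
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


def zone_gain(zone, listener, speaker):
    """Return the extra volume multiplier caused by the zone.

    If exactly one of the two is inside the zone, the voice crosses the
    boundary and is damped; otherwise it is left untouched.
    """
    if zone.contains(listener) != zone.contains(speaker):
        return zone.damping
    return 1.0


if __name__ == "__main__":
    shop = Zone(0.0, 0.0, 10.0, 10.0)
    print(zone_gain(shop, (5.0, 5.0), (6.0, 6.0)))    # both inside the shop
    print(zone_gain(shop, (5.0, 5.0), (50.0, 50.0)))  # crowd outside, me inside
```

This multiplier would be applied on top of the normal distance-based volume, so people inside the shop with you sound as usual while the crowd outside is pushed into the background.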
The solution also was able to simulate a day and night cycle inside the virtual city.
What was impressive about this demo
I was definitely surprised by how natural it felt to navigate inside a small city, going around and seeing and listening to the people around me. It was like in real life. And I know that in VR, if something is so natural that you find it obvious, then it means that it is a great technology.
I’m so used to limited closed spaces with 40 people inside that the possibility of just going around a 1-square-kilometer space, meeting as many people as I could, and hearing their voices blend into a composite, quasi-realistic soundscape was liberating. It was refreshing. I would like all virtual worlds to be like that.
For sure RP1 is onto something interesting here.
My doubts about this solution
I described to you how RP1 has great technology, an interesting demo, and a fantastic vision. And I guess that reading this article, you are as impressed as me. But I think that it is also fair that I balance this opinion with some doubts I still have about this product.
First of all, I find it weird that no one else in the world has found a similar solution. I mean, we are in an age where many computer science engineers graduate each year, many universities do research on every topic, and many private companies invest tens of millions in multiplayer games and/or the future metaverse. I find it weird that no one else has found a solution with similar performance. It’s possible, of course… throughout history, many people have invented something before everyone else, and we may be looking at a similar case. But it may also be that some major companies already have similar solutions in their R&D labs, or that these solutions have already been found and then discarded because of drawbacks I personally don’t know about.
Then, what I’ve been shown is just a prototype. Having a test with 4000 bots is not like having it with 4000 real people, because real people always create more mess than you imagined (trust me, I’m an engineer, and I know that very well). So until a test with real people validates this solution, we have to take it with a grain of salt.
The visual problem is also remarkable: what is the point of having 4000 people on a server on Quest if I can just see 60 anyway? Of course, this doesn’t affect the credibility of this solution, but it makes us wonder whether networking is the real bottleneck we should focus on now in VR. For other platforms, like flatscreen PC, this shouldn’t be a problem, though.
Consider that the virtual world offered by RP1 on WebXR is very basic (as you can see from the screenshot), but I still had less than 70fps most of the time anyway. I wonder why the framerate was so low, considering that the visuals were so low poly. Was it because of WebXR? Was it because of some optimizations needed? I had no time to profile everything, so I can’t tell you, but it is another problem that must be fixed.
The spatial audio is cool, but honestly spatial audio is a feature I have had in Unity for many years. And the crowd simulation audio was not that accurate: as I’ve said before, I found it quite loud and annoying. I would probably have preferred not to hear it: this is a classic case where a feature that is technically very cool may not always be desirable from a usability standpoint.
Also, I would like to add that the world is very basic, and while Sean told me that they are working on creating games and synchronizing physics and interactions, none of that was present. And the only synchronized element I was able to use, the leaderboard of the game, took a lot of time to synchronize. So it’s not clear how performant the system is with interactions, and especially with physics, which is super hard to synchronize in multiplayer games.
What I mean is that a single feature, however remarkable, may not be enough for a project to succeed. Let me explain the concept with a real-life example from my job as a developer. In a recent project, I had to choose the networking library for an application. I evaluated Improbable, which supports up to 4000 people in the same instance, and Photon, which supports up to 200. Improbable has very poor documentation on its website, and it is not clear from the docs how well it synchronizes physics. The application I am building is for Quest, where the system cannot even display thousands of people at the same time. Photon is well documented and is the gold standard for networking in gaming, with proven success in many shipped products. It synchronizes physics and interactions well. For these reasons, I’ve gone for Photon, notwithstanding the remarkable results obtained by Improbable. It is just that for a complex game with physics involved, a more complete solution is more suitable.
I have written this paragraph not to tell you that you should not trust RP1, but to suggest that you look at everything with a critical eye.
Final impressions
At the end of the day, I came out “intrigued” by my demo with RP1. The results they have obtained at this stage are remarkable: going around a small virtual city and seeing and hearing all those people around me in a natural way was definitely cool. Their tech demo shows that they can deliver what they promise. Their technological stack is very promising, and it has the potential to become the foundation of a future metaverse. If they really manage to put billions of people in a single shard, they will have solved the networking problem of the metaverse. Even if they just succeed with the 100,000-user test this year, this would already be great.
But at the same time, the project is still at too early a prototype stage to tell if it will actually succeed, or if there will be some unexpected insurmountable obstacles down the line. They still need to test the solution in real conditions and evolve it to add all the features needed to create a complete interactive virtual world. They need to go from a prototype to a real product, which is one of the most complex transitions for any company in the world. I loved the passion and the positivity of Sean, so I will for sure keep an eye on this company and truly hope for the best for it. And I suggest you do the same.
Good luck, RP1!