Today I want to take you on a journey through one of the coolest prototypes I’ve ever worked on, which is about creating Unity applications that can dynamically modify their logic at runtime depending on the user’s needs, thanks to generative artificial intelligence. And it will have a happy open-source ending, so make sure to read this article until the very end!
Working with AI
As the CTO of a startup dedicated to virtual reality concerts, my job is to be up-to-date with all the most important technologies of the moment. But while I am quite skilled in XR, I noticed that I had never done any real experiment with the technology of the moment, that is, generative artificial intelligence. I mean, I use ChatGPT like everyone else: I have asked it technical questions, I’ve chatted with it just to pass the time, I’ve even asked it to create a story based on the title of the most popular video on PornHub (I did it for real :P)… I have used DALL-E and Midjourney to create some images… I even use Copilot every day for my dev work now, but I had never done any real dev experiments with it. I had never called the OpenAI APIs as a developer. So I thought it was time to get my hands dirty with the technology.
The problem was that I had no idea what to build with them. I thought about what I had seen around, and most of the experiments were about ChatGPT-powered avatars or places where you go, write a prompt, and an image generated by the AI appears. All of this is nice, but it’s so overused that for me it was about as exciting as watching a documentary on the mating of warthogs. I wanted something that could make me feel motivated, something that stirred my passion (like the one of warthogs during the mating process). Something so cool I could try it and fail big. And then I had an idea.
Dynamic generative applications
The lightbulb in my brain switched on when I remembered my interview with the amazing John Gaeta (when he was still at Magic Leap). When I asked him about the future of storytelling in XR, he told me that in his opinion the future will be experiential, with people living interactive stories around them. My brain short-circuited that with an article I had read in the past about the future of movies, which hinted at the fact that in the future you may have a system that crafts the movie story around you: you tell it how you feel that day, and the system, which knows you, creates a full-fledged interactive story for you, to be lived immersively. That’s a vision I really believe in: I think that having content that shapes itself around us is the long-term future of XR. Even more, I think that this content should also be reactive to what I’m doing: depending on how I feel, what interactions I’m performing, and the choices I’m making, the whole experience should change to adapt to it. A bit like when I tried the Metamovie: Alien Rescue, and the actors that were with me changed their behavior and what they were doing depending on how I was behaving.
This vision can be expanded to all fields. For instance, in games you could have a horror application that generates a totally new type of enemy that caters to your deepest fears; or a solo music experience, like the ones we could make at VRROOM, could change the effects and interactions depending on your mood and the type of music you are playing; or an educational experience might change the way it explains things to you depending on how well you are learning.
The idea of AI-powered dynamic applications excites me. For me, it is the future, and I would love to be part of it. But at the same time, it’s a very long journey, probably something that requires even more than the usual 5 to 10 years to become a reality. For sure I couldn’t embark on such a long adventure, but I thought that even if I could make just 1 step out of the 10,000 needed to make this become real, I would be happy enough (more than by making the n-th speaking NPC), and it would have taught me a lot.
The first step, or maybe two
I thought that if I was going to start this kind of journey toward dynamic applications, the necessary first step would be to evaluate the feasibility of the idea, and so I did.
One of the biggest obstacles for me was evaluating whether it is actually possible to change the behavior of an application at runtime. Usually, applications are developed, then built, then executed. This is the logical succession of events, in this exact order. Once things are built, you can’t keep developing to modify the executable, which was exactly what I wanted to do.
Some applications allow for modding or for the addition of plugins, which means that even after the application is built into an executable, its functionalities can be expanded or modified by external elements. This is the closest thing to what I wanted to obtain, but done automatically by AI.
When people talk about “developing a game with ChatGPT” or something like that, usually it is about using ChatGPT to generate the logic of the game while inside Unity. ChatGPT’s intervention happens in the “development” stage, not later. This is all cool and useful, but what I wanted to obtain was an application that could modify itself later. Because at development time, you can’t predict all the possible cases related to the user. Of course, you can simplify and then make the system perform choices at runtime (e.g. if the user is happy, generate a cake; if she is sad, generate a napkin), but here I wanted to evaluate the case where the AI is totally free to generate what it wants depending on the exact condition the user is in. So basically I wanted the AI to auto-mod the game to add the needed logic on the fly depending on the situation. This was for me the first step of the research.
Then I thought about a second one: if we want such a generative AI, it should not only be able to create the logic of one element, it should also be able to coordinate the generation of whole experiences. So in a generative horror game, the AI shouldn’t be able to create just one dynamic type of enemy, but whole environments with fully coordinated enemies, all dynamically generated, starting only from a sensor detection about the user like “the user is very scared”. This is a very complex topic, so for this personal experimentation, I decided I just wanted to scratch the surface of it.
Now I was on a mission. An ambitious plan, during a very busy working period, and the near certainty of failing. So how did it go?
Well, not that bad if I’m writing this post…
Dynaimic Apps
I decided I wanted to give a name to this kind of generative application, and I called them “Dynaimic Apps”. If people can invent names like “Chief Futurist Wizard”, I feel entitled to create something like “Dynaimic Apps”. Don’t you agree?
For this prototyping, I needed to find the basic blocks to make the dynamic generation of code work. And surprisingly, I did:
- OpenAI could assist me on the generative AI side. It’s the hot company of the moment, so I wanted to play with its APIs. I found a very good plugin to call the OpenAI APIs directly from Unity, so the integration effort was zero. OpenAI models can generate code (also for Unity), which is why many people use ChatGPT to assist them in their development work
- Roslyn C# Runtime Compiler assisted me with the logic generation. Doing a Google search, I found out that since Microsoft open-sourced the C# compiler (called Roslyn), many things have been made with it. One of the applications people have found is generating code on the fly in Unity. The guys at Trivial Interactive made a great asset for the Unity Asset Store that costs only €20 and lets you integrate the Roslyn compiler directly into Unity.
You can connect the dots: if I write a prompt to OpenAI, that prompt returns a Unity script, and that script is then compiled on the fly by the Roslyn C# compiler and loaded into memory… the application is having its logic created on the fly by the AI! This was exactly what I wanted to do.
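To make the loop more concrete, here is a minimal sketch of the idea. It is not the actual code of the project: it assumes raw Microsoft.CodeAnalysis (Roslyn) calls instead of the Trivial Interactive wrapper, and a hypothetical OpenAiHelper.RequestCompletionAsync method standing in for whatever OpenAI plugin you use.

```csharp
// Minimal sketch of the "prompt -> code -> compile -> load" loop.
// OpenAiHelper.RequestCompletionAsync is a hypothetical wrapper around the
// OpenAI chat completions endpoint that returns raw C# source code.
using System;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Threading.Tasks;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using UnityEngine;

public class RuntimeLogicLoader : MonoBehaviour
{
    // Ask the AI for a self-contained MonoBehaviour, compile it, and attach it.
    public async Task ApplyPromptToTarget(string userPrompt, GameObject target)
    {
        string source = await OpenAiHelper.RequestCompletionAsync(
            "Write a single Unity MonoBehaviour class named GeneratedBehaviour. " +
            "Initialize every field in Awake, do not rely on the Inspector. " +
            "Task: " + userPrompt);

        Assembly assembly = Compile(source);
        Type behaviourType = assembly.GetTypes()
            .First(t => typeof(MonoBehaviour).IsAssignableFrom(t));

        // Attaching the freshly compiled behaviour gives the object its new logic.
        target.AddComponent(behaviourType);
    }

    static Assembly Compile(string source)
    {
        SyntaxTree tree = CSharpSyntaxTree.ParseText(source);

        // Reference every assembly already loaded by the app, so the generated
        // script can use UnityEngine types.
        var references = AppDomain.CurrentDomain.GetAssemblies()
            .Where(a => !a.IsDynamic && !string.IsNullOrEmpty(a.Location))
            .Select(a => MetadataReference.CreateFromFile(a.Location));

        var compilation = CSharpCompilation.Create(
            "GeneratedLogic_" + Guid.NewGuid().ToString("N"),
            new[] { tree },
            references,
            new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));

        using (var ms = new MemoryStream())
        {
            var result = compilation.Emit(ms);
            if (!result.Success)
                throw new Exception("Generated script failed to compile");
            return Assembly.Load(ms.ToArray());
        }
    }
}
```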
On paper everything made sense, so it was time to make a cool test app to put this assumption to the test…
Cubic Music
I suck at making graphics, and that’s the reason why whatever I do, it always involves cubes. Cubes are easy to make. And somehow cubes have also marked some of the most successful apps I’ve made:
- The Unity Cube is an application I made and published on Meta Quest App Lab: it just shows a Unity Cube in front of you. It got thousands of downloads.
- My tutorial on spatial anchors was about generating cubes in the room, and when I was in Shanghai, my friend Davy told me that his daughter loves playing with it.
So I decided my prototype should be about cubes. And since I work in a company that makes musical experiences, I decided to go with something that had some connection with music too.
This is how “Cubic Music” was born. I envisioned it as a sandbox where you are in mixed reality with your Quest Pro (and maybe, in the future, the Apple headset) and can spawn cubes around you, and then, using a vocal or textual prompt, you can tell the cubes what to do. The logic of the cubes would be generated at runtime by the AI and then loaded into memory by the Roslyn compiler. There would be music playing, and you could ask the cubes to also do something about it: e.g. by saying “make the cube green when the volume of the music is high, red otherwise”.
I started playing with this concept, starting with a flatscreen version because it was easier to test (and can also be distributed to people who don’t have a VR headset). As soon as I assembled the pieces into a rough prototype, I wrote “make it blue” as a prompt, the AI generated a script that instructed Unity to change the renderer material color to blue, Roslyn generated a managed assembly out of it and loaded it into memory, and bam! The cube became blue! I was incredibly happy… from the outside, I looked like a crazy guy getting excited about a cube becoming blue, which is one line of code in Unity… but on the inside, I knew I was happy because it wasn’t me writing that code, but the AI at runtime. That was the special thing: my application had no logic at all to make the cube blue, it was the AI that added it at runtime.
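Just to give an idea of what gets generated, the behaviour that came back for that prompt was essentially something like the snippet below (the exact code the AI returns varies from run to run, so take this as an illustrative reconstruction, not the literal output):

```csharp
// Roughly the kind of script the AI generated for "make it blue":
// a tiny MonoBehaviour that recolors the renderer of the cube it is attached to.
using UnityEngine;

public class GeneratedBehaviour : MonoBehaviour
{
    void Start()
    {
        // Change the material color of this cube to blue.
        GetComponent<Renderer>().material.color = Color.blue;
    }
}
```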
I then polished the code and worked on the final version of the prototype, which you can see in action in the video below.
As you can see, it is exactly as I wanted it to be: you wander around the room, you place cubes, and then you use your voice to tell them what to do! And the cubes can react to your voice and to the music that is playing… cool, isn’t it? It’s a cool mixed reality sandbox to express your creativity around music, without knowing anything about coding.
I had made an application with its code generated at runtime. The first step of my mission was complete! Time for the second one…
(If you want to know some technical details about how I made this work… don’t worry, there is plenty to read… be patient!)
Emotional Cubic Music
If you remember well, I had a secondary mission: making the AI coordinate a whole scene generation around just the user’s mood. In the first step, it was me telling the AI what kind of logic I wanted, but this can’t be the future of Dynaimic Apps. For instance, during a horror game, it shouldn’t be me saying “now make me a zombie with the face of Scarlett Johansson that runs very fast and can dodge attacks”: it should be the system creating this automatically.
So I had the idea of another kind of “Cubic Music” experience: instead of me spawning the cubes and creating my own experience, it should be the AI doing that. I just wanted to give the AI a piece of high-level information about myself, like “I’m very happy”, and then let it do all the rest, generating cubes and their logic in a scene coherent with my mood. Let’s say this was a simplified version of creating an interactive scene of a game on the fly. No Scarlett Johansson zombies, just cubes.
But even in this simplicity, this proved to be a very hard task. So in the end, I had to simplify it a lot further, basically asking the AI to generate a fixed number of cubes (only 7) and to modify only their position, rotation, scale, and color to express the mood. With these simplifications, I managed to obtain something that somewhat worked.
You may wonder what it means to “have cubes that express a mood”. Well, to make this happen, I asked ChatGPT how I could convey a mood through a cube, and ChatGPT pointed me to the fact that brighter colors, bigger scales, and faster movements could be a way to do that. And so we worked on a prompt that works exactly that way.
The “Emotional Cubic Music” experience works as follows:
- You, the user, use your voice (or text) to simply state your mood (e.g. “I’m very happy”)
- The AI creates a scenography of 7 cubes to express this mood. For every cube, it specifies the position and then a behavior. But the behavior is not written as code: it is written as a readable AI prompt. Here I wanted to create a hierarchy of AIs, a bit like in AgentGPT (see the sketch after this list). A first-level AI decodes the mood into what the cubes should do to express that mood. Then the second-level agents take the instructions of the first-level one and generate the actual logic of the cubes
- The application reads these instructions created by the first-level AI and spawns (second-level) AI calls to generate the actual logic of the cubes. Every cube gets generated, then receives the logic it should have to express the overall mood. Some of the cubes may react to the music or to the microphone volume, others don’t.
- After the generation, you just stay there admiring this abstract work of art created by the AI. Most probably, it looks bizarre, with big cubes that rotate and change colors. But it’s better if you say that you like it, because when the Terminators conquer the world, they will remember the day you liked their art.
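Here is a minimal sketch of that two-level flow, under the same assumptions as the earlier snippet: the CubeInstruction class, the ParsePlan helper, and OpenAiHelper are hypothetical names I’m using for illustration, not the actual classes of the project.

```csharp
// Two-level flow: the first AI call turns the mood into per-cube instructions
// written in plain language; each instruction is then fed back into the same
// prompt-to-code pipeline used by the manual mode.
using System.Collections.Generic;
using System.Threading.Tasks;
using UnityEngine;

public class MoodSceneGenerator : MonoBehaviour
{
    [System.Serializable]
    public class CubeInstruction
    {
        public Vector3 position;
        public string behaviourPrompt; // plain-language description, not code
    }

    public RuntimeLogicLoader logicLoader; // from the earlier sketch
    public GameObject cubePrefab;

    public async Task GenerateSceneForMood(string mood)
    {
        // First-level AI: translate the mood into 7 cube descriptions.
        string plan = await OpenAiHelper.RequestCompletionAsync(
            "The user feels: " + mood + ". Describe a scene of exactly 7 cubes " +
            "that expresses this mood. For each cube give a position and a short " +
            "plain-language behaviour (color, scale, rotation, music reactivity). " +
            "Answer as JSON.");

        List<CubeInstruction> instructions = ParsePlan(plan);

        // Second-level AI: one code-generation call per cube.
        foreach (CubeInstruction instruction in instructions)
        {
            GameObject cube = Instantiate(cubePrefab, instruction.position, Quaternion.identity);
            await logicLoader.ApplyPromptToTarget(instruction.behaviourPrompt, cube);
        }
    }

    List<CubeInstruction> ParsePlan(string json)
    {
        // JSON parsing and validation omitted in this sketch: in practice the
        // plan has to be checked, because the model does not always return clean JSON.
        return new List<CubeInstruction>();
    }
}
```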
Mission accomplished also for the second point! I had scratched the surface of the possibility of having an experience being generated dynamically by just starting from the user’s mood!
Difficulties, solutions, and technicalities
You know that I don’t shy away from speaking about technicalities. And trust me, I have written a lot of material about this project. I will tell you where to find it in the next chapter. For now, let me tell you a few big technical problems and how I solved them:
- AI prompt generation: I have just started with AI, so I suck at prompts. Especially for Emotional Cubic Music, none of the prompts I could think of worked at all. I was about to give up, but then I had an idea: who can help me with writing prompts better than anyone else? Of course ChatGPT! ChatGPT is an AI by OpenAI, so it knows very well how generative AI works and can give great suggestions in this sense. So I opened a chat with ChatGPT and asked it to help me write prompts. Then I tried its suggestions in my application, saw what worked and what didn’t, and asked ChatGPT to fix what wasn’t working. In the end, I added some small fixes myself. I found ChatGPT very valuable as support for writing prompts
- AI automatic code generation: OpenAI/ChatGPT is used to writing Unity scripts that accept external parameters. For instance, if you ask it to write a Unity script to move a cube, it will probably create a “public float speed” parameter that you can set up in the Inspector. The problem is that since I wanted to load scripts on the fly, there was no possibility of setting the parameters in the Inspector. So some effort was required to instruct the AI in the prompt to auto-initialize all the properties of the generated scripts. Notwithstanding this effort, sometimes the created script still requires external initialization, expects components on the gameobject that don’t exist, or invents functions that do not exist. That’s why CubicMusic doesn’t always do something to the current cube when you input a prompt. What happens under the hood is that a script is actually created and loaded, but it fails at runtime because it hasn’t been written in the right way
- Sound reactiveness: Theoretically, OpenAI could generate the code to make the cubes reactive to music, but in practice, I saw that the code it generated to detect the volume of the audio was usually horrible. So I wrote the logic to calculate the volume of the music and of the microphone input myself, and then instructed OpenAI that if it needed the volume of one of the two, it should use my own classes (see the sketch after this list). With this trick, sound reactiveness finally worked. This is a little cheat with regard to the idea that all the logic should be generated by the AI, but it’s just a temporary patch: when the AI gets better at writing code, it can easily be removed
- IL2CPP support: the more technical people among you will have noticed that I created a passthrough AR application for the Meta Quest Pro, which requires the IL2CPP scripting backend. IL2CPP means building into native code, while Roslyn outputs managed assemblies. So how did I make it work? Well, the creators of the Roslyn plugin on the Asset Store pointed me to an open-source project called dotnow that aims at creating a managed runtime on top of the native IL2CPP code, so that even if the app is built with IL2CPP, you can still run managed code on top of it. Of course, this solution is not ideal, and it’s full of problems (for instance, I couldn’t make scripts with OnTriggerEnter work), but it was enough to let me finalize this prototype.
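For reference, the hand-written audio hook mentioned above looks conceptually like the sketch below. AudioVolumeProvider, MusicVolume, and MicrophoneVolume are hypothetical names standing in for the real classes in the repository; the point is simply that the system prompt tells the AI to read these values instead of writing its own audio analysis code.

```csharp
// Hypothetical singleton exposing the music and microphone volumes that the
// generated scripts are instructed to use instead of computing audio data themselves.
using UnityEngine;

public class AudioVolumeProvider : MonoBehaviour
{
    public static AudioVolumeProvider Instance { get; private set; }

    public AudioSource musicSource;   // the track that is playing
    AudioClip micClip;
    const int SampleWindow = 256;

    public float MusicVolume { get; private set; }       // RMS of the music output
    public float MicrophoneVolume { get; private set; }  // RMS of the microphone input

    void Awake()
    {
        Instance = this;
        // Start a looping microphone recording we can sample every frame.
        micClip = Microphone.Start(null, true, 1, 44100);
    }

    void Update()
    {
        if (musicSource != null)
            MusicVolume = ComputeRms(SampleFromMusic());
        MicrophoneVolume = ComputeRms(SampleFromMic());
    }

    float[] SampleFromMusic()
    {
        float[] samples = new float[SampleWindow];
        musicSource.GetOutputData(samples, 0);
        return samples;
    }

    float[] SampleFromMic()
    {
        float[] samples = new float[SampleWindow];
        int position = Microphone.GetPosition(null) - SampleWindow;
        if (micClip != null && position >= 0)
            micClip.GetData(samples, position);
        return samples;
    }

    static float ComputeRms(float[] samples)
    {
        float sum = 0f;
        foreach (float s in samples)
            sum += s * s;
        return Mathf.Sqrt(sum / samples.Length);
    }
}
```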
Some considerations about AI
All the quirks I encountered when working with AI made me realize that artificial intelligence still has a long road ahead before becoming what we dream of. Writing prompts is complicated, and the AI doesn’t always seem to follow them as it should. It keeps rebelling against the instructions sometimes.
If you give very long instructions in the CubicMusic experience, you can be sure that at a certain point something will fail, and the script won’t execute what you want. That’s why I advise using short prompts like “make the cube change its color between blue and green every two seconds”: the more you add, the higher the risk that the script fails.
This confirms my impression that at the current stage, generative AI is good for experimenting and testing many tasks, but it is not ready for production. Especially, it is not ready for production without human intervention, yet.
Opensource repository
Since I made all this effort, and the result in my opinion was cool, I thought it would be a great idea to share all of this with the community. So, following the values of openness and knowledge sharing that I have, and that I share with VRROOM, I decided to release all the work I did as open source in this Git repository:
https://github.com/Perpetual-eMotion/DynaimicApps
(Perpetual-eMotion is the legal name of VRROOM)
In the repo you will find a very long and detailed README.md file, which also points to a 1-hour-long video where I comment in detail on the architecture of the code, the problems I had, and how I solved them. There you can find all the juicy technical details you were looking for.
What about the builds?
I have not published any build of the application, for a couple of reasons: 1. It is a rough prototype, so it’s not ready to be an app on the Quest Store; 2. In the current code, the OpenAI keys are not well protected, and this would be a security concern for us.
Anyway, I’m open to giving the APK to selected friends, VIPs, journalists, and warthogs that want to take a break from all their mating. Just reach out to me and let me know!
Let’s build together
The journey toward dynaimic apps (and concerts!) is long and complicated, but this experiment has been a cool way to start evaluating them. If the work I did in this sense during evenings and weekends can inspire someone to pursue this goal (or another goal that mixes AI and immersive realities), I will already be happy.
Let’s all revolutionize the future of AI, immersive realities, and music together! The metaverse is just waiting for us…
(… and if you want to reach out for collaborations, feel free to get in touch with me or my company VRROOM!)
(The header image uses the logos of Unity, OpenAI, Oculus/Meta)