Tea For God is a VR game. This means that it has to render a separate image for each eye. And it has to do so at 90 frames per second. There are several approaches to rendering and more are being developed all the time. I tried a few and the one that worked the fastest for Tea For God was the simplest one, rendering two separate images. Note that for different engines and different projects, different solutions might be the best.
But this is not a post about rendering. It’s a post about multithreading. If you know a thing or two about multithreading, skip the next few paragraphs. One more thing before we go: initially I was considering breaking this into a few posts but decided to just have one bigger post.
What is multithreading? A long time ago, I don’t remember exactly when, but it was more than 10 years ago, maybe more like 15 or even 20, most computers had a single core. Most games ran single-threaded. This means that everything happened one after another. Every single thing. In general, first the input was read, then the game logic was run, everything related to it, and then that very frame was rendered and presented to the user.
Then multiple cores came and developers realised that they can run a few things at the same time. Imagine that each core is a person and they are given specific tasks. With just a few cores, the easiest approach was to have one core do the rendering, a second do the game logic and if there is another one, it can stream things or maybe do some background tasks. This shift meant that we had to prepare a game frame but we couldn’t render it while we were preparing it, so we could only render the previous frame. So when we present a rendered frame to the player, there might already be another game frame ready. The player sees the past. But that is exactly what happens in our brain anyway. What we see is not what is really there, but what happened some very short time ago. This happens due to the synchronization of different signals that come from our body. So it’s not such a big deal when the framerate is high enough.
We shifted to rendering the previous frame, but we were still preparing the game frame on one thread. With 2 or 4 cores available, this was not a big deal. But as many computers now have 8 cores and will soon have more, this puts us in a situation where some of the cores remain unused by the game. Some developers decided to change that and there are numerous different approaches to dealing with multithreading.
This post covers how I dealt with multithreading in Tea For God.
In the game, I try to use all available cores (I have a system that tells how many cores can be assigned to a given type of job; each core may do a few different things). This required most of the work to be broken into the most basic, simple tasks while keeping communication between them. Back to the “cores are people” example. Imagine that two different people want to use the same thing. For example, take a drawing board.
While two different people are able to look at it or read anything from it, they should avoid situations where they both draw or write on it. Sometimes nothing bad will happen, but sometimes they will run into each other. So they can’t use it at the same time. The simplest solution is to have them queue. But this means that one of them will idle. In my opinion, this is acceptable as long as they don’t idle for too long. “Not too long” here means that we hold exclusive access to something only for the time we actually deal with it. If we use it only occasionally and spend most of our time doing other stuff, we won’t block anyone else for long.
But this is acceptable only when it happens occasionally. And many things during a game frame happen not just every frame but dozens or hundreds of times within each frame. We just can’t lock everything all the time. But we can arrange the work in such a manner that in a given period of time, all tasks just read common data without modifying it and prepare their results in separate parts of memory, so that the following tasks can read them and do something else.
And with this idea in mind, I divided the game frame into lots of small steps. This is kind of obvious because the game frame is already divided into a lot of steps, but I pushed it a bit further. I divided it into steps that have a common input and separate outputs (and don’t require knowing what’s in the other tasks’ outputs). For example, there is a step that is responsible just for gathering data about collisions. It checks the world; for each object it checks whether it collides and how it should move to resolve the collision, but it doesn’t resolve any collision. That happens in the following steps. The next one prepares movement. It takes into account what the AI wants, what the player wants and what the collisions tell us is possible. But it still does not do any movement. Up to this moment, we don’t modify any general state. All we change is the internal state of objects. What’s more, we don’t change their actual velocity, because that might be used by other objects to determine how to behave. We just set up the “next” velocity. When everything is solved and we know how and where we move, we… just move. And this is where we modify anything in the world. This is also when some conflicts may occur. But they may happen only sometimes: when we move objects from one room to another. And that’s where I have locks. Because movement between rooms happens rarely (in terms of a frame: at 90 frames per second, with a few dozen NPCs, some of them standing, some moving within a room, changing rooms happens really rarely).
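The “next velocity” idea can be sketched in a few lines. This is a toy illustration, not the game’s actual code: the `Body`, `plan` and `apply` names and the crowding rule are made up for the example. During the logic phase every object reads the shared state freely but writes only its own private slot; the physical phase then applies those slots.

```python
import threading

class Body:
    def __init__(self, pos, vel):
        self.pos = pos
        self.vel = vel        # readable by everyone during the logic phase
        self.next_vel = vel   # each object writes only its own copy

def plan(body, neighbours):
    # Logic phase: read shared state freely, write only our private slot.
    # (Toy rule: slow down if any neighbour is closer than 1 unit.)
    crowded = any(abs(n.pos - body.pos) < 1.0 for n in neighbours if n is not body)
    body.next_vel = body.vel * 0.5 if crowded else body.vel

def apply(body, dt):
    # Physical phase: now it is safe to mutate the shared state.
    body.vel = body.next_vel
    body.pos += body.vel * dt

bodies = [Body(0.0, 1.0), Body(0.5, 1.0), Body(10.0, 1.0)]

# The logic phase may run on many threads: no object mutates shared state.
threads = [threading.Thread(target=plan, args=(b, bodies)) for b in bodies]
for t in threads: t.start()
for t in threads: t.join()

# The physical phase applies the results.
for b in bodies:
    apply(b, dt=0.1)
```

Because `plan` never touches another object’s data, the order in which the threads run cannot change the result.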
This should explain the basic approach of how I deal with a game frame. Now, I will show you a list of all the parts, everything that happens during a game frame. The list is quite long at the moment but I want to show you how fragmented the frame is. This is only the game frame, without the system reading input, rendering, audio etc:
- process AI messages
- get collisions information
- get presence information (gravity, a surface we’re walking on etc)
- AI perception (check what AI sees)
- AI logic (this is when the AI thinks; this is done with latent functions, which I shall cover in a separate post. The most important thing here is that the AI does not change anything in the world, it just gives orders about what should happen)
- locomotion (turn how and where the AI wants to move into the actual velocity we want it to move with; this is also a step in which navigation data is used, but the navigation paths have to be prepared in the “advance logic” part)
- doors logic (decide whether to open or shut them)
- prepare movement, the first part (take everything from above, collisions, AI requirements, also what is possible for a given object, and calculate the velocity that we would like to use in this frame)
- animation logic – at the object level
- calculate preliminary poses (that results from the animation logic)
- animation controllers’ logic (this is when animation controllers advance but they are not doing actual IK calculations etc, they just plan their actions)
- prepare movement, the second part (take any feedback from the animation and get the actual velocity)
NOTE: up to this point we haven’t changed a thing in the world. We did a lot of work, but we haven’t changed a thing in the world. This is quite important and I will describe why later on. Everything before this point is called “the logic part of a game frame”. Everything later is considered “the physical part of a game frame”.
- move everything using velocities prepared in previous steps
- post move actions (this is for presence module only, just for storing some data and triggering kill Z)
- open and close doors
- advance Points of Interest in rooms
- advance Points of Interest in objects
- post move actions (this is for gameplay modules)
- calculate final pose (this is the final animation advancement) and update attachments
- custom modules
- advance temporary objects (these are particles, projectiles etc.; basically, this part is everything above, but done only for those special objects)
- build presence links (“presence links” define in which rooms objects are)
- finalise frame (in this step, objects are readied for rendering: bones’ transforms are stored, materials are prepared; also sounds are advanced)
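The overall shape of such a frame – an ordered list of steps, each fanned out over objects in parallel, with a barrier between steps – can be sketched roughly like this (a minimal sketch for illustration; the `run_frame` helper and step names are made up, and the real engine is not written this way):

```python
from concurrent.futures import ThreadPoolExecutor

def run_frame(objects, steps, pool):
    for name, fn in steps:                           # steps run strictly in order...
        futures = [pool.submit(fn, o) for o in objects]  # ...objects in parallel
        for f in futures:
            f.result()   # barrier: no step starts before the previous one finishes

log = []
steps = [
    ("collisions", lambda o: log.append(("collisions", o))),
    ("movement",   lambda o: log.append(("movement", o))),
]
with ThreadPoolExecutor(max_workers=4) as pool:
    run_frame(["npc_a", "npc_b"], steps, pool)
```

The barrier after each step is what guarantees that a step only ever reads the outputs of fully finished earlier steps.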
But this is only the gameplay side of a game frame. We’re still left with rendering, sound and things that do not fall into a game frame.
Rendering and sound are handled in a similar way:
- build a scene (create a proxy for everything that there is in the world, along with its state that is relevant for rendering/sound). For rendering, we actually create two scenes, one for each eye. For sound, we’re ok with one scene. This has to be done within the logic part of the game frame. All scenes are prepared on different threads. This allows them to be dealt with as fast as possible without stopping game frame advancement.
- render the created scenes or update the audio system (FMOD in my case). For rendering, we do one extra thing. Just before we render a frame, we read the latest pose from the VR system, to have the most up-to-date camera location. Because we have already built the render scene, we have to alter it a bit if required (especially when, because of the pose difference, we have moved through a portal).
You should notice that I try to pack as much of the work that doesn’t affect the actual state of the world as possible into the first part of game frame advancement. The latter part happens on any free core, as the render/sound threads have the highest priority. Also, the rendering thread is the main thread. There is one selected thread for game frame advancement but it only creates tasks/jobs and deals with a few other things that I will soon cover.
It is important to mention that job management takes some time. Switching between jobs/tasks also takes time. That’s why I batch them. The batch size depends on how many jobs there are. The more data to process, the bigger the batches. This way I have as little idle time as possible and I also benefit from batching. At some point, I was considering storing the advancement time for each step but at the moment this is not required. Game frame advancement fits nicely, with a huge extra buffer.
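A batching heuristic along those lines might look like this (the `make_batches` helper and the numbers are assumptions for illustration, not the game’s actual values):

```python
def make_batches(items, worker_count, min_batch=8):
    # Bigger workloads get bigger batches, so per-job overhead stays small;
    # small workloads are never split below min_batch, so tiny jobs don't
    # drown in administration. The heuristic values are made up.
    batch = max(min_batch, (len(items) + worker_count - 1) // worker_count)
    return [items[i:i + batch] for i in range(0, len(items), batch)]
```

With 100 items and 4 workers this yields 4 batches of 25; with only 10 items it produces one batch of 8 and one of 2 rather than four tiny jobs.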
There is also one more optimization that I have. Some of the tasks are not advanced every frame. If an object is not visible (actually, if the room the object is in has not been visible for some time), it may skip some advancements: collisions, AI (which is latent anyway), animations. This saves a lot.
When I was working on AAA games, we had to resort to such optimizations when it came to animations. The more distant an object was, the more frames were skipped for it. This led to very strange bugs. It turned out that other parts of the code depended on animations advancing every frame. Sometimes it was for particular states, sometimes for particular objects, and we had to make sure that in those cases, animations were advanced every frame. To avoid such bugs, which end with the optimization being disabled, I decided to use this approach very early in the project. The performance gain is significant. The game loop time goes down by about 20-40%.
Oh, there’s one more thing similar to this. The AI code uses latent functions. This means that for many frames the AI is just waiting. Then it does a bit of work in one frame and waits again. If you have lots of AI characters, in each frame you may end up doing the heavy logic-related stuff for only a very few of them.
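Latent functions map naturally onto generators/coroutines. A minimal sketch, assuming a yielded value means “wake me up in this many frames” (the names here are invented for the example, this is not the game’s AI):

```python
def guard_duty():
    # A latent AI function: each yield says how many frames to wait
    # before this brain needs to think again.
    while True:
        # look_around() would go here (hypothetical)
        yield 30      # stand still for ~30 frames
        # pick_patrol_target() would go here (hypothetical)
        yield 90      # walk there; check back much later

class Brain:
    def __init__(self, logic):
        self.logic = logic()
        self.sleep = 0

    def tick(self):
        if self.sleep > 0:
            self.sleep -= 1
            return False          # did no real work this frame
        self.sleep = next(self.logic)
        return True               # did real work this frame

brain = Brain(guard_duty)
worked = sum(brain.tick() for _ in range(121))
```

Over 121 simulated frames this brain does real work only twice; every other tick is just a counter decrement, which is why many NPCs stay cheap.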
Back to the topic. I showed how game frame advancement, rendering and sound are connected to each other. But this is not everything that is happening in the game. There are also extra tasks that may span over a few frames. Nav-mesh building, various asynchronous tasks.
Nav-mesh building is quite obvious. When I create a room or change something in it, a nav-mesh is built. Finished nav-mesh builds and requests for new nav-mesh building tasks are all managed at the beginning of a game frame, before we create scenes and advance game related stuff.
Various asynchronous tasks are anything that takes some time and does not have to be done immediately. These might be world generation, adding details to rooms, spawning NPCs etc. Most of the work in these tasks does not affect the world. They are completely separate from anything that happens in the game. Most of the time. Because sometimes we have to sync with the game frame. We want to read something from the world or put a newly created object into the game. Any asynchronous task may switch to synchronous mode, just for a very brief moment. This moment is just after the nav related tasks management and before scenes and game stuff. Most of the time, when an asynchronous task wants to do that, it will be the one waiting. The game frame is much more important; it actually just allows an asynchronous task to do something.
There are a few additional mechanisms on top of that. One of them is related to object activation. Because we may have multiple objects, or nested ones, we don’t want to activate them (put them into the world) one after another but all at once. We queue them and then mark them to be processed. The actual activation is divided into two parts. The first, getting objects ready to be activated, is an asynchronous job and it may require creating new objects (attachments etc). The second part is a very simple synchronous job – adding the readied objects to the world in a batch.
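The two-part activation can be sketched like this (a simplified model; `ActivationQueue`, `prepare` and `flush_into` are invented names): the heavy part runs off-thread and only parks finished objects, while the synchronous part is a single cheap batch insert.

```python
import threading

class ActivationQueue:
    def __init__(self):
        self.ready = []
        self.lock = threading.Lock()

    def prepare(self, obj):
        # Asynchronous part: do the heavy work (creating attachments etc.)
        # off the game thread, then park the result.
        obj["attachments"] = ["made_off_thread"]   # stand-in for real work
        with self.lock:
            self.ready.append(obj)

    def flush_into(self, world):
        # Synchronous part: one cheap batch insert during the game frame.
        with self.lock:
            batch, self.ready = self.ready, []
        world.extend(batch)

world = []
q = ActivationQueue()
workers = [threading.Thread(target=q.prepare, args=({"id": i},)) for i in range(3)]
for w in workers: w.start()
for w in workers: w.join()
q.flush_into(world)   # all three objects appear in the world at once
```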
Another example of such a system is “delayed object creation”. It just helps to create objects in order (because each object may have sub-objects that should be created together). If we wanted to create three different objects and added an asynchronous task for each one AND each one required more asynchronous tasks to be created (because we want something to be done, but not right now, only after we finish doing the current thing), all of them would interweave. To avoid that, object creation tasks are put into the delayed object creation queue and a new object is created only if there are no current asynchronous tasks running/queued.
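A toy, single-threaded model of that ordering guarantee (the names are invented, and the real queue’s interaction with the asynchronous task system is omitted): a new root object starts only once the previous object and all of its sub-object tasks have finished, so creations never interweave.

```python
from collections import deque

class DelayedCreator:
    def __init__(self):
        self.delayed = deque()   # whole objects waiting to be created
        self.running = deque()   # tasks for the object being created now
        self.log = []

    def request(self, name, subtasks):
        self.delayed.append((name, subtasks))

    def pump(self):
        while self.delayed or self.running:
            if not self.running:                  # previous object fully done
                name, subs = self.delayed.popleft()
                self.running.extend((name, s) for s in subs)
            name, task = self.running.popleft()
            self.log.append(f"{name}:{task}")     # stand-in for real creation work

c = DelayedCreator()
c.request("turret", ["body", "barrel"])
c.request("drone", ["body"])
c.pump()   # turret finishes completely before the drone starts
```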
A very important thing to remember is that during asynchronous tasks we should not try to access anything that exists in the world, or the world itself. The world is constantly changing and at different periods of time, different things in it are being modified, added, deleted, replaced. That’s why asynchronous tasks run alongside the game and hop in during the synchronous window to do something.
Problems I ran into? Many. I have lots of concurrency checks to make sure code runs either in a synchronous task or in an asynchronous task, as expected. I have lots of mechanisms to make sure that something is read/written only when it is expected to be read/written. All that stuff runs only in development mode. As I already mentioned, I also have locks (spinlocks and multi-read/single-write locks) that might result in short waiting times. They are there to make it easier to wrap your head around what’s going on and to avoid adding extra mechanisms to queue stuff, process it and distribute it.
One of the problems I ran into quite early during development was the frame breaking into more and more steps. Especially when I noticed that the time required just to gather all the required objects and process them was getting bigger and bigger, and at one point more time was spent administrating tasks than doing actual work. I solved that by batching jobs but also by selecting and marking objects that require something to be done in a particular step. Ideally, it would be great to avoid steps that do very little work, but if the work cannot be put somewhere else, it’s better to waste that extra administration time and have clean code.
I had one single moment when I was devastated and wanted to give up. It was when I added lots of background objects to the game. Two things happened. The level generation time rose to two minutes. From 10 seconds. And the framerate dropped from well over 90 fps to 20 fps. I didn’t know which one was worse. And I didn’t want to get rid of all those background objects. Solutions came quite quickly:
First, I decided to share vistas between windows. Most of the time you see the same stuff when you look out of a window. Applying a different light direction during rendering only helps, too. Hey, smoke and mirrors!
Second, I divided everything into “we need that to have the level running” and “this can wait”. This sped up level generation a lot. It then took just 5 seconds to generate the required content. Everything else, the NPCs, decorations, vistas etc., is created after the level starts. Because the station door takes a while to open and because the player’s movement speed is limited (how fast can you run?), there is a lot of additional time to create all that stuff. And in the final game, a player may want to buy/sell stuff, craft something etc. This is when I added more complex asynchronous task management (before, it was just “add asynchronous task”; after that I had the synchronous window, world jobs (asynchronous tasks) and the “delayed object creation” queue). Right now, the levels are more complex and there are more NPCs, but a shorter level still takes just a few seconds to be created.
I still had to deal with the framerate. That’s when I introduced a “static object” marker. If an object does not change during its lifetime, it’s advanced just once and then left as it is. It can be switched back to an active state if there is such a need.
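The marker itself is trivial; the win comes from checking it before doing any advancement work. A sketch with invented names, assuming a per-object flag:

```python
class SceneObject:
    def __init__(self, static=False):
        self.static = static          # never changes during its lifetime?
        self.advanced_once = False
        self.ticks = 0                # counts how much work we actually did

    def needs_advance(self):
        # Static objects are advanced exactly once, then skipped until
        # something flips them back to an active state.
        return not (self.static and self.advanced_once)

    def advance(self):
        if not self.needs_advance():
            return
        self.ticks += 1               # stand-in for the real per-frame work
        self.advanced_once = True

crate, npc = SceneObject(static=True), SceneObject()
for _ in range(100):                  # simulate 100 frames
    crate.advance()
    npc.advance()
```

With many background objects marked static, almost all of their per-frame cost disappears after the first frame.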
I was back to short level generation times and a framerate above 90, with lots of additional objects in the scene.
I have heard many times that multithreading may result in the most horrible kind of bugs. And there are a few kinds:
1. Deadlocks. Everything freezes because there is a task waiting for another task that itself waits for the first task’s results. Avoid such dependencies. It is tempting to just add a bunch of tasks, tell them what they require and what they provide, and have a system auto-solve that. Sooner or later someone will add a task that breaks that careful chain of dependencies and puts a loop in it. And it will get triggered only once in a while. Break the game frame into discrete steps with a defined purpose. Add more steps if required. Avoid putting some stuff inside something else just because it’s easier. If you can’t be sure what is happening in a particular step, this will hit you. Badly.
2. Deadlocks may also happen because lots of stuff is locked at the same time. One task wants to acquire a few different locks. And another task wants access to the same stuff but maybe takes the locks in a different order. Keep locking in the same order? Maybe, but it is better to lock as little as possible. And for as short a period of time as possible. There might still be issues with memory access and such, though; you have to remember about that too. Such locking should be considered a solution only for cases that happen rarely; the locks are there to prevent those one-in-a-million situations.
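The classic fix for the A-then-B vs B-then-A case is to pick a global order and always lock in it. A sketch using room ids as the order (a toy model; rooms here are plain dicts, not the game’s types):

```python
import threading

def transfer(src_room, dst_room, obj):
    # Always acquire room locks in a fixed (id-based) order, no matter
    # which direction the object moves; that rules out the A-then-B vs
    # B-then-A deadlock between two concurrent transfers.
    first, second = sorted((src_room, dst_room), key=lambda r: r["id"])
    with first["lock"]:
        with second["lock"]:
            src_room["objects"].remove(obj)
            dst_room["objects"].append(obj)

room1 = {"id": 1, "lock": threading.Lock(), "objects": ["npc"]}
room2 = {"id": 2, "lock": threading.Lock(), "objects": []}
transfer(room1, room2, "npc")
```

Two threads transferring in opposite directions between the same pair of rooms will both take `room1`’s lock first, so neither can hold one lock while waiting for the other.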
3. Something is broken somewhere because something else is doing something. That’s a lot of “some”s, but when you look at multithreading bugs, you will end up with lots of stuff happening at the same time. And sometimes when you are notified about an issue, you check the code and you don’t see anything suspicious in the thread where the bug occurred. Only to realise that there are 5 or 6 different threads doing their own duties and that one of them could have done something just a fraction of a millisecond ago, and you can’t even tell which one, because the call stack only tells you what is happening right now. This is a problem similar to spaghetti code. You just add one more dimension to it and you get multithreaded spaghetti. Logging stuff may slow you down (especially when you mess something up and introduce locks there; I had locks in my performance measuring tools, which resulted in just one task doing actual work while the others waited for the performance tool to be available).
My approach is to try to prevent such things from happening. Clear inputs and outputs that don’t mess with each other. Have things separated. In most cases, it should be quite easy to do so. The biggest offender here is the gameplay code. Dealing damage etc. You can either queue stuff to process in a separate step, or lock. Some such bugs are easy to repro (with dual wielding, shooting from both guns at once at the same target resulted in the damage code running for the same object on two different threads).
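The queue-instead-of-lock option for the dual-wielding case can be sketched like this (invented names; only the event list itself needs a tiny lock, and the damage is applied in one synchronous step):

```python
import threading

class DamageQueue:
    # Instead of locking the target while two guns hit it at once,
    # each thread just records an event; one synchronous step applies them.
    def __init__(self):
        self.events = []
        self.lock = threading.Lock()

    def report(self, target, amount):
        with self.lock:                       # tiny critical section
            self.events.append((target, amount))

    def apply_all(self, health):
        for target, amount in self.events:    # single-threaded step
            health[target] -= amount
        self.events.clear()

health = {"drone": 100}
q = DamageQueue()
# Two guns fire at the same target from two threads at once.
shots = [threading.Thread(target=q.report, args=("drone", 15)) for _ in range(2)]
for s in shots: s.start()
for s in shots: s.join()
q.apply_all(health)
```

Neither shot ever touches the target’s health directly, so there is no window where two threads modify the same object.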
One thing that is good and bad at the same time is that with multithreading working properly and lots of optimizations, you may end up using 30% of the CPU. Which may make you write some odd code that is not the best, the fastest one. Because why would you care if you still have lots of time to waste? At least make it easy to read. And limit such cases to AI and gameplay. They go way beyond what’s happening right now and sometimes require a bigger picture to understand them. But, for example, the collision detection code? Single purpose. Detect collisions. Do whatever it takes to make it run as fast as possible.
That’s it. This should give you an insight into how I managed to get the game running at 90fps and avoid common multithreading issues (at least neither I nor anyone who tried the game ran into these – yes, I had crashes, but they were not related to multithreading). If you’d like to ask me something or share your experience, please do so. Even if you want to tell me that this system is pure nonsense. Well, it works, but I am sure that there might be better approaches, and if not me, then others will benefit if they learn about a better one.