Published: May 22, 2023
Somewhat unexpectedly, I took a break from working on levels this month to focus instead of **performance and memory optimizations**. This was brought on by the fact that I made some release builds for the first time in a while and found that my iOS build crashed on startup because it was running out of memory loading the main menu!
The main culprit? These huge backdrop texture atlases...(this one is 64 MB!)...
Your first thought upon seeing these atlases is that they're really wasteful. Why is there so much empty space in the upper-right? Well, that one is because the texture atlases need to be even powers of 2 in dimensions (1024, 2046, 4096). I could, of course, have each layer be separate, without packing them into a single atlas, but then I'd lose all the performance benefits of being able to batch the draw calls for all of the background layers together.
The better question is why does each backdrop layer have so much vertical padding? Well, I didn't want to make any assumptions about the player's aspect ratio, resolution, or zoom settings, so the easiest way for me to solve that was to just author all of my backdrop layers with loads of vertical leeway, so that they'll always be fully in view.
Each separate layer is exported at 500x1200 pixels (very tall!), and then tiled horizontally by the game. Some of the levels have upwards of 10 or 15 separate backdrop layers, so that's quite a lot of pixels...
The first thing I wanted to do was see if I could just store the textures more efficiently without changing anything about my authoring workflow. You may have noticed that the texture atlases are all grayscale (no color). This is a change I made a long time ago, back when I decided to use a palette shader for the backdrops. Essentially, I only really need to represent indices into my color palette (currently, one of 11 colors), so during my export I just use grayscale colors that the pixel/fragment shader can read and then interpret as color 0, color 1, etc. I also sometimes have partial transparency, so the alpha value is also important.
However, the textures are still encoded as 32-bit RGBA, which means 8 bits are assigned to each of the red, green, blue, and alpha channels! That's pretty wasteful, so I wanted to look into whether Unity supports other lossless texture formats (across multiple platforms). It does, in fact you can actually use the "R 8" texture format, which exclusively encodes a red channel (nothing else!), and only uses 8 bits per pixel (25% of what I was currently using!).
That seemed perfect, as really all I needed was grayscale values anyways. The one problem was that I still needed to store alpha values to handle partial transparency. Could I somehow pack both the color index, and the alpha information, into 8 bits?
Since I only have 11 different colors in my color index, 4 bits is enough to encode that (2^4 = 16). That would leave the other 4 bits to store alpha information, which would mean I could have 16 different possible alpha values. That's more than enough for my purposes, so I went ahead with this strategy of using 4 bits for color encoding and the other 4 bits for alpha values:
To get this all working, I needed to first write a python script to take all of my original backdrop exports and encode them into an 8-bit red channel like you see above. Then I needed to modify my palette shader to do the reverse: take the 8-bit encoding and parse it into a color index and an alpha value.
After a bunch of shader math debugging and fussing around with bit arithmetic, it was all working (everything looked the same as before) and the iOS build was no longer crashing. Hooray!
We can still do better, of course. The next step was to see if I could get rid of all of the extra padding on the top and bottom of many of these images. Take this cloud layer for instance:
Ideally we could only store the actual texture data that really matters (the middle section). The top half is all transparent, so we can just discard that, and then for the bottom half we can just "clamp" the texture lookup so that the bottom-most opaque row is essentially repeated indefinitely.
Doing the crop itself is simple enough -- I just modify my python image-processing script to analyze the rows of the image and trim it accordingly. We end up with this nice cropped version of the image:
The trickier part is that we now need to render this in the same way as the original texture. There are a couple of problems with this...
First, the new origin/center point of the sprite is different than before, since we trimmed an unequal amount of rows from the top and bottom, so it's going to be offset from where it was supposed to be drawn. To fix this, I added processing to my script to keep track of how much the new cropped sprite is offset by. I also track some other important metadata, such as whether the top or bottom sections (or both) should be repeated transparency, or a repeated opaque row. Then I output that all to a C# file that I can read in:
{ "level2-5_background_4", new Entry { Offset = -62.5f, TopTransparency = true, BottomTransparency = false, OpaqueBelow = 1, OpaqueAbove = 55 } },
My backdrop tiling script is responsible for taking the stored offset metadata and shifting the center position of the rendered sprite accordingly.
The second issue is that while Unity supports texture coordinate clamping, there's no way to do that when the sprite in question is one of many sprites packed into a texture atlas! Unity's sprite renderer only handles tiling in a very specific way, which no longer applied to what I wanted to do, so I had to modify my fragment shader to handle the texture clamping part.
In order to do this texture clamping correctly, I also needed my fragment shader to understand what UV texture coordinates it was supposed to be working with inside the texture atlas. Normally the fragment shader is completely oblivious of this -- the Sprite renderer is responsible for handing it a set of UVs to render and then the shader just does the texture lookups blindly.
It also turns out that you don't actually have access to the sprite UV metadata from within your fragment shader =/. So I needed to pass those into the shader, =and= I couldn't use uniform variables since that would break batching. Luckily, Unity happens to expose a SpriteDataAccessExtensions class which allows you to write to the UV texture coordinates of the sprite mesh used by a sprite renderer internally.
In addition to allowing you to modify the main UVs, it also lets you set additional texture coordinates on the mesh (TexCoord1, TexCoord2, TexCoord3, etc.). I used those to pass extra data to the vertex shader -- and then through to the fragment shader -- including the sprite UVs from the texture atlas.
This took a lot more debugging to get right, but at the end of all that, it was working! Here's the new version of the texture atlas from before (in all its red-channel glory), which is 1024x1024 instead of 4096x4096, and 1 MB instead of 64 MB!
Rhythm Quest isn't really a performance-intensive game, so it runs fine on most systems. That said, there are a couple of areas where it can get into performance issues on lower-end devices (surprisingly, the Nintendo Switch is the main culprit of this so far).
One major performance bottleneck involves overdraw, which is a term used to describe when pixels need to be rendered multiple times -- typically an issue when there are many different transparent / not-fully-opaque objects rendered in the same scene (*cough* backdrop layers *cough*).
Unlike in a generic 3d scene (where we would try to render things from front-to-back, to minimize overdraw), for our backdrop layers we need to render things from back-to-front in order to handle transparency correctly:
Unfortunately, this results parts of the screen being rendered to many times over and over again, particularly the lower areas (all of those overlapping cloud layers...). The good news is that the cropping we did above already does some work to alleviate this a bit. Before, the large transparent portions of backdrops would still need to go through texture lookups and be rendered via the fragment shader, even though they were completely transparent (i.e. didn't affect the output). But now, we've cropped those areas out of the sprite rendering entirely, so they aren't a concern.
We can still do a little more optimization, though, for opaque backdrop sections! Take this layering of opaque cloud layers from level 2-5 as an example:
There's a lot of overdraw happening on the bottom sections of the screen. What if we were smart about this and kept track of which portions of the screen are being completely covered by each layer, front-to-back? That would let us render smaller screen sections for all of the back layers:
We can handle this by having our image processing script store some additional metadata (the "OpaqueBelow" and "OpaqueAbove" fields) so we know at which point a background layer obscures everything above or below it. We then need to modify the backdrop script to adjust the drawing rect and UVs accordingly (easier said than done...)...
The end result of all of this is...that everything looks exactly the same as before...
But! It's significantly more efficient both in terms of memory usage and rendering time. I'll have to patch the existing demo builds with this optimization at some point, but the Switch build is already showing some improvements, which is nice.
We're not completely done with performance though, as right now the rendering of the water sections are also quite inefficient! I may try to tackle that next...
<< Back: Devlog 53 - Level 6-1
>> Next: Devlog 55 - Gamepad Rebinds, Odds and Ends