Optimizing a 2D Godot Game for the Nintendo Switch. My first-hand experience

Over the past few months I’ve been hard at work on my homebrew game for the Nintendo Switch, Warnel Chawpiovs. Although the game was initially made for PC, I knew I wanted to leverage the flexibility of Godot to bring it to consoles, specifically homebrew-enabled ones.
After reaching a satisfying “playable” stage on my PC, I tried to compile the game for the Switch (note: compiling Godot games for homebrew-enabled Switch consoles can be done with the excellent homebrew port of Godot 3.5). To my surprise, it worked out of the box, with the touch screen simulating mouse clicks. Less surprisingly, the game ran at an unplayable 10 FPS, dropping as low as 1 FPS in computation-heavy frames. Starting a battle froze the console for a good 10 seconds while the cards were loading.
It would be easy to just give up at this point and say “well, it works fine on my PC, the Switch is just too underpowered”. The reality, however, is that a 2D game like mine, with so few elements on screen at any given time, should have zero issues running at 30 FPS or more on a console such as the Nintendo Switch. So it was time to look for optimization tricks. With some effort, I was able to bring my game to a much more playable 20-25 FPS. Still not great, but a satisfying outcome for what I consider to be “low-hanging fruit”. GDScript being a scripting language of course adds a lot of overhead, but I think 30 FPS should be achievable somewhat easily for a simple 2D Godot game on the Switch.
This is what it looked like before any optimization:
So, while the following article is about Godot 3.x (I’m using specifically 3.6 and 3.5), and my work was actually compiled using the homebrew port for the Nintendo Switch, I believe the advice in this article is valid for all versions of Godot including 4.x, if your intent is to optimize a 2D game for the Nintendo Switch and other portable devices. Keep in mind however that “2D” is a key factor here, as there are many ways to optimize for 3D rendering that I will not be discussing in this article.
Generally speaking, I’m trying to show the easiest techniques at the beginning of the article, with more “in depth” stuff as we progress. But all of the points below have been useful in my research.
To some of you, many of these points will probably be too obvious, but I’ve been surprised at my own stupidity in many cases. It’s easy to get lost in the code and miss some very simple optimizations.
Also worth pointing out that this is basically my first time using Godot, and some of the improvements I talk about might have existing mechanisms “handled natively by the engine” that I’m not aware of.
0. TL;DR
Busy? Here are my recommendations overall:
- Lower the base resolution of your game to 1280×720. The Switch will run a 1920×1080 game, but it will be significantly slower
- Cache stuff as much as possible (data, Nodes,…), whenever appropriate, instead of recreating Nodes constantly and/or recalculating the same stuff at every frame
- Remove Nodes from the tree if/when you don’t use them, and cache them somewhere else in a variable if they are still needed. Also, don’t use a complex Node when you could use a simple one
- Move code away from _process(). Lots of things don’t need to run every frame (even if you think they do, you might have other approaches)
- The illusion of speed:
  - Lazy Loading might help at game startup
  - Updates that rely on _process() can sometimes be updated less frequently than “every frame”
  - Improved controls can make navigation easier, and compensate for sub-par FPS by removing some frustration
- Use the profiler to find expensive function calls
- Consider compressed textures
1. Lower the resolution
This one is probably obvious, but lowering the base resolution of your project can go a long way. In my case, it was actually one of the most impactful changes I made to my project in order to get it to run on the Nintendo Switch.
My project was initially designed for 1920×1080, and ran perfectly fine on my PC. When I compiled it for the Nintendo Switch, the console did not complain about the resolution (the Switch will run in 1080p in docked mode, after all) and happily ran my game at… 10 FPS.
It actually took me a while to figure this out, until I ran a comparison test with some demo code from the Card Game Framework (the template I’m using for my card game), which ran at about 30 FPS. Granted, it was an empty demo with only a few cards, but the difference was significant. After some digging I remembered that the demo was designed for 720p, while my game was running at 1080p.
Changing the resolution of my game wasn’t technically easy (I wanted to support both 1080p and 720p, and due to my inexperience with GUI design in Godot that ended up being quite complicated), but eventually I got it to work and easily gained about 7 to 10 FPS by simply lowering the game’s resolution to 720p. That is plausible when you consider that 1920×1080 has 2.25 times as many pixels to fill as 1280×720.
2. Caching
There are many ways to cache data in your game; here is how I did it in mine.
First of all, you need to figure out which data is worth caching. The basic idea is that if you have an expensive call somewhere that runs regularly only to always return the same result, you either need to find a way to call it less often (e.g. move it outside of _process(), as discussed later in the article) or cache its result.
Figuring out where and when those calls happen is half the battle. Using the profiler is probably your best option here (see below), but gut feeling can probably help as well.
I’m not sure I can give generic advice here, so I’ll give a few examples of things I cached in my own game.
Warnel Chawpiovs regularly loads text data from dictionaries (the card scripts) and interprets/modifies them in real time. I realized that with all the calculations done in the background (to see if you can pay for the cost of a card, to calculate its valid targets, etc…), the same scripts were being evaluated thousands of times each frame! Modifying a string in itself isn’t super expensive for computer programs, but when it’s done on multiple strings in a dictionary, thousands of times for each dictionary, it starts to become (very) significant.
I could of course figure out ways to reduce how often I call those string replacement functions, but the easy way out was to cache the result. I implemented a quick and dirty cache, which worked on the first try and got something like a 95% hit rate. In other words, about 95% of my calls were repeats of an earlier call and were served straight from the cache. Even better, the cache stays valid for the whole duration of the game and doesn’t need to be invalidated until the game is over.

Simply rename your old function to “my_function_no_cache”, then make a wrapper for it with its old name, which will have a cache
Another issue I noticed is that loading grids full of cards (for example when choosing which card to discard when playing another) was very slow. It turned out the original engine was constantly creating copies of these cards (and all their child nodes) to display them.

Warnel Chawpiovs often creates copies of cards for various display purposes. I figured that keeping one cached duplicate of each card was more than enough, and better than recreating copies all the time
There are very good reasons not to reuse the same object here and to create a copy instead. But rather than making a copy every single time, I decided that having each card hold a single duplicate of itself, and return it as needed for display purposes, was reasonable enough. It’s worked well so far, and has provided significant speed boosts whenever copies of cards need to be displayed (which is almost all the time in this game, ha).

A simple way to limit the number of duplicates I create
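The idea can be sketched as follows, again in Python as a stand-in for GDScript. The Card class and the get_display_copy() name are hypothetical; in Godot the copy itself would presumably come from the node's duplicate() method rather than from copy.deepcopy().

```python
import copy


class Card:
    """Hypothetical stand-in for a card node; not the game's actual class."""

    def __init__(self, name: str):
        self.name = name
        self._display_copy = None  # built on first request, then reused

    def get_display_copy(self) -> "Card":
        """Return the single cached duplicate, creating it only once."""
        if self._display_copy is None:
            self._display_copy = copy.deepcopy(self)
        return self._display_copy


card = Card("Test Ally")
first = card.get_display_copy()
second = card.get_display_copy()
print(first is second)  # True: the same duplicate is handed out both times
print(first is card)    # False: it is still a copy, not the original
```

The trade-off is that the duplicate is shared, so this only works if a given card's copy is shown in one place at a time.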
Caching can lead to hairy bugs. But if used appropriately, it is an essential and easy way to optimize the speed of your code
3. Remove Nodes from the Tree when you don’t need them
GDScript makes it very easy to add Nodes to your tree, to the point that you forget how expensive these structures can be. It’s great to develop stuff quickly, but the poor Nintendo Switch is having a hard time rendering my cards when every single one of them is actually made of hundreds of graphical components.
I can’t overstate how easy it is to forget that you might be creating hundreds, sometimes thousands, of complex objects in your tree that end up not being rendered but still do some heavy computations.
Specifically for my game, each card is a Node2D with many children that are in charge of displaying its front/back, icons, name, and more…

Things can get heavy quite quickly
Most of these children are actually unused most of the time when a card is in play. And there are hundreds of cards in a given game session, not to mention the duplicates that are being created for other display needs.
Furthermore, most cards on the board are actually not visible, being either facedown or in a deck somewhere. Now, I trust that Godot is doing the right thing most of the time, not rendering stuff that isn’t supposed to be visible, but imagine calling thousands of child nodes every frame just for every single one of them to reply “Nah I’m good, just do nothing for me”. Worse, many of them might actually be doing some heavy processing, only for it to be discarded because the card isn’t rendered on screen.
Look, my code is dirty and I create hundreds of objects in a terrible way, OK? I get it.
Bottom line is, if an object is in the Tree, I have little visibility into how much load it is putting on the CPU/GPU. And my conclusion here is that if a Node isn’t going to be displayed or used, it probably shouldn’t be in the tree to begin with.
This doesn’t mean the Node should be deleted. You might need it at some point, so it can be kept in a variable owned by its parent for future use (ha, cache again!).
For example, cards in my game display a bunch of stats about themselves (attack, defense, etc…), but most cards actually don’t have any values for most attributes. In my initial design, all cards had these icons as children, which would be visible or not. It turned out I was doing a lot of expensive calculation on these child nodes even when they were not going to be displayed. I now have logic that simply doesn’t add these children when they aren’t needed, or even explicitly deletes children that won’t be used in a given scenario (such as some debug display nodes when running in production).
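To make the "keep it in a variable, not in the tree" idea concrete, here is a small model in Python. The Node class below is a toy stand-in, not Godot's Node, although add_child() and remove_child() are named after the real Godot methods. StatCard, show_stat() and hide_stat() are hypothetical names.

```python
class Node:
    """Tiny stand-in for a scene-tree node; this is not Godot's Node class."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def remove_child(self, child):
        self.children.remove(child)

    def count(self):
        """Number of nodes that would be visited when walking this subtree."""
        return 1 + sum(child.count() for child in self.children)


class StatCard(Node):
    """Hypothetical card that only keeps useful stat icons in the tree."""

    def __init__(self, name, stats):
        super().__init__(name)
        self._parked = {}  # detached icons, owned by the card for later use
        for stat, value in stats.items():
            icon = Node(stat + "_icon")
            if value:
                self.add_child(icon)       # has something to show
            else:
                self._parked[stat] = icon  # kept, but outside the tree

    def show_stat(self, stat):
        """Re-attach a parked icon once it is actually needed."""
        icon = self._parked.pop(stat, None)
        if icon is not None:
            self.add_child(icon)

    def hide_stat(self, stat):
        """Detach an icon from the tree but keep it for later."""
        for child in self.children:
            if child.name == stat + "_icon":
                self.remove_child(child)
                self._parked[stat] = child
                return


card = StatCard("Test Ally", {"attack": 2, "defense": 0, "health": 0})
print(card.count())  # 2: the card plus its single attack icon
card.show_stat("defense")
print(card.count())  # 3
```

With three stats this saves two nodes per card. Multiplied by hundreds of cards and many more children per card, that is a lot of nodes the engine never has to walk.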
There are of course many ways to address this, but the rule of thumb is that if you’re not going to use a Node in the current frame, just don’t have it in the tree.
4. Move code away from the _process() function
The _process function in Godot runs basically at every frame. It is useful to calculate movement and display updates, but in many cases you have to ask yourself “do I really need to calculate this value every frame, or am I doing things wrong here?”
It is easy to use _process() for a bunch of things. And unfortunately, I still use it a lot. For example, I have 2D objects that keep being resized to the wrong size by some other component, and the “easy” way out is to resize them correctly at each frame. Of course, the good solution would be to figure out which other piece of code resizes them the wrong way, but that’s hard work… I’m lazy and I’ll use the _process() function until I figure this out(tm)
My point is that it’s always easy and comfy to put code in _process(), but I’ve found that oftentimes it isn’t the right approach. For example, using signals to make your components aware of a critical variable update is often a better (less performance-heavy) approach. When everything else fails and you *need* to use _process() for something that really shouldn’t be in there, see the caching section above to at least lessen the pain.
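As a rough illustration of the signal approach, here is a Python sketch in which a tiny Signal class plays the role of an engine signal. Enemy, LifeLabel and life_changed are invented names, and the loop only simulates frames; none of this is Godot API.

```python
class Signal:
    """Minimal observer list, standing in for an engine signal."""

    def __init__(self):
        self._listeners = []

    def connect(self, callback):
        self._listeners.append(callback)

    def emit(self, *args):
        for callback in self._listeners:
            callback(*args)


class Enemy:
    def __init__(self, life):
        self.life_changed = Signal()
        self._life = life

    def set_life(self, value):
        if value != self._life:  # emit only when the value really changes
            self._life = value
            self.life_changed.emit(value)


class LifeLabel:
    """GUI element that redraws on change instead of polling every frame."""

    def __init__(self, enemy):
        self.text = ""
        self.redraws = 0
        enemy.life_changed.connect(self._on_life_changed)

    def _on_life_changed(self, value):
        self.text = f"Life: {value}"
        self.redraws += 1


enemy = Enemy(10)
label = LifeLabel(enemy)
for frame in range(600):   # simulate 600 frames
    if frame == 120:
        enemy.set_life(7)  # the only change during those frames
print(label.text, label.redraws)  # Life: 7 1
```

A polling version in _process() would have rebuilt the label 600 times to show the same single change.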
5. The illusion of Speed
When everything else fails, I’ve also found that if your FPS remains too low, you can compensate for it with a few tricks. Of course that might only be relevant if your game isn’t action packed, but in my case of a “turn based” board game, the following has helped:
5.1. Lazy Loading
In its first iteration, my game was loading all textures for all card pictures when a battle began. That’s hundreds of image files to load on the very first frame. Again one of those things that was imperceptible on PC, but made the game lag terribly at startup on the Switch. It froze for so long when starting a new game that it gave the impression that the console had crashed.
Lazy loading the card textures, that is, loading them only when they are needed (when a card is shown faceup for the first time in the game) has addressed this issue.
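A lazy loader can be as small as the following Python sketch. LazyTextureStore, fake_load and the file path are placeholders for illustration; in a real project the loader would be whatever function actually reads the image.

```python
class LazyTextureStore:
    """Loads a texture the first time it is requested, then reuses it."""

    def __init__(self, loader):
        self._loader = loader  # expensive function: path -> texture
        self._textures = {}
        self.load_count = 0

    def get(self, path):
        if path not in self._textures:
            self._textures[path] = self._loader(path)
            self.load_count += 1
        return self._textures[path]


def fake_load(path):
    """Placeholder for the real, slow image load."""
    return f"<texture {path}>"


store = LazyTextureStore(fake_load)
print(store.load_count)  # 0: nothing is loaded when the battle starts
store.get("cards/test_ally.png")  # first time the card is shown face up
store.get("cards/test_ally.png")  # later requests reuse the loaded texture
print(store.load_count)  # 1
```

The cost does not disappear: it is spread over the frames where each card first appears, instead of being paid all at once at startup.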
Lazy loading doesn’t help with FPS on average during gameplay, but it can help with loading times
5.2. OK, so you still have code in the _process() function. Does it *really* need to run all the time?
Another good optimization I came up with for my game on the Nintendo Switch was deciding not to run some parts of my _process() functions in every frame.
Specifically, I have a piece of code that measures the FPS of the game in real time using Godot’s internal functions. If the FPS goes below a certain threshold, my code then decides to not update some parts of the GUI (or e.g. to update them every 10 frames instead of every frame) to save on rendering cycles.

In this example I’m skipping an expensive GUI update for some card stats either if the FPS is below a certain threshold or if some larger GUI animation is going on
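The decision logic could look something like this Python sketch. GuiThrottle, the 30 FPS threshold and the 10-frame interval are assumptions chosen for the example. In Godot the current FPS could be read with Engine.get_frames_per_second(), though the article does not say which call the game actually uses.

```python
class GuiThrottle:
    """Decides, once per frame, whether an expensive GUI update should run."""

    def __init__(self, min_fps=30.0, slow_interval=10):
        self.min_fps = min_fps              # below this, updates are rationed
        self.slow_interval = slow_interval  # run every Nth frame when slow
        self._frame = 0

    def should_update(self, current_fps, animation_running=False):
        self._frame += 1
        if animation_running:
            return False  # a bigger animation is playing: skip entirely
        if current_fps >= self.min_fps:
            return True   # fast machine: update every frame
        return self._frame % self.slow_interval == 0


slow = GuiThrottle(min_fps=30.0, slow_interval=10)
slow_updates = sum(slow.should_update(current_fps=15.0) for _ in range(100))

fast = GuiThrottle(min_fps=30.0, slow_interval=10)
fast_updates = sum(fast.should_update(current_fps=60.0) for _ in range(100))

print(slow_updates, fast_updates)  # 10 100
```

The frame's _process() would call should_update() and only refresh the card stats when it returns True.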
The result is that some values, such as the total life points of a given enemy, might not show their correct value for a few frames. It’s definitely not great, but in my tests it has been doing wonders for the overall framerate of the game. Again, here, it’s up to you to determine what is ok to delay and/or not calculate at all. In my case, as long as the internal engine’s values (the actual data used by the game) are up to date, what is displayed to the user can get a bit delayed. Of course, what you can delay or skip depends heavily on what type of game you’re making so I can’t be super specific here.
Tying this to the framerate allows me to be sure that on “high end” machines such as PCs, the experience won’t be sacrificed for a bit of unneeded performance.
Some of your code might need regular updates through the _process() function, but if it’s CPU/GPU intensive, you can choose to not run it every single frame
5.3. Improving the rest of the experience to compensate for poor FPS
Playing early iterations of my game on Nintendo Switch was a terrible experience (it still isn’t great, don’t get me wrong, there’s a lot of room for improvement). I blamed it on the poor framerate (and that was a major component of course), but that prevented me from seeing other issues. The controls in particular were awful. They compounded the effect of the poor performance: for example, targeting an enemy card with an attack required me to navigate through all other cards on the way there before I could target the enemy. This meant 3 to 5 unnecessary clicks on the gamepad to reach my target.
I quickly figured out that limiting navigation to only valid targets was a great improvement to the experience. That didn’t improve the framerate, but because fewer interactions were needed to reach my attack’s target, it made it more bearable.

Automatically choosing a default target, letting the user switch easily between valid targets, etc… better navigation can help compensate for the poor framerate
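Restricting navigation to valid targets boils down to skipping over everything else when the player presses left or right. A minimal Python sketch, where next_target, the board list and the "enemy" check are all invented for the example:

```python
def next_target(cards, current_index, is_valid, step=1):
    """Return the index of the next valid target in direction `step`.

    Wraps around the board. Returns None when no card is a valid target.
    """
    count = len(cards)
    for offset in range(1, count + 1):
        index = (current_index + step * offset) % count
        if is_valid(cards[index]):
            return index
    return None


board = ["ally", "ally", "upgrade", "enemy", "ally", "enemy"]


def is_enemy(card):
    return card == "enemy"


first = next_target(board, 0, is_enemy)        # one press from the far left
second = next_target(board, first, is_enemy)   # next press
wrapped = next_target(board, second, is_enemy) # wraps back around
print(first, second, wrapped)  # 3 5 3
```

One button press now lands directly on the first enemy instead of stepping through three cards that could never be targeted.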
The bottom line is, simple improvements in the interface can help mitigate the performance issues, in many cases.
6. Use the Profiler
Using the profiler in Godot is actually the first piece of advice that everyone hears when they’re trying to optimize a game in Godot. I’m not putting it at the bottom of the list just to be a contrarian!
The profiler has its uses and has helped me find a few bottlenecks in my game. So I definitely recommend you use it as it can be of tremendous help.
However, I found that what was slow on my PC was not necessarily what was slow on the Nintendo Switch. For example, the profiler would probably never have given me the idea to lower my game’s resolution.
I’ve also had a hard time reading some of its results… I did find a heavy function for which I ended up adding some caching, thanks to the profiler, but in hindsight I feel I stumbled upon it a bit by luck: there was nothing in the profiler that told me the function was slow or CPU intensive (but this might be me misinterpreting how to read the profiler’s results). The frame usage of that specific call was actually 0% according to the profiler. What caught my attention was that it was being called thousands of times per frame when I thought that number should be closer to 10.
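When the time column says 0%, the call count can still be checked by hand. The following Python sketch counts calls per simulated frame with a decorator. It is a generic illustration, not a feature of Godot's profiler, and get_valid_targets and run_frame are made-up names with a deliberately wasteful loop.

```python
from collections import Counter
from functools import wraps

call_counts = Counter()


def counted(func):
    """Wrap a function so that every call is tallied under its name."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@counted
def get_valid_targets(card):
    """Placeholder for a cheap-looking function that is called too often."""
    return []


def run_frame(cards):
    """Simulated frame with an accidental nested loop over all cards."""
    call_counts.clear()
    for card in cards:
        for _ in cards:
            get_valid_targets(card)
    return dict(call_counts)


print(run_frame(range(50)))  # {'get_valid_targets': 2500}
```

A count in the thousands where about 10 was expected is the kind of signal that points at a missing cache or a misplaced loop.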
So I’m not saying the profiler is useless (definitely not), but in the case of the Nintendo Switch, I can’t say it has helped me find the real bottlenecks that were impacting the console experience.
If you plan to use the profiler for Nintendo Switch Homebrew development (I don’t know what that looks like for official developers), one approach is to try and run Godot on an old computer with a crappy hard drive and GPU, to see if you can catch similarities.
7. Considerations on compressed textures
My game makes heavy use of textures with card images. (And by the way, textures should be cached in your code, as you can’t assume that Godot will cache them for you, particularly if they are loaded from external files.)
Your game might also end up using a lot of images. For everything that’s loaded internally (or via ZIP or PCK resource files), Godot imports images as textures that are “optimized” for your use case. That “optimization” is up to you, and potentially a step that most people ignore or gloss over. I certainly was clueless about it.
But it turns out that you can (and should) tell Godot how to import these textures and how they will be used in game. There are lots of optimizations, from compression to how and where they get loaded at runtime (e.g. using Video RAM or not). Some of these optimizations are not compatible with the Nintendo Switch’s Homebrew port, others might make it faster to load textures by leveraging the GPU of the console.

There are lots of ways to tell Godot how to use textures once loaded/imported
I have played a bit with some of those settings, and, for my use case, I haven’t found any optimization that seemed significantly faster than the others. Whether I load images from the hard drive, or optimized textures from Video RAM, I have not seen an impact on framerate (it seems however pretty obvious at this point that my game is limited more by the CPU than the GPU so that might be the reason I’m not seeing significant improvements here).
I have, however, seen an impact on the overall size of the files, with my currently “optimized” textures roughly 10 times smaller than their equivalent PNG counterparts. That’s 5 MB for about 250 cards, compared to 50 MB for the same images as PNG files on my hard drive.
Compressed/Video RAM textures didn’t help performance in my use case, and only helped with asset size on disk. YMMV
Conclusion
Your homebrew game might run like a potato on your first attempt on the Nintendo Switch, and that might be discouraging. But I personally went from roughly 10 FPS to about 20-25 FPS with a few “low-hanging fruit” changes as described above. Not all were super easy to implement, but they have considerably improved the framerate of the game without any massive changes to the code or the gaming experience.
This is what the first release of my game looked like, after these optimizations. Not perfect of course, but now playable.
