Wednesday, July 17, 2019

OpenGL debugging under GNU/Linux.

For debugging OpenGL under GNU/Linux, there are three tools that are very similar to each other. They can capture the stream of OpenGL commands, and let you examine the state at each command. These are the tools:


RenderDoc

RenderDoc probably has the easiest interface, and is quite powerful in its capabilities. It lets you view both the input and output meshes of your vertex shader, for instance. I do hit what seems to be a bug in RenderDoc when I use uniform buffers, though.

Intel Graphics Performance Analyzer

Intel GPA comes with a graphics frame analyzer. The capturing and analysis of a frame is done with two different commands, though. To capture, run gpa-monitor to launch your app and press Ctrl-Shift-C to capture a frame. Then quit the app and monitor, and run frame-analyzer.

NVidia Nsight Graphics

To capture with NVidia Nsight Graphics, do:

$ cd NVIDIA-Nsight-Graphics-2019.2/host/linux-desktop-nomad-x64/
$ ./nv-nsight-gfx

Choose "Quick Launch" and select "Capture for Live Analysis" in the nsight UI when your app is running.

Thursday, May 23, 2019

Do Space Aliens have Fiscal Quarters?

So, my Hexa Trains indie game features some outlandish planets with freaky colours. Obviously, they cannot represent Earth, as the continents are all wrong too. So, railroad building on Alien Worlds, it is then.

So, when the time came to implement loan interest for the game, it made me realize I need some time-accounting for my game. An extra-solar planet will still have years (as it orbits a star) and days (as it will likely spin as it orbits.) So years, and days. But what about monthly interest payments?

Well... there it breaks down. A random alien planet does not have months, and even if it does, there wouldn't be 12 of them. For starters, a random planet may or may not have moons. And if it does, the moon orbit will not be an integral number of times faster than the planet's orbit.

So no moon then. But what about seasons? Yes... our Alien Planet will most likely have seasons, because the spin axis is probably not aligned with the orbit axis. This means that there will be a spring, summer, autumn and winter. Nice!

And because our Alien Planet has four seasons, which are perfectly in sync with the orbit (exactly 4 seasons per year) we can reasonably assume that our aliens are familiar with dividing their year into four equal parts.

And there you have it, ladies and gentlemen... Our Aliens will probably track corporate performance on a quarterly basis. And hence, will have fiscal quarters.

So, this means that I will implement my economic simulation in such a way that interest on debts is charged four times a year, and revenue/investments/operational-costs/assets are tracked on a quarterly basis.
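
A minimal sketch of how such quarterly charging could work. The names (`Company`, `quarter_of_day`, `charge_interest`), the integer day counter and the choice of simple, non-compounding interest at a quarter of the annual rate are all assumptions made for illustration, not the game's actual code.

```cpp
#include <cassert>

// Hypothetical bookkeeping for one company; all names are illustrative.
struct Company
{
    double debt = 0.0;       // outstanding loan principal
    double cash = 0.0;       // money in the bank
    int    lastQuarter = 0;  // index of the last quarter that was charged
};

// Returns the running quarter index (0, 1, 2, ...) that a given day falls in,
// for a planet whose year lasts daysPerYear days.
inline int quarter_of_day(int day, int daysPerYear)
{
    assert(daysPerYear >= 4);
    return (day * 4) / daysPerYear;
}

// Charges a quarter of the annual rate for every quarter boundary that was
// passed since the previous call. Returns the number of charges applied.
inline int charge_interest(Company& c, int day, int daysPerYear, double annualRate)
{
    const int q = quarter_of_day(day, daysPerYear);
    int charges = 0;
    while (c.lastQuarter < q)
    {
        c.cash -= c.debt * annualRate / 4.0;
        c.lastQuarter++;
        charges++;
    }
    return charges;
}
```

Because the quarter is derived from the day count and the length of the year, the same code would work for a planet with a 400-day year as well as one with a 90-day year.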

Wednesday, May 1, 2019

My magic bullet for writing multi-threaded code.

When doing my game development, there is often a task or two that is just a little too computationally intensive for a smooth framerate. And I set my framerate targets ambitiously: typically 120Hz, for proper operation on an iPad Pro. At 120Hz, there is only 8.33ms to do everything: rendering, simulation, AI, physics, etc. So the larger computations that threaten to exceed this budget need to be moved off the main thread.

Examples of things that I have computed on their own threads in my previous projects include: AI action planning, iso surface generation to simulate deformable terrain, crowd flow and path finding.

The biggest hurdle in writing multi-threaded code is avoiding race conditions. If two threads are writing to something, which write will persist? Or if one thread reads and another writes, will the reader see the old or the new value? Tricky stuff.

I've found a neat trick to make this whole MT programming thing a lot more manageable. And frankly, after repeatedly using it, I've come to regard it as some sort of magic bullet. In this blog post, I will explain my approach in the hope that it will be useful to other (game) developers.

So the mechanism of choice for me can be summed up in one sentence: concentrate all the synchronization + semaphores + condition variables in just one place: a thread-safe work queue.

This thread-safe queue is then used to set up a producer / consumer system for units of work. The producers live on the main thread and spec well defined pieces of work that need to be performed. Those work specs are put in the queue.

The consumers of these jobs live on worker threads, with one consumer for each thread, and often one thread per CPU core. The job is consumed and the worker thread goes to work, thereby performing the outsourced service that the producer on the main thread did not want to do itself.

Note that the task-queue consumers are the actual service-providers, and that the task-queue producers are the clients of these services that want the computation out-sourced to another thread.

To communicate back the results of the computations, I use nothing fancy. All I do is have the worker write a boolean in memory to signal that the specific work was done. This is not protected by any construct, because there is a well-defined order in which things happen: the worker (and only the worker) writes the boolean. The client on the main thread (and only that client) reads the boolean to see if the work is done. I do this polling once every simulation frame. Every 'entity' in the simulation that has work outstanding checks the boolean for completion once per frame, and if it is set, it can safely use the results that have been stored in main memory. If a client just misses the completion, no biggie! It will be picked up in the next simulation frame.
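
The hand-off described above can be sketched as follows. This is a hedged variation rather than the exact scheme from the post: where the post uses a plain, unprotected boolean, the sketch uses `std::atomic<bool>` with release/acquire ordering, so that the results written by the worker are guaranteed to be visible once the main thread sees the flag. `PathJob`, `run_job` and `poll_job` are hypothetical names, and the "computation" is a stand-in.

```cpp
#include <atomic>
#include <vector>

// Hypothetical unit of work: the main thread fills in the request,
// a worker fills in the result and then raises the flag.
struct PathJob
{
    int from = 0, to = 0;           // request, written by the main thread before queuing
    std::vector<int> path;          // result, written by the worker only
    std::atomic<bool> done{false};  // written by the worker only, read by the main thread only
};

// Runs on a worker thread.
inline void run_job(PathJob& job)
{
    // Stand-in for the real computation: a straight line of node ids.
    const int step = (job.to >= job.from) ? 1 : -1;
    for (int n = job.from; n != job.to + step; n += step)
        job.path.push_back(n);
    // Publish: every write above is visible to whoever observes done == true.
    job.done.store(true, std::memory_order_release);
}

// Runs on the main thread, once per simulation frame.
// Returns true once the result may be read.
inline bool poll_job(const PathJob& job)
{
    return job.done.load(std::memory_order_acquire);
}
```

An entity with outstanding work would call `poll_job` once per frame and touch `path` only after it returns true.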

Note that the magic bullet comes with one big drawback, which may or may not be a big deal in your case: you lose determinism, because the order in which jobs complete depends on thread scheduling. But frankly, deterministic code is elusive anyway. For instance, x87 FPU registers carry hidden extra precision, and whether an intermediate result keeps that precision depends on when the compiler spills it to memory, so in practice bit-exact floating point code is hard to guarantee anyway.

The actual implementation of a thread safe queue is beyond the scope of this article, but it involves one mutex, and two condition variables. One condvar to signal that the queue is not empty (wakes up consumers) and one condvar to signal that the queue is not full (wakes up producers.) Of course the queue depth needs to be large enough so that it is never full, because a full queue would temporarily freeze your main thread.
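
As a minimal sketch of such a queue, assuming C++11 and a fixed capacity: one `std::mutex`, one condition variable for "not empty" and one for "not full". The class name `WorkQueue` and its `push`/`pop`/`size` interface are illustrative assumptions, not the actual implementation behind this post.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Bounded, blocking, thread-safe FIFO. All names are illustrative.
template <typename T>
class WorkQueue
{
public:
    explicit WorkQueue(std::size_t capacity) : capacity_(capacity)
    {
        assert(capacity > 0);
    }

    // Called by producers (the main thread). Blocks while the queue is full.
    void push(T item)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        lock.unlock();
        notEmpty_.notify_one();  // wake up one consumer
    }

    // Called by consumers (worker threads). Blocks while the queue is empty.
    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        notFull_.notify_one();  // wake up one producer
        return item;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::condition_variable notFull_;
    std::deque<T> items_;
};
```

A worker thread would then simply loop on `pop()` and execute whatever job comes out, while the main thread calls `push()` and moves on with its frame.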

Finally, I want to stress that this approach does not absolve you from being careful. You still need to make sure that a thread is not overwriting main memory willy-nilly. But the fact that you can safely communicate the work that is required and the work that is completed is at least half the battle.

Tuesday, February 12, 2019


After a successful exploration of Unity earlier, I thought I would test-drive Unreal Engine 4 to see what that is all about. Here are some random observations I made:

  • For Linux, the engine and editor have to be built from source. Which is fine, but it does take an hour and 60 GB of disk space.
  • First time starting the editor takes a very long time, and comes with scary warnings to boot.
  • Once you actually try to launch a sample, everything is very slow. Thousands of shaders need to be compiled.
  • Be careful with that quality slider. If you change it from "High" to "Medium" be prepared for another lengthy shader recompilation step.
  • The Vulkan renderer would crash at startup with an out-of-memory error, but switching to the OpenGL renderer helped.
  • Out-of-the-box performance when launching the "Advanced Vehicle" template was dreadful. I estimate the FPS at well below 10 for that. I need to figure out if this is a GPU or CPU bottleneck. Performance is a lot better, though, if I first quit the editor and then start the demo application by itself.

Wednesday, February 6, 2019

Crashing apps.

I'm on a mission to reduce my Android game's crashes as much as possible. And it turns out it is incredibly hard to approach a zero crash rate.

First off, I would like you to consider this: If 1% of today's users of your app experienced a crash today, is that bad?

Well, according to Google's metrics system, it gets labelled 'bad' as soon as you exceed 1.09% of users. Which is kind of a weirdly specific number. I don't know where that number comes from.

I love the Google Play Developer Console. It's so much better than any other developer console I've used. These include Valve, Apple and Amazon's developer sites. Google Play defines the crash rate as follows: Crash Rate: Percentage of daily sessions during which your users experienced at least one crash. A daily session refers to a day during which your app was used.
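
Read literally, that definition boils down to a simple ratio. A small sketch, assuming each record is one daily session (one user on one day) with a crash count; the `DailySession` type and the `crash_rate_percent` function are hypothetical and not part of any Google API.

```cpp
#include <vector>

// One user using the app on one day (hypothetical record type).
struct DailySession
{
    int crashes = 0;  // number of crashes that user saw on that day
};

// Percentage of daily sessions with at least one crash.
inline double crash_rate_percent(const std::vector<DailySession>& sessions)
{
    if (sessions.empty())
        return 0.0;
    int crashed = 0;
    for (const DailySession& s : sessions)
        if (s.crashes > 0)
            ++crashed;  // several crashes in one session still count once
    return 100.0 * crashed / sessions.size();
}
```

Note that under this reading, a user who crashes five times in one day weighs the same as a user who crashes once.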

So here you can see how my quest for zero crashes is progressing. The trend looks good:

Now the "Benchmark" is interesting... they compare the crash rate of your app to that of a game category. And the Google Play Developer Console interface lets you change it to a different benchmark. By comparing my game to all the different benchmarks, I could do a comparative study of crash rates in different categories! Without further ado, here's how likely a game of a given genre is to crash:

One thing that stands out to me, is how high these levels are. Crashes in Android apps are actually quite common. If I had to guess, a lot of it could be attributed to fragmentation of the Android devices. So much diversity, and limited means of testing them. Although I have to give a shout-out to the incredibly useful "Pre Launch Reports" that Google offers. The team that does the Developer Console is one hell of a team.

Another source of frustration is the dubious quality of Android's frameworks. Especially the NDK native frameworks. If your app crashes beyond your control in a buggy In-App-Billing framework, or Google-Play-Games framework, there is little you can do.

Still, it is an interesting experiment to see how low I can get my crash rate. Stay tuned for updates on my progress towards my goal of zero crashes.

Thursday, January 31, 2019


This call stack, captured from Google Play for one of my apps, shows what's fundamentally wrong with Android.

The app didn't even finish launching, and this is how deep the complexity runs?

  #00  pc 00000000000264f4  /system/lib/ (hb_ot_layout_has_glyph_classes+28)
  #01  pc 0000000000049af0  /system/lib/ (_hb_ot_shape+192)
  #02  pc 000000000001e008  /system/lib/ (hb_shape_plan_execute+140)
  #03  pc 000000000001db30  /system/lib/ (hb_shape+84)
  #04  pc 0000000000011b05  /system/lib/ (minikin::Layout::doLayoutRun(unsigned short const*, unsigned int, unsigned int, unsigned int, bool, minikin::MinikinPaint const&, minikin::StartHyphenEdit, minikin::EndHyphenEdit)+3780)
  #05  pc 0000000000013991  /system/lib/ (minikin::LayoutCacheKey::doLayout(minikin::Layout*, minikin::MinikinPaint const&) const+148)
  #06  pc 0000000000010bf5  /system/lib/ (void minikin::LayoutCache::getOrCreate(minikin::U16StringPiece const&, minikin::Range const&, minikin::MinikinPaint const&, bool, minikin::StartHyphenEdit, minikin::EndHyphenEdit, minikin::LayoutAppendFunctor&)+468)
  #07  pc 0000000000010717  /system/lib/ (minikin::Layout::doLayoutWord(unsigned short const*, unsigned int, unsigned int, unsigned int, bool, minikin::MinikinPaint const&, unsigned int, minikin::StartHyphenEdit, minikin::EndHyphenEdit, minikin::LayoutPieces const*, minikin::Layout*, float*, minikin::MinikinExtent*, minikin::MinikinRect*, minikin::LayoutPieces*)+234)
  #08  pc 000000000000fcd5  /system/lib/ (minikin::Layout::doLayoutRunCached(minikin::U16StringPiece const&, minikin::Range const&, bool, minikin::MinikinPaint const&, unsigned int, minikin::StartHyphenEdit, minikin::EndHyphenEdit, minikin::LayoutPieces const*, minikin::Layout*, float*, minikin::MinikinExtent*, minikin::MinikinRect*, minikin::LayoutPieces*)+1160)
  #09  pc 0000000000010609  /system/lib/ (minikin::Layout::measureText(minikin::U16StringPiece const&, minikin::Range const&, minikin::Bidi, minikin::MinikinPaint const&, minikin::StartHyphenEdit, minikin::EndHyphenEdit, float*, minikin::MinikinExtent*, minikin::LayoutPieces*)+1292)
  #10  pc 000000000022ca17  /system/lib/ (android::MinikinUtils::measureText(android::Paint const*, minikin::Bidi, android::Typeface const*, unsigned short const*, unsigned int, unsigned int, unsigned int, float*)+82)
  #11  pc 00000000000e1cd5  /system/lib/ (android::PaintGlue::getRunAdvance___CIIIIZI_F(_JNIEnv*, _jclass*, long long, _jcharArray*, int, int, int, int, unsigned char, int)+204)
  #12  pc 0000000000a4758b  /system/framework/arm/boot-framework.oat (
  #13  pc 0000000000a48e0d  /system/framework/arm/boot-framework.oat (
  #14  pc 0000000000a48cab  /system/framework/arm/boot-framework.oat (
  #15  pc 0000000001482e99  /system/framework/arm/boot-framework.oat (android.text.TextLine.getRunAdvance+128)
  #16  pc 0000000001483d89  /system/framework/arm/boot-framework.oat (android.text.TextLine.handleText+312)
  #17  pc 0000000001483367  /system/framework/arm/boot-framework.oat (android.text.TextLine.handleRun+566)
  #18  pc 0000000001484f49  /system/framework/arm/boot-framework.oat (android.text.TextLine.measure+216)
  #19  pc 0000000001485bcd  /system/framework/arm/boot-framework.oat (android.text.TextLine.metrics+44)
  #20  pc 00000000017ac2c9  /system/framework/arm/boot-framework.oat (android.text.BoringLayout.isBoring+384)
  #21  pc 0000000001af2d8f  /system/framework/arm/boot-framework.oat (android.widget.TextView.onMeasure+430)
  #22  pc 000000000180ff23  /system/framework/arm/boot-framework.oat (android.view.View.measure+826)
  #23  pc 0000000001a7c3bb  /system/framework/arm/boot-framework.oat (android.view.ViewGroup.measureChildWithMargins+178)
  #24  pc 0000000001c900a1  /system/framework/arm/boot-framework.oat (android.widget.LinearLayout.measureChildBeforeLayout+64)
  #25  pc 0000000001c904ed  /system/framework/arm/boot-framework.oat (android.widget.LinearLayout.measureHorizontal+1060)
  #26  pc 0000000001c91f41  /system/framework/arm/boot-framework.oat (android.widget.LinearLayout.onMeasure+64)
  #27  pc 0000000001f4f553  /system/framework/arm/boot-framework.oat (
  #28  pc 000000000180ff23  /system/framework/arm/boot-framework.oat (android.view.View.measure+826)
  #29  pc 0000000001d3747d  /system/framework/arm/boot-framework.oat (android.widget.ScrollView.measureChildWithMargins+228)
  #30  pc 0000000001c87141  /system/framework/arm/boot-framework.oat (android.widget.FrameLayout.onMeasure+296)
  #31  pc 0000000001d384f7  /system/framework/arm/boot-framework.oat (android.widget.ScrollView.onMeasure+46)
  #32  pc 000000000180ff23  /system/framework/arm/boot-framework.oat (android.view.View.measure+826)
  #33  pc 0000000001f4eaf3  /system/framework/arm/boot-framework.oat (
  #34  pc 0000000001f4f13f  /system/framework/arm/boot-framework.oat (
  #35  pc 000000000180ff23  /system/framework/arm/boot-framework.oat (android.view.View.measure+826)
  #36  pc 0000000001a7c3bb  /system/framework/arm/boot-framework.oat (android.view.ViewGroup.measureChildWithMargins+178)
  #37  pc 0000000001c87141  /system/framework/arm/boot-framework.oat (android.widget.FrameLayout.onMeasure+296)
  #38  pc 000000000180ff23  /system/framework/arm/boot-framework.oat (android.view.View.measure+826)
  #39  pc 0000000001a7c3bb  /system/framework/arm/boot-framework.oat (android.view.ViewGroup.measureChildWithMargins+178)
  #40  pc 0000000001c87141  /system/framework/arm/boot-framework.oat (android.widget.FrameLayout.onMeasure+296)
  #41  pc 000000000180ff23  /system/framework/arm/boot-framework.oat (android.view.View.measure+826)
  #42  pc 0000000001a7c3bb  /system/framework/arm/boot-framework.oat (android.view.ViewGroup.measureChildWithMargins+178)
  #43  pc 0000000001c87141  /system/framework/arm/boot-framework.oat (android.widget.FrameLayout.onMeasure+296)
  #44  pc 0000000001d65865  /system/framework/arm/boot-framework.oat (
  #45  pc 000000000180ff23  /system/framework/arm/boot-framework.oat (android.view.View.measure+826)
  #46  pc 00000000018368f7  /system/framework/arm/boot-framework.oat (android.view.ViewRootImpl.performMeasure+246)
  #47  pc 00000000018353ed  /system/framework/arm/boot-framework.oat (android.view.ViewRootImpl.measureHierarchy+500)
  #48  pc 000000000183733b  /system/framework/arm/boot-framework.oat (android.view.ViewRootImpl.performTraversals+2330)
  #49  pc 000000000183dbb5  /system/framework/arm/boot-framework.oat (android.view.ViewRootImpl.doTraversal+188)
  #50  pc 0000000000d5e83d  /system/framework/arm/boot-framework.oat (android.content.ContextWrapper.getBasePackageName [DEDUPED]+52)
  #51  pc 0000000001515df7  /system/framework/arm/boot-framework.oat (android.view.Choreographer.doCallbacks+966)
  #52  pc 0000000001516675  /system/framework/arm/boot-framework.oat (android.view.Choreographer.doFrame+1396)
  #53  pc 00000000017d6413  /system/framework/arm/boot-framework.oat (android.view.Choreographer$
  #54  pc 000000000130e659  /system/framework/arm/boot-framework.oat (android.os.Handler.dispatchMessage+64)
  #55  pc 0000000001313e8b  /system/framework/arm/boot-framework.oat (android.os.Looper.loop+1162)
  #56  pc 0000000000c905d3  /system/framework/arm/boot-framework.oat (
  #57  pc 0000000000417575  /system/lib/ (art_quick_invoke_stub_internal+68)
  #58  pc 00000000003f125f  /system/lib/ (art_quick_invoke_static_stub+222)
  #59  pc 00000000000a1043  /system/lib/ (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
  #60  pc 000000000035093d  /system/lib/ (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
  #61  pc 0000000000351d85  /system/lib/ (art::InvokeMethod(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jobject*, _jobject*, unsigned int)+960)
  #62  pc 0000000000302b15  /system/lib/ (art::Method_invoke(_JNIEnv*, _jobject*, _jobject*, _jobjectArray*)+40)
  #63  pc 00000000006ab577  /system/framework/arm/boot-core-oj.oat (java.lang.Class.getDeclaredMethodInternal [DEDUPED]+110)
  #64  pc 000000000165417b  /system/framework/arm/boot-framework.oat ($
  #65  pc 000000000165da49  /system/framework/arm/boot-framework.oat (
  #66  pc 0000000000417575  /system/lib/ (art_quick_invoke_stub_internal+68)
  #67  pc 00000000003f125f  /system/lib/ (art_quick_invoke_static_stub+222)
  #68  pc 00000000000a1043  /system/lib/ (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+154)
  #69  pc 000000000035093d  /system/lib/ (art::(anonymous namespace)::InvokeWithArgArray(art::ScopedObjectAccessAlreadyRunnable const&, art::ArtMethod*, art::(anonymous namespace)::ArgArray*, art::JValue*, char const*)+52)
  #70  pc 0000000000350759  /system/lib/ (art::InvokeWithVarArgs(art::ScopedObjectAccessAlreadyRunnable const&, _jobject*, _jmethodID*, std::__va_list)+304)
  #71  pc 000000000029493d  /system/lib/ (art::JNI::CallStaticVoidMethodV(_JNIEnv*, _jclass*, _jmethodID*, std::__va_list)+476)
  #72  pc 000000000006d819  /system/lib/ (_JNIEnv::CallStaticVoidMethod(_jclass*, _jmethodID*, ...)+28)
  #73  pc 000000000006f9cf  /system/lib/ (android::AndroidRuntime::start(char const*, android::Vector const&, bool)+466)
  #74  pc 0000000000001b21  /system/bin/app_process32 (main+880)
  #75  pc 00000000000a2b3d  /system/lib/ (__libc_init+48)
  #76  pc 000000000000176f  /system/bin/app_process32 (_start_main+46)
  #77  pc 0000000000019da9  /system/bin/linker (__dl__ZN6soinfoD1Ev+16)
  #78  pc 00000000007fea64  [stack:ff3b8000]

Wednesday, January 16, 2019


Two days ago, I started using the Unity3D game development tool. Interestingly, Unity3D is the most pervasive tool in the Game Dev industry... it is everywhere. I've been making games since 1982, but have never used it before.

I'm one of those dinosaurs keeping to the old way of doing things, and making my own game engines. Think of a manual, 3-gear transmission Willys Jeep instead of an SUV with heated seats, 7 cup-holders and satnav.

So far, I like it better than I thought I would. In this blog article, I will collect some gotchas, insights, surprises that I will undoubtedly encounter.

1. Project loads are slow. Once inside the editor, make sure you stop "Playback" before editing anything... all edits during playback will be lost.

2. Unity uses degrees, not radians for angles.

3. In C#, you can only use const for compile-time constants, not for run-time constants as you can in C and C++.

4. Unity physics is a wrapper over PhysX. Neither uses ERP and CFM like OpenDE and Bullet Physics do.

5. Unity debug output prints vectors with low precision. So (0.0, 0.3, 1.0) can actually be unit length!

6. My bicycle wheel would not spin faster than 7 radians per second, no matter how much torque I applied to it. It turns out that there is a rigid body property that caps it: Rigidbody.maxAngularVelocity, which defaults to 7.

7. There is no include directive in the C# language. Instead, files that use the same namespace will automagically find each other's definitions.

8. Centre of mass and inertia tensor are automatically computed from the collision geometry! Beware of this: simulation results will be different just by changing collision geometry, unless you set c.o.m. and inertia explicitly.

9. MonoBehaviour.Awake() is called in random order for the GameObjects. Ugh.

Saturday, January 12, 2019

Sprinkle, Sprinkle, Little Star.

I released a new software application. It is named "Sprinkle, Sprinkle, Little Star" and is available for free for Windows and Linux.

It is a 2D simulation of Gravity acting on a galaxy of stars. It is completely interactive, and you can paint your own galaxy onto a grid, and see the galaxy evolve under Newton's law for gravity.
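
For reference, the brute-force form of that computation can be sketched as below: every star pulls on every other star. This is a plain scalar sketch, not the optimized code of the application. The `Star` struct, the inverse-square force law in the 2D plane, the gravitational constant `G` passed in as a parameter and the softening term `eps` (added to avoid a division by zero when two stars coincide) are all assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical star record; names are illustrative.
struct Star
{
    float x, y;    // position
    float ax, ay;  // acceleration, recomputed every step
    float mass;
};

// Direct summation: every star against every other star, O(N*N) per step.
inline void compute_accelerations(std::vector<Star>& stars, float G, float eps)
{
    for (std::size_t i = 0; i < stars.size(); ++i)
    {
        float ax = 0.0f, ay = 0.0f;
        for (std::size_t j = 0; j < stars.size(); ++j)
        {
            if (i == j)
                continue;
            const float dx = stars[j].x - stars[i].x;
            const float dy = stars[j].y - stars[i].y;
            const float d2 = dx * dx + dy * dy + eps * eps;  // softened squared distance
            const float invd = 1.0f / std::sqrt(d2);
            const float s = G * stars[j].mass * invd * invd * invd;  // G * m / d^3
            ax += s * dx;
            ay += s * dy;
        }
        stars[i].ax = ax;
        stars[i].ay = ay;
    }
}
```

The inner loop runs N-1 times for each of the N stars, which is where the quadratic cost comes from.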

I recorded a video of myself with a play-through.

I started it during the holiday break, and had fun implementing it. The so-called N-body problem is pretty hard to solve interactively for large numbers of stars, because of its O(N*N) nature: each star is influenced by every other star. This means calculating 100M distances if you have 10,000 stars. To do it efficiently, I had to aggregate stars at larger distances. In addition to this, I vectorized my code with AVX intrinsics, so that it calculates 8 forces in a single go using SIMD. Because my compiler did such a bad job on the scalar code, the speed-up was even better than 8x: I saw an 18x increase in framerate (which also included rendering, so the computation speed-up was even a little more than 18x). Below is the spatial structure I use to aggregate stars at large distances.