Tuesday, September 17, 2019

Sanitizing addresses of your Android NDK application.

In my attempt to squash all bugs, I ran my Android app with clang's address sanitizer. Because the steps are not trivial, I thought I would document the process here.

This recipe makes some assumptions, though:

  • I'm on Linux.
  • I use Android Studio with CMake.
  • I run on the Android emulator. (A real device shouldn't be much different.)

First, we need to pass the -fsanitize=address flag to both the compiler and the linker. To compile your C/C++ code with this flag, use this in your CMakeLists.txt:

target_compile_options(foo PRIVATE
        "$<$<CONFIG:DEBUG>:-fsanitize=address>"
)

And to link, add "-fsanitize=address" to your target_link_libraries.

Ok, now your shared-object has been built with the address sanitizer. But if you try to run it, you will see: dlopen failed: library "libclang_rt.asan-i686-android.so" not found.

Luckily, the Android NDK comes with the correct libraries; you just need to package them with your build. First, locate this library on your system with:

$ locate libclang_rt.asan-i686-android.so
/home/bram/Android/Sdk/ndk-bundle/toolchains/llvm/prebuilt/linux-x86_64/lib64/clang/8.0.7/lib/linux/libclang_rt.asan-i686-android.so

You simply need to copy this alongside your own shared object file, as generated by cmake. In my case, it goes in: ./app/build/intermediates/cmake/debug/obj/x86/libclang_rt.asan-i686-android.so

If you now build and run your app on the emulator, your addresses will get sanitized. In my first run, it found a free() of a pointer that was never malloc()-ed in the Google Play Games SDK.

Note that if you want to do this on an ARM device, you would need libclang_rt.asan-arm-android.so instead, and on a 64-bit ARM device, libclang_rt.asan-aarch64-android.so.

Thursday, September 12, 2019

Keeping user directory names out of your binary.

So, if you use ASSERT macros a lot, you typically want to point to the offending line of code, and use __LINE__ and __BASE_FILE__ macros for that.

Now, where it gets tricky: VisualC does not have __BASE_FILE__ so what I did was use __FILE__ instead. But that has the nasty side effect of putting your home directory in your build, like so: C:\Users\bram\src\.. which is bad practice.
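For context, here is a minimal sketch of the kind of assert macro in question. The names ASSERT and assert_message are made up for illustration and are not taken from my actual code. It shows where __FILE__ enters the picture: the macro bakes the source path, as the compiler sees it, into a string literal at every call site.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: builds the text that a failed assertion reports.
inline std::string assert_message(const char* expr, const char* file, int line) {
    return std::string(file) + ":" + std::to_string(line) +
           ": assertion failed: " + expr;
}

// Hypothetical ASSERT macro. __FILE__ and __LINE__ expand at the call site,
// so each use stores the source path of that file in the binary.
#define ASSERT(cond)                                                        \
    do {                                                                    \
        if (!(cond)) {                                                      \
            std::fputs(assert_message(#cond, __FILE__, __LINE__).c_str(),   \
                       stderr);                                             \
            std::fputc('\n', stderr);                                       \
            std::abort();                                                   \
        }                                                                   \
    } while (0)
```

If the build system hands the compiler absolute paths, every one of those literals will contain the full directory name.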

Stackoverflow.com to the rescue. I found this neat solution for it.



UPDATE

Gregory warned about this affecting parallel builds. Personally, I do see all cores used for my single-project solution, but your mileage may vary. An alternative approach would be to ditch the VisualC compiler and use clang instead.

Monday, September 2, 2019

High volume, low friction art work for video games.

I am a one-man indie game studio (I like to use the term INDIEvidual.) The lack of employees means that I need to shoulder all the tasks, including art work. With so many tasks, I cannot afford to spend a lot of time on it. Here are some tricks to make it manageable.

So many pictograms!

One tedious type of art-related task is creating pictograms for Achievements or Leader-boards. Here you see some (very minimal) pictograms for four such achievements. To illustrate the volume of work involved, imagine that the game has 100 achievements. How is a part-time artist supposed to generate the 200 images required for this?

It starts with the right Art style. Steam wants these in 64x64 pixels, and believe me, you do not want to do this art as pixel-art in Photoshop. Instead, it will have to be vector art, which is rendered at the required 64x64 resolution. Additionally, this will have to be low-poly vector art. I also chose a two-colour palette to reduce the work even more.

Ok, fine, now it comes down to generating vector images in low-poly style, in just two colours. Massively easier, but still a crazy amount of work.

This requires a ruthlessly reduced work-flow.

Open Source to the rescue, and more specifically, Inkscape to the rescue.

With Inkscape, we can put all our pictograms in just one SVG file, each pictogram in a separate layer. Never opening/closing/copying files, just work in that one achievements.svg file, and create layers.

With everything in one file, Inkscape will let you invoke an export to PNG from the command-line, and pick each layer in isolation for each export command. This is achieved with the --export-id= flag and the --export-id-only flag.

To identify each layer, we need to name the layer object, and set its ID. Unfortunately, renaming the layer is not enough: you need to use the XML-edit widget in Inkscape to actually set the ID of the layer object, as shown below.

Command Line Conversions

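For the actual generation of the PNG files, we can leverage a UNIX-style Makefile like so:

# One PNG per achievement; each name matches a layer ID in achievements.svg.
IMAGES = \
	a_pentdrive.png \
	a_coaster.png \
	a_millennialbuilder.png \
	a_eagerlearner.png

all: $(IMAGES)

# Export the layer whose ID equals the target's base name, then derive a
# grey-scale copy (lo-*.png). Recipe lines must start with a tab character.
%.png: achievements.svg
	inkscape --export-id=$(basename $@) --export-id-only --export-png=$@ --export-area-page --export-width=64 --export-height=64 $<
	convert -colorspace Gray $@ lo-$@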

In this example, a single SVG file with 4 layers is used as input, and the Makefile will generate 4 pictograms as PNG, and 4 additional grey-scale pictograms to serve as the images for uncompleted achievements. Note that to generate the grey-scale versions, the Makefile uses the ImageMagick convert command.

Before adopting this workflow, I would export the PNGs using the Inkscape GUI. Sure, doable if you have 4 achievements, but now, with this streamlined workflow, I can pump out 100 achievements, as 200 images, in a single day, if I have to. Just draw them all in a crude art style, list them in the Makefile, and run a single make command. And now, the bottleneck (apart from the drawing of the pictograms) is the actual uploading to the Steam portal, as I still have to do that one image at a time.

Future Work

It would make sense to auto-generate the IMAGES list in the Makefile from the SVG itself: just search the SVG for layer objects. I should also look into whether the Steam portal allows me to batch-upload the pictograms somehow, without going through the web interface.
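A rough sketch of that first idea, in C++ and under a few assumptions: Inkscape marks layers as <g> elements carrying inkscape:groupmode="layer", each layer's id has been set as described earlier, and a regex scan is good enough (it is not a real XML parser). The function names layer_ids and png_targets are made up.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Returns the id of every <g> element that Inkscape has marked as a layer.
// Assumption: no '>' character occurs inside attribute values of those tags.
std::vector<std::string> layer_ids(const std::string& svg) {
    static const std::regex tag_re(R"(<g\b[^>]*>)");
    static const std::regex id_re(R"re(\sid="([^"]*)")re");
    std::vector<std::string> ids;
    for (std::sregex_iterator it(svg.begin(), svg.end(), tag_re), end;
         it != end; ++it) {
        const std::string tag = it->str();
        if (tag.find("inkscape:groupmode=\"layer\"") == std::string::npos)
            continue;  // an ordinary group, not a layer
        std::smatch m;
        if (std::regex_search(tag, m, id_re))
            ids.push_back(m[1].str());
    }
    return ids;
}

// Joins the ids into a space-separated list of PNG targets for the Makefile.
std::string png_targets(const std::vector<std::string>& ids) {
    std::string out;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        if (i != 0) out += ' ';
        out += ids[i] + ".png";
    }
    return out;
}
```

The resulting string could be written to a small include file that the Makefile pulls in, so the hand-maintained IMAGES list goes away.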

Wednesday, July 17, 2019

OpenGL debugging under GNU/Linux.

For debugging OpenGL under GNU/Linux, there are three tools that are very similar to each other. They can capture the stream of OpenGL commands, and let you examine the state at each command. These are the tools:

RenderDoc

RenderDoc probably has the easiest interface, and is quite powerful in its capabilities. It lets you view both the input and output meshes of your vertex shader, for instance. I do hit what seems to be a bug in RenderDoc when I use uniform buffers, though.

Intel Graphics Performance Analyzer

Intel GPA comes with a graphics frame analyzer. The capturing and analysis of a frame is done with two different commands, though. To capture, run gpa-monitor to launch your app and press Ctrl-Shift-C to capture a frame. Then quit the app and monitor, and run frame-analyzer.

NVidia Nsight Graphics

To capture with NVidia Nsight Graphics, do:

$ cd NVIDIA-Nsight-Graphics-2019.2/host/linux-desktop-nomad-x64/
$ ./nv-nsight-gfx

Choose "Quick Launch" and select "Capture for Live Analysis" in the Nsight UI when your app is running.

Thursday, May 23, 2019

Do Space Aliens have Fiscal Quarters?

So, my Hexa Trains indie game features some outlandish planets with freaky colours. Obviously, they cannot represent Earth, as the continents are all wrong too. So, railroad building on Alien Worlds, it is then.

So, when the time came to implement loan interest for the game, it made me realize I need some time-accounting for my game. An extra-solar planet will still have years (as it orbits a star) and days (as it will likely spin as it orbits.) So years, and days. But what about monthly interest payments?

Well... there it breaks down. A random alien planet does not have months, and even if it does, there wouldn't be 12 of them. For starters, a random planet may or may not have moons. And if it does, the moon orbit will not be an integral number of times faster than the planet's orbit.

So no moon then. But what about seasons? Yes... our Alien Planet will most likely have seasons, because the spin axis is probably not aligned with the orbit axis. This means that there will be a spring, summer, autumn and winter. Nice!

And because our Alien Planet has four seasons, which are perfectly in sync with the orbit (exactly 4 seasons per year) we can reasonably assume that our aliens are familiar with dividing their year into four equal parts.

And there you have it, ladies and gentlemen... Our Aliens will probably track corporate performance on a quarterly basis. And hence, will have fiscal quarters.

So, this means that I will implement my economic simulation in such a way that interest on debts is charged four times a year, and revenue/investments/operational-costs/assets are tracked on a quarterly basis.

Wednesday, May 1, 2019

My magic bullet for writing multi-threaded code.

When doing my game development, there is often a task or two that is just a little too computationally intensive for a smooth framerate. And I set my framerate targets ambitiously: typically 120Hz for proper operation on an iPad Pro. At 120Hz, there is only 8.33ms to do everything: rendering, simulation, AI, physics, etc. So the larger computations that threaten to exceed this budget need to be moved off the main thread.

Examples of things that I have computed on their own threads in my previous projects include: AI action planning, iso-surface generation to simulate deformable terrain, crowd flow, and path finding.

The biggest hurdle when writing multi-threaded code is avoiding race conditions. If two threads are writing to something, which write will persist? Or if a thread reads and another writes, will the reader see the old or the new value? Tricky stuff.

I've found a neat trick to make this whole MT programming thing a lot more manageable. And frankly, after repeatedly using it, I've come to regard it as some sort of magic bullet. In this blog post, I will explain my approach in the hope that it will be useful to other (game) developers.

So the mechanism of choice for me can be summed up in one sentence: concentrate all the synchronization + semaphores + condition variables in just one place: a thread-safe work queue.

This thread-safe queue is then used to set up a producer / consumer system for units of work. The producers live on the main thread and spec well defined pieces of work that need to be performed. Those work specs are put in the queue.

The consumers of these jobs live on worker threads, with one consumer for each thread, and often one thread per CPU core. The job is consumed and the worker thread goes to work, thereby performing the outsourced service that the producer on the main thread did not want to do itself.

Note that the task-queue consumers are the actual service-providers, and that the task-queue producers are the clients of these services that want the computation out-sourced to another thread.

To communicate back the results of the computations, I use nothing fancy. All I do is have the worker write a boolean in memory to signal that the specific work was done. This is not protected by any construct, because there is a well defined order in which things happen: The worker (and only the worker) writes the boolean. The client on the main thread (and only that client) reads the boolean to see if work is done. I do this polling once every simulation frame. Every 'entity' in the simulation that has work outstanding checks the boolean for completion once per frame, and if it is set, it can safely use the results that have been stored in main memory. If a client just misses the completion, no biggie! The next simulation frame, it will be picked up.
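A sketch of this hand-off, assuming C++, could look like the code below. Note one deliberate swap: it uses a std::atomic<bool> with release/acquire ordering rather than a plain boolean, because the C++ memory model treats an unsynchronized write and read of a plain bool as a data race, and the ordering makes sure the results are visible before the flag is. Job, run_job, poll_job and demo_poll are made-up names, and squaring a number stands in for the real computation.

```cpp
#include <atomic>
#include <functional>
#include <thread>

struct Job {
    int input = 0;
    int result = 0;                  // written by the worker before 'done'
    std::atomic<bool> done{false};   // only the worker writes this flag
};

// Runs on the worker thread: compute, store the result, then raise the flag.
void run_job(Job& job) {
    job.result = job.input * job.input;  // stand-in for the expensive work
    job.done.store(true, std::memory_order_release);
}

// Runs on the main thread, once per simulation frame.
// Returns false if the work is still outstanding; try again next frame.
bool poll_job(const Job& job, int& out) {
    if (!job.done.load(std::memory_order_acquire)) return false;
    out = job.result;  // safe to read: the flag was set after the result
    return true;
}

// Tiny demonstration; the loop stands in for successive simulation frames.
int demo_poll() {
    Job job;
    job.input = 7;
    std::thread worker(run_job, std::ref(job));
    int out = 0;
    while (!poll_job(job, out)) std::this_thread::yield();
    worker.join();
    return out;
}
```

The shape is the same as described above: one writer, one reader, and a missed completion simply gets picked up on the next poll.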

Note that the magic bullet comes with one big drawback, which may or may not be a big deal in your personal case: You lose determinism. But frankly, deterministic code is elusive anyway. For instance, there is hidden precision in the FPU registers that may be set randomly, so in practice deterministic floating point code is not possible anyway.

The actual implementation of a thread safe queue is beyond the scope of this article, but it involves one mutex, and two condition variables. One condvar to signal that the queue is not empty (wakes up consumers) and one condvar to signal that the queue is not full (wakes up producers.) Of course the queue depth needs to be large enough so that it is never full, because a full queue would temporarily freeze your main thread.
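For readers who want a starting point anyway, here is a minimal sketch of such a queue in C++ with one mutex and two condition variables. It is not my production code. The names WorkQueue and demo_queue are made up, and using a negative job value as a stop signal is just a convenience for the demonstration.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

template <typename T>
class WorkQueue {
public:
    explicit WorkQueue(std::size_t capacity) : capacity_(capacity) {}

    // Producer side: blocks while the queue is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        lock.unlock();
        not_empty_.notify_one();  // wake up one consumer
    }

    // Consumer side: blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        not_full_.notify_one();   // wake up one producer
        return item;
    }

private:
    std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable not_empty_;
    std::condition_variable not_full_;
};

// Demonstration: two workers square the numbers 1..10 and add them up.
int demo_queue() {
    WorkQueue<int> queue(64);  // deep enough that push never blocks here
    std::atomic<int> sum{0};
    std::vector<std::thread> workers;
    for (int w = 0; w < 2; ++w) {
        workers.emplace_back([&queue, &sum] {
            for (;;) {
                const int job = queue.pop();
                if (job < 0) return;  // stop signal
                sum += job * job;
            }
        });
    }
    for (int i = 1; i <= 10; ++i) queue.push(i);
    for (int w = 0; w < 2; ++w) queue.push(-1);  // one stop signal per worker
    for (auto& t : workers) t.join();
    return sum.load();
}
```

All the locking lives inside push and pop, so neither the producer on the main thread nor the workers ever touch a mutex directly.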

Finally, I want to stress that this approach does not absolve you from being careful. You still need to make sure that a thread is not overwriting main memory willy-nilly. But the fact that you can safely communicate the work that is required and the work that is completed is at least half the battle.

Tuesday, February 12, 2019

Unreal

After a successful exploration of Unity earlier, I thought I would test-drive Unreal Engine 4 to see what that is all about. Here are some random observations I made:

  • For Linux, the engine and editor have to be built from source. Which is fine, but it does take an hour, and 60 GB of disk space.
  • First time starting the editor takes a very long time, and comes with scary warnings to boot.
  • Once you actually try to launch a sample, everything is very slow. Thousands of shaders need to be compiled.
  • Be careful with that quality slider. If you change it from "High" to "Medium" be prepared for another lengthy shader recompilation step.
  • The Vulkan renderer would crash at startup with an out-of-memory error. Switching to the OpenGL renderer helped.
  • Out-of-the-box performance when launching the "Advanced Vehicle" template was dreadful. I estimate the FPS at well below 10 for that. I need to figure out if this is a GPU or CPU bottleneck. Performance is a lot better, though, if I first quit the editor and then start the demo application by itself.