Tuesday, January 12, 2021

CUDA on an RTX3070, in an nutshell.

CUDA terminology can overwhelm you. SM, Warp, Core, Thread, Block, Scheduler, Grid. I have been writing CUDA kernels for my RTX 3070, and I thought I would write down what I learned.

First things first: when you write a compute kernel, you write scalar code. So in my case, doing Photon Mapping, my kernel code handles a single ray, and one primitive at a time.

Compare this with writing SIMD CPU code: you explicitly work on arrays, writing vector code, not scalar code. In single-precision floating point operations, that means a vector of 8 (AVX) values at a time, or 16 (AVX512) values at a time.

This does not mean that the hardware executes this as scalar code, though. The GPU will execute your code in batches of 32. This batch is called a warp. Theoretically, warpsizes could be something different than 32, but for GeForces, they are always 32. So in a way, what the CPU does explicitly, 8- or 16-wide, the GPU does 32-wide, implicitly. Warps are executed (by 32 CUDA cores) in sync, much like the lanes of a SIMD register in a CPU. (Caveat emptor: only half the CUDA cores of Ampere GPUs can do integer operations, so throughput is halved for that.)

The workhorses inside an Ampere GPU, are the Steaming Multiprocessors, or SM for short. An Ampere GPU, like the RTX 3070, has SMs that can execute 4 of these warps at the same time. To do this, it has 4 schedulers per SM.

Let's go back to the software side: your kernel. If you have a million photons that you want to trace, your kernel will be executed a million times. In CUDA terminology, there are a million threads of execution. And threads are grouped in blocks. All the blocks together, is the grid. And every block will be assigned to an SM for computation.

When a warp is scheduled and running, the 32 threads in a warp could diverge. If one of the threads is blocked waiting on something, the whole warp is blocked. This is bad. But luckily, the scheduler will intervene, and switch out the warp for a non-blocked warp. Each of the 4 schedulers in a SM can keep up to 12 warps in flight. Often there will be at least one warp ready to run. The net effect is that the latencies are hidden.

When a scheduler has not a single warp that is ready to advance, compute throughput is lost. The NSIGHT Compute profiling tool can detect how often this happens for your kernel.

Even though your kernel is written in scalar form, each iteration of your kernel is still responsible for run-time selecting the right work! That is why nearly every CUDA program will contain the following code:

const uint32_t index = blockIdx.x * blockDim.x + threadIdx.x;
foo = bar[ index ];

The upshot of all this, is that if your kernel does a = b + c then, provided the memory bandwidth is there, then every SM executes 4x32 = 128 instances of this, and with the 48 SMs of the 3070, that means 6144 instances. So 6144 b values are added to 6144 c values and assigned to 6144 a values.

Note that the best I could hope for on my Xeon W2140B workstation with AVX512 is 8 cores each computing 16 floats, which is 128 instances (or possibly double that, if both execution units can be kept fed.)

Let me conclude by showing what you can do with all the compute power. Here is a scene that is globally illuminated by 2M photons that bounced once. I can compute this in a handfull of milliseconds on the rtx 3070. Not bad!

Tuesday, January 5, 2021

Revisiting Android Studio

Yay, it is the year 12021 HE already. A palindrome year! And this is my first blog post for the year.

So you are still writing Objective-C? Not going with the times, my friend. Or you are still writing C/C++ Android apps? Again, you are not making it easy on yourself. But alas, you are in good company. You are in the company of me. Cheers!

Holding off on Kotlin and Swift can make it a little trickier. For instance, today's excercise is to install Android Studio from scratch on a new machine and build my C/C++ game (it has a little Java too.)

So, the download was easy enough. But I must say the Welcome Screen of Android Studio does not inspire confidence. This is what the welcome screen looks like, when resized. Oops.. not repainting the entire screen.

Seeing that this is cosmetic, let's glance over that and load up my Little Crane project. When I do this, after some loading and downloading, I am greeted by an error:

Gradle sync failed: Cause: com.android.builder.sdk.LicenceNotAcceptedException: Failed to install the following Android SDK packages as some licences have not been accepted.

Again, easily rectified, as the optional NDK needs to be installed first. For this, close the project, and start the SDK manager from the Welcome screen. In the SDK Manager, choose the SDK Tools tab, and select the latest NDK.

Let's try again! This time:

Gradle sync failed: Cause: executing external native build for cmake /home/bram/apps/LittleCrane/AndroidStudio/jni/CMakeLists.txt

Although it does not say so directly, my hunch is that this is a case of the cmake tool not being found, as opposed to something being wrong with my CMakeLists.txt file. Let's try installing cmake from the SDK Manager. I am prompted with two options for the cmake version, but let's try the newest: cmake 3.10.2

WARN - ues.SyncIssueUsageReporterImpl - Multiple sync failures reported. Discarding: SDK_BUILD_TOOLS_TOO_LOW 
WARN - ues.SyncIssueUsageReporterImpl - Multiple sync failures reported. Discarding: SDK_BUILD_TOOLS_TOO_LOW 
WARN - e.project.sync.GradleSyncState - Gradle sync failed: Cause: executing external native build for cmake /home/bram/apps/LittleCrane/AndroidStudio/jni/CMakeLists.txt

Consult IDE log for more details (Help | Show Log) (1 s 373 ms) 
With that IDE log being:
    2021-01-05 11:09:09,260 [ 387004]   INFO - .project.GradleProjectResolver - Gradle project resolve error 
org.gradle.tooling.BuildActionFailureException: The supplied phased action failed with an exception.
	at org.gradle.tooling.internal.consumer.connection.PhasedActionAwareConsumerConnection.run(PhasedActionAwareConsumerConnection.java:58)
	at org.gradle.tooling.internal.consumer.connection.ParameterValidatingConsumerConnection.run(ParameterValidatingConsumerConnection.java:62)
	at org.gradle.tooling.internal.consumer.DefaultPhasedBuildActionExecuter$1.run(DefaultPhasedBuildActionExecuter.java:78)
	at org.gradle.tooling.internal.consumer.DefaultPhasedBuildActionExecuter$1.run(DefaultPhasedBuildActionExecuter.java:70)
	at org.gradle.tooling.internal.consumer.connection.LazyConsumerActionExecutor.run(LazyConsumerActionExecutor.java:87)
	at org.gradle.tooling.internal.consumer.connection.CancellableConsumerActionExecutor.run(CancellableConsumerActionExecutor.java:45)
	at org.gradle.tooling.internal.consumer.connection.ProgressLoggingConsumerActionExecutor.run(ProgressLoggingConsumerActionExecutor.java:61)
	at org.gradle.tooling.internal.consumer.connection.RethrowingErrorsConsumerActionExecutor.run(RethrowingErrorsConsumerActionExecutor.java:38)
	at org.gradle.tooling.internal.consumer.async.DefaultAsyncConsumerActionExecutor.lambda$run$0(DefaultAsyncConsumerActionExecutor.java:55)
	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:56)
	at java.lang.Thread.run(Thread.java:748)

I think it is time to call in the cavalry at stackoverflow for this one. The highest-voted suggestion? Build -> Refresh Linked C++ Projects is greyed-out in my menu. Probably because it was never created in the first place?

At this point, I am tempted to go back to my old machine, with an old copy of Android Studio, and just make sure I do not update it. Because it did work when I last used it, I'm guessing a year ago or so?

I notice that SDK Command Line Tools are not installed. Maybe that's the cause? Adding Android SDK Command-line Tools 4.0.0rc1 to see if that's it. But sadly, no.

Ok, maybe we need to spec a higher Gradle version, per Gradle Plugin Release Notes?

Excellent. It still won't sync, but at least the newer Gradle will print out the exception that occurred! No version of NDK matched the requested version 21.0.6113669. Versions available locally: 22.0.7026061

    * Exception is:
Caused by: org.gradle.api.InvalidUserDataException: NDK not configured. Download it with SDK manager. Preferred NDK version is '21.0.6113669'. 
	at com.android.build.gradle.internal.cxx.model.TryCreateCxxModuleModelKt$tryCreateCxxModuleModel$1.invoke(TryCreateCxxModuleModel.kt:138)
	at com.android.build.gradle.internal.cxx.model.TryCreateCxxModuleModelKt.tryCreateCxxModuleModel(TryCreateCxxModuleModel.kt:148)
	at com.android.build.gradle.internal.cxx.model.TryCreateCxxModuleModelKt.tryCreateCxxModuleModel(TryCreateCxxModuleModel.kt:246)
	at com.android.build.gradle.internal.TaskManager.createExternalNativeBuildJsonGenerators(TaskManager.java:1385)
	at com.android.build.gradle.internal.ApplicationTaskManager.createTasksForVariantScope(ApplicationTaskManager.java:183)
	at com.android.build.gradle.internal.VariantManager.createTasksForVariant(VariantManager.java:331)
	at com.android.build.gradle.internal.VariantManager.createVariantsAndTasks(VariantManager.java:207)
	at com.android.build.gradle.internal.plugins.BasePlugin.createAndroidTasks(BasePlugin.java:671)

Huh... why does it want NDK v21? I can't find any references in my project source insisting on NDK v21. I understand that the default NDK is determined by the Gradle plugin. So let's push the version nr for that up further, to latest: 4.1.0 instead.

Yes! Progress. Now: Minimum supported Gradle version is 6.5. Current version is 6.1.1.

And now the Gradle Sync is successful! Although it still want NDK v21 and not v22. I wonder why? Anyway, I have a new build!

Lessons learned:

  • Always set your gradle version to latest, before doing anything, in the build.gradle file. Mine is now set to 4.1.1. at the time of writing.
  • Copy over your debug.keystore from your old machine, because a fresh copy of Android Studio will create a new one, that is not registered in your Google Play developer console yet.

Monday, November 23, 2020


I've built a Xeon workstation, and had a lot of issues with the NVMe drive not being found by the BIOS. It would take roughly 3 reboots, before the WD SN550 would be found.

I've replaced that drive with a Crucial NVMe drive, and since I did that, the boot issue went away. Unfortunately, the Amazon return period has passed, so now I am test-driving the suspect WD SN550 drive as a secondary drive in a PCIe extension card.

With two drives in my system, I can do some comparative performance tests.

The Crucial P1 M2 2280 1000GB:
The Western Digital SN 550:

I will report back with an assessment of reliability as a non-boot drive, for the SN 550. It's read-performance does seem better than the Crucial drive. As I am using the Crucial as boot and root disk, I have not been able to compare the write performance between then. The Crucial does indeed feel slower.

UPDATE: The SN 550 is reliably detected as a 2nd drive by linux.

Thursday, November 12, 2020

Setting up a new Ubuntu box.

This is for my own benefit... when setting up a new Ubuntu distribution, adjust the following...

  • Choose minimal install, and the create an EFI partition, a Swap partition, and a Root partition.
  • After reboot, pin the terminal to the bar.
  • gsettings set org.gnome.desktop.interface enable-animations false
  • apt-get update ; apt-get dist-upgrade ; apt autoremove
  • Make it accessible: apt-get install openssh-server
  • Booting should not be silent, edit /etc/default/grub
  • Copy over .ssh/ directory from another machine, so that I have my ssh keys.
  • Copy over .vim/ directory from another machine, so that I have my vim setup.
  • Add to .bashrc file: export PKG_CONFIG_PATH=$HOME/lib/pkgconfig:/opt/lib/pkgconfig
  • Add repositories for dbgsym packages.
  • To allow fan control: sudo nvidia-xconfig -a --cool-bits=4
  • apt-get install psensor traceroute imagemagick
  • apt-get install vim git cmake clang-10 libsdl2-dev opengl-4-man-doc
  • apt-get install inkscape gimp wings3d
  • Install CUDA
  • Set git identity: git config --global user.email "EMAIL" ; git config --global user.name "NAME"
  • Get rid of Gnome's indexing by tracker.
  • Create a file /etc/modprobe.d/nsight containing: options nvidia "NVreg_RestrictProfilingToAdminUsers=0"

Thursday, November 5, 2020

Screen Recording and Ubuntu.

Recording gameplay videos on Ubuntu can be a big frustration. Here are some lessons learned.


Compositing window managers are garbage. Get rid of the default window manager 'Mutter' and go with 'Openbox' instead. GDM3 refuses to launch a different windowmanager, so replace that too, with lightdm. *sigh*

  $ sudo apt-get install lightdm
  $ sudo apt-get install openbox

Log off, select openbox, and log on again. Good riddance to the crapfest called compositing. You don't need it. Eye candy and transparent windows are for noobs.

Simple Screen Recorder

So far, the best results I get with Simple Screen Recorder, but with caveats: The docs say OpenGL recording is best. Well, it most certainly is not. It is choppy, glitchy and unreliable. It also causes OpenGL errors. Instead, record the window. Make sure to select superfast setting, otherwise the encoding will cause hitches in the framerate, especially so when trying to make 60Hz videos.

In the end, I got there, and managed to record gameplay for my video, I made with Openshot:

How Open Can You Make an Open World Game?

This is a one-man's quest on making an Open world game, with a capital-O.

It seems ages ago, that I was playing an Elder Scrolls game. It was big, and it was open. What impressed me most about it was the following:

With a hundred settlements on the map, and dozens of houses per settlement, you canvisit any one those settlements, enter a random house, and check out its book shelves. You could take a book from the shelves, and then, for instance, drop that book underneath a stairwell. Then, after doing all sorts of quests, fighting legions of foes, slaying of dragons, traveling far and wide, you could return to that house. And guess what? That book you dropped, at what seems to be ages ago, is still there under the stairwell.

In short, there is permanence in the world. A permanence of detail. (I remember reading that the developers did put some limits on that to cut down on the size of saved games, but in essence it did achieve this permanence.)

I thought about how to push this openness even further. What is not possible in that particular game, but would push the amount of influence a player has on a game world?


In real life, I can dig a ditch in my yard, and even if I leave the country for a year, that ditch will still be there when I return to the yard. Heck, even if a ditch gets filled in by erosion, or human interference, the traces of a ditch will be there two millennia later, when Time Team archaeologists excavate it looking for a Roman villa.

I don't think there are many games that let you dig a hole anywhere in the world, and have that persist. Although, thinking about it, Minecraft would qualify.

Considering that my indie game is a one man effort, I can't create a huge authored game world. Which means that I will, like Minecraft, rely on procedural generation of my universe. But, after that procedural generation, I want the player to be able to dig anywhere, and everywhere.

Enter The Planetary Ring

As a setting for my game, I have chosen space. More specifically, a whole bunch of rocks orbiting a planet. Procedural generation using 3D noise fields are well suited to create amorphous blobs in space. There is of course the whole problem of a 10,000 bowls of oatmeal, but still, it is easy to create a scene in which the player will experience the vastness of space.

18 billion billion

A side note here: even though the world is presented as a ring of rocks around a planet, I decided to make this ring infinite: As the player hops from one rock to the next, I keep generating new ones at the far end, causing the player never to reach the other side of the planet. With a 64-bit seed for each rock, 18 billion billion different rocks can be generated. This guarantees the player will never see the same rock twice, regardless how long they travel for.

Every planetoid in the game approaches a billion cubic feet of rock. Each and every cubic foot in there, can be dug by the player. This enables the "Tunnel To China" where you can dig to the other end of the planetoid, for instance.

To add permanence to the player's influence on this world, we have to store the modifications by the player of course. Only that terrain that was modified by the player is stored to disk, leaving the virgin grounds represented by the proc-gen algorithm alone. So, even though the game world is infinite, the amount of digging is limited by the disk space of the player's computer.


About the generation of the space rocks: they are defined by two large 3D noise fields. One field defines the density, and by using the Marching Cubes algorithm, a surface is generated to define the shape of the object. The other field defines the material type. This enables me to define veins of mineral ore inside the object.

Standard functions, like Open Simplex Noise, will create smooth terrains and never show steep cliffs. To add cliffs and ragged shapes, we can employ a technique called domain warping: Instead of sampling the noise function with straight x/y/z grid locations, the sample positions are perturbed with another noise value. See Sean Murray's GDC presentation on noise for more info on this.

Results So Far

So, what are the results so far? Well, that little space rover can dig holes in any rock, fly for ages to other parts of the world, and when it returns to the original space rock, those holes are still there. Permanence achieved! Next up: gameplay.

Gameplay is of course the part that matters, but it helps if you can set it in a world where the player has a strong sense of agency. Go anywhere, and leave your permanent mark.

Tuesday, October 20, 2020

Exit vi less often.

I am from the school of thought that recognizes the fact that typing is not the bottleneck for software engineering. So I am perfectly fine to use my command line, my vi and my Makefile. My peers consider me a weirdo for not using an IDE. Additionally, a powerful debugger is not a substitute for thought either, as expertly expressed by Rob Pike.

Not using an IDE means a more direct control over what happens. No schemas, no vcproj xml, no burried GUI fields, no property sheets. Or as I call it: bliss!

But fair enough, I spend a non trivial amount of time opening and closing vi, because I need to navigate my code. What could we do to reduce that?

Remember your place

First off, it helps if vim (I'll use vim and vi interchangably) remembers where you cursor was, last time you edited a file. For this, we have the following snippet in .vim/vimrc configuration file:

if has("autocmd")
  au BufReadPost * if line("'\"") > 0 && line("'\"") <= line("$") | exe "normal! g`\"" | endif

Toggle to/from header file

Next up: we want to be able to quickly toggle between header file and implementation file. Because I mix up C and C++, I unfortunately have three instead of two shortcuts for that. \oh to open the header, \oc to open the .c file, and \oC to open the .cpp file. You can have those too, with this in the configuration:

nnoremap oc :e %<.c
nnoremap oC :e %<.cpp
nnoremap oh :e %<.h

:set timeoutlen=2000

Note: The time out is added, so that you have some time between pressing the leader key (backslash) and the command.

Jump to declaration

But how do I quickly jump to the declaration of a particular function? Well, there is a mechanism for that too, called include jump. The vim editor can parse your include statements, so that with the :ij foo command, you can open up the header that declares the foo() function. But you do need to point vim to the path of files to try, which I do with:

set path=.,**

Jump to another file, regardless of location

If you just want to jump to a named file, but don't want to be typing full paths, you can use the :find bar.cpp syntax.

So there we have it: less quiting and starting vim. For completeness, my entire .vim/vimrc configuration is below:

filetype plugin on

set path=.,**

if has("autocmd")
  au BufReadPost * if line("'\"") > 0 && line("'\"") <= line("$") | exe "normal! g`\"" | endif

set wildignore=*.d,*.o

nnoremap oc :e %<.c
nnoremap oC :e %<.cpp
nnoremap oh :e %<.h

:set timeoutlen=2000

autocmd Filetype python setlocal noexpandtab tabstop=8 shiftwidth=8 softtabstop=8