Sunday, November 5, 2017

Profiling on Linux

Speaking of profiling: I wrote an inline profiler for multi-threaded Linux apps called ThreadTracer. It's quite capable, as it records both wall-clock time and CPU time. On top of that, it also keeps track of pre-empted threads and voluntary context switches.
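
The measurement itself is not magic. Here is a minimal sketch (not ThreadTracer's actual code) of sampling those quantities on Linux with clock_gettime() and getrusage():

// Sketch: the per-thread quantities an inline profiler can sample.
// Take one sample when a scope opens, one when it closes, and diff them.
// RUSAGE_THREAD is Linux-specific; g++ defines the required _GNU_SOURCE by default.
#include <cstdio>
#include <ctime>
#include <sys/resource.h>

int main()
{
    timespec wall, cpu;
    rusage ru;
    clock_gettime(CLOCK_MONOTONIC, &wall);         // wall-clock time
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu);  // CPU time of the calling thread
    getrusage(RUSAGE_THREAD, &ru);                 // context switch counters
    printf("wall %ld.%09lds cpu %ld.%09lds voluntary %ld involuntary %ld\n",
           (long)wall.tv_sec, wall.tv_nsec, (long)cpu.tv_sec, cpu.tv_nsec,
           ru.ru_nvcsw, ru.ru_nivcsw);
    return 0;
}

When a scope's wall-clock delta is much larger than its CPU-time delta, the thread was pre-empted or blocked; the voluntary and involuntary context switch counts tell you which.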

ThreadTracer is a fine tool, and it uses Google Chrome's tracing capability to view the measurements. But for really low-level, micro-sized measurements, we need an additional tool: the Linux perf tool.

Instead of using perf directly, I found the ocperf.py wrapper from PMU-Tools to be a much better option; it has proven more reliable for me. You can make it sample your program to see where the cycles are spent. I use the following command lines:

$ ocperf.py record -e cpu-cycles:pp --call-graph=dwarf ./bench
$ ocperf.py report -g graph,0.25,caller

In addition to the perf wrapper, PMU-Tools also comes with a great overall analysis tool called toplev.py, which gives you quick insight into potential issues. Start at level 1 (-l1) and drill your way down to more specific issues, using:

$ toplev.py --long-desc -l1 ./bench

Friday, October 27, 2017

Pitting Profilers against each other.

In the past, I have been using Remotery, an in-app profiler. I recently became aware of Minitrace, a similar tool, so I decided to compare their results.

The good news is that when my ray tracer works in single-threaded mode, the results are in agreement: 6 ms or so is spent on uploading the image as a texture to OpenGL, and the rest of the time is spent rendering scanlines.

Minitrace:
Remotery:

I can also run my app in multi-threaded mode. The scanlines are then rendered in 100 work batches, which are processed by four worker threads that stay alive for the lifetime of the app.
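
For context, the batching boils down to something like this sketch (hypothetical names, not the actual engine code):

// Sketch: four persistent workers pull scanline batches via an atomic counter.
#include <atomic>
#include <thread>
#include <vector>

static std::atomic<int> next_batch{0};
static const int num_batches = 100;

static void render_batch(int /*batch*/) { /* render the scanlines of this batch */ }

static void worker()
{
    for (;;)
    {
        const int b = next_batch.fetch_add(1);
        if (b >= num_batches) return;  // no work left
        render_batch(b);
    }
}

int main()
{
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) pool.emplace_back(worker);
    for (std::thread& t : pool) t.join();
    return 0;
}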

Minitrace:
Remotery:

The Minitrace run shows that the worker threads were fully busy during the generation of the image. Sometimes I see a chunk that takes a lot more time (more than 10x) than normal, which made me doubt the measurement; that is why I decided to compare against Remotery. However, I no longer think this is a bad measurement: one of the worker threads probably got pre-empted by the OS.

The Remotery run, on the other hand, seems to be missing data. Could it be a race condition between worker threads trying to record events? I'll be creating a GitHub issue, but wrote this blog post first, so that the images are hosted.

OS: 64-bit Ubuntu Linux.
CPU: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
Both Minitrace and Remotery were the latest versions from GitHub as of Oct 27, 2017.

Thursday, October 26, 2017

sRGB Colour Space - Part Deux.

In part one, I described what you need to do after simulating light, before you can show it on a monitor: you need to brighten it by going from linear colour space to sRGB colour space.

Here I talk about how to avoid polluting your calculations with non-linear data. Light intensities can be added or multiplied, but sRGB values cannot. As Tom Forsyth puts it: just as you would not add two zip files together, you would not add sRGB values either.

If you take a photograph of a certain material, the image from the camera will typically be in the sRGB colour space. If you want to render a 3D object that has this texture applied to it, then for the lighting calculations you need each texel in linear colour space.

Fortunately, OpenGL can help you out here. If you sample from an sRGB encoded texture, there will be an automatic sRGB->linear conversion applied, so that after the texel fetch, you can actually do calculations with it. To trigger this automatic conversion you need to pass the correct parameters when creating the texture using the glTexImage2D() function. Instead of using GL_RGB8 or GL_RGBA8, you specify the internal format as GL_SRGB8 or GL_SRGB8_ALPHA8. There are also compressed variants: GL_COMPRESSED_SRGB and others.
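
For example, an upload that opts in to the automatic conversion could look like this (a sketch; width, height and pixels are assumed to be in scope):

// Sketch: store 8-bit RGBA data as sRGB, so texel fetches return linear values.
glTexImage2D(GL_TEXTURE_2D,
             0,                 // mip level
             GL_SRGB8_ALPHA8,   // internal format: sRGB-encoded storage
             width, height,
             0,                 // border, must be 0
             GL_RGBA,           // layout of the source data
             GL_UNSIGNED_BYTE,
             pixels);           // 8-bit sRGB data, e.g. straight from a PNG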

Be careful not to use sRGB textures for content that is not in sRGB. If the content is linear, such as a noise map or a normal map, then you don't want OpenGL meddling with it by performing an sRGB-to-linear conversion step. This kind of data needs to live in a texture with linear colour encoding.

Lastly, when creating mip maps for sRGB textures, you need to be careful: the averaging of texels has to be done on linear values, not on the raw sRGB values, otherwise the mip levels come out too dark.

sRGB versus Linear Colour Space

I keep screwing up my colour spaces, so I forced myself to write down the rationale behind them. A lot of it comes from Tom Forsyth.

CRT monitors respond non-linearly to the signal you drive them with. If you send a value of 0.5, you get less than half the brightness (photons) of a pixel with a 1.0 value. This means you cannot do light calculations in sRGB space; you need to do them in a linear space.

Once you have calculated the light for your rendered image, you need to send it to the monitor. LCD monitors are made to respond the same way as the old CRT monitors did. So the values you send to your framebuffer will end up producing too few photons (too dark).

To account for this, you need to convert your rendered image from linear colour space to sRGB colour space. This means that all dark pixels need to be brightened up. One way to do this, which avoids manual conversion, is to have OpenGL do it for you: you create a framebuffer that is sRGB capable. With SDL2 you do this by passing the SDL_GL_FRAMEBUFFER_SRGB_CAPABLE flag to the SDL_GL_SetAttribute() function. On iOS you can use the kEAGLColorFormatSRGBA8 drawable property of the CAEAGLLayer.

Once you have this special framebuffer, you tell OpenGL Core Profile that you want the conversion to happen. To do this, you use glEnable( GL_FRAMEBUFFER_SRGB );
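
Putting the two together, order matters: the attribute has to be set before the window and context are created. A sketch (window title and size are placeholders):

SDL_GL_SetAttribute(SDL_GL_FRAMEBUFFER_SRGB_CAPABLE, 1);   // before creating the window

SDL_Window* window = SDL_CreateWindow("game", 0, 0, 1024, 768, SDL_WINDOW_OPENGL);
SDL_GLContext context = SDL_GL_CreateContext(window);

glEnable(GL_FRAMEBUFFER_SRGB);   // writes now get a linear -> sRGB conversion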

Note that OpenGL ES3 does not have this glEnable flag. If the ES3 framebuffer is sRGB capable, the conversion is always enabled.

When my renderer does the lighting calculations, it works in a linear colour space. After rendering, it produces this linear image:

For proper display on a monitor, we need to account for the monitor's response curve, so we convert the image into the sRGB colour space. After conversion, the dark colours are brighter:

Yes, much brighter! But hey, what's up with those ugly colour bands? Unfortunately, by converting the values into sRGB, we lose a lot of precision, which means that 8-bit colour channels are no longer adequate. In 8 bits, the three darkest linear values are 0x00, 0x01 and 0x02. After converting these values to sRGB, they are mapped to 0x00, 0x0c and 0x15. Let that sink in... there is a gap of "1" between linear colours 0x00 and 0x01, but a gap of "12" between the corresponding sRGB neighbours.
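
For reference, those numbers come out of the standard linear-to-sRGB transfer function:

#include <math.h>

// The sRGB encode curve: a short linear toe, then a 2.4 power curve.
float linear_to_srgb(float lin)
{
    if (lin <= 0.0031308f)
        return 12.92f * lin;
    return 1.055f * powf(lin, 1.0f / 2.4f) - 0.055f;
}
// linear_to_srgb(1/255.0f) is roughly 12.7/255: the gap of "12" described above.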

So when we convert from linear to sRGB, we should never convert from 8-bit linear to 8-bit sRGB. Instead, we convert using floating point linear values. If OpenGL is rendering to an sRGB-capable framebuffer, it just needs to read from floating point textures. In my game, the ray tracer now renders to a floating point texture. This texture is then drawn as a full-screen quad onto the sRGB framebuffer, resulting in a correct image:
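
Creating such a floating point texture is a one-liner; a sketch, again assuming width, height and pixels are in scope:

// Sketch: a floating point texture to hold the ray tracer's linear output.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT, pixels);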

And that's it for this installment, people. In the future I could perhaps gain more precision by rendering to a framebuffer with 10-bit channels, but for now this is good enough.

Please see part two of this blog post series, where I explain what you need to do if you sample textures in your shader and want to do light calculations with them.

Tuesday, September 26, 2017

vtune permissions

Note to self: before running vtune, do:

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope    # allow ptrace attach to non-child processes
sudo chmod 755 /sys/kernel/debug                        # make kernel debugfs traversable
sudo chown -R bram:bram /sys/kernel/debug/tracing       # give my user access to ftrace

Saturday, August 26, 2017

Linear filtering of masked PNG images.

If you render your RGBA sprites in OpenGL using...

glBlendFunc( GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA )
and you use GL_LINEAR filtering, then you may see black borders around your sprite edges.

The reason for the pixel artefacts on the border (they can be white too, or another colour) is that the linear sampling causes incorrect R/G/B values to be summed. If one of the samples falls on a zero-alpha pixel, then that pixel's RGB colour gets weighted into the average, even though it is not visible.

This is a common pitfall in sprite rendering. The answer given on the StackExchange question is the correct one: you should use pre-multiplied alpha textures, and blend with this instead:

glBlendFunc( GL_ONE, GL_ONE_MINUS_SRC_ALPHA )
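
Pre-multiplying can be done once at load time, right after decoding the PNG. A minimal sketch, assuming 8-bit RGBA pixels:

// Sketch: convert straight-alpha RGBA8 pixels to pre-multiplied alpha in place.
void premultiply(unsigned char* pixels, int w, int h)
{
    for (int i = 0; i < w * h; ++i)
    {
        unsigned char* p = pixels + 4 * i;
        p[0] = (unsigned char)((p[0] * p[3] + 127) / 255);  // R *= A
        p[1] = (unsigned char)((p[1] * p[3] + 127) / 255);  // G *= A
        p[2] = (unsigned char)((p[2] * p[3] + 127) / 255);  // B *= A
    }
}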

The downside of this is that PNGs are, per specification, not pre-multiplied. And Inkscape can only create PNGs, not TIFFs, which would support pre-multiplied alpha; stb_image lacks TIFF support as well. So how do we solve this while keeping PNG as the source material?

The trick is to set the proper background colour for pixels that have alpha 0 (fully transparent). If you know that you will be blitting these sprites onto a white background, then these masked-out pixels should have the value ff/ff/ff/00. If you know that you will be blitting them onto a red background, use the value ff/00/00/00 instead.

This is all well and good, but software (like Cairo and Inkscape) often mistreats alpha-zero pixels. Cairo sets them all to 00/00/00/00, for instance, even though there may be colour information in the fully transparent pixels. This means you cannot anticipate the colour of the target buffer, as the masked-out pixels get a black colour. In my code, I have my texture loader swap out the alpha-0 pixels with a new RGB value that matches the background against which the sprites are rendered. Note that this solution results in lower quality than pre-multiplied alpha sprites, but it has the advantage of being less of a hassle.
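
In the loader, that fix-up is only a few lines. A sketch, where bg_r/bg_g/bg_b hold the known background colour:

// Sketch: give fully transparent pixels the colour of the known background.
void fix_transparent_pixels(unsigned char* pixels, int w, int h,
                            unsigned char bg_r, unsigned char bg_g, unsigned char bg_b)
{
    for (int i = 0; i < w * h; ++i)
    {
        unsigned char* p = pixels + 4 * i;
        if (p[3] == 0)  // fully transparent: RGB is invisible, but gets sampled anyway
        {
            p[0] = bg_r;
            p[1] = bg_g;
            p[2] = bg_b;
        }
    }
}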

Above left, you can see the effect of having the wrong colour (black) for fully transparent pixels. In the image on the right, you see the same compositing, but with the sprite's transparent pixel colour set to white.

My fellow game dev Nick, of Slick Entertainment fame, suggested another approach: bleeding the colour values out into the transparent pixels. That makes the sprites a little more versatile, as you can render them against any colour background. I think it gives a slightly less correct result, though, for the case where you do know the background colour and prepare for it.

Wednesday, August 23, 2017

Match 3

I decided to challenge myself to write a quick and dirty Match-3 game. I'm not sure how I came up with the theme, but it occurred to me that Smileys with facial feature permutations would make for interesting content.

I spent a full day on art, using Inkscape and OpenClipArt. A big time saver is the Inkscape feature that lets you render a specified object from the command line, so I don't have to wrestle with the GUI to do all the exporting.
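
For the Inkscape versions of that era, such an export looks something like this (object ID, size and filenames are placeholders):

$ inkscape --export-id=smiley_base --export-png=smiley_base.png --export-width=128 smileys.svg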

The next day was spent coding. The game is written in C-style C++, on top of my own engine, with the help of SDL2. I'm happy to report that I had the basics working in only a day: you can swap Smileys, and after matching, the Smileys above fall down and are replenished. No scoring yet, no sound, no visual effects. But I did put in a feature for hints: all Smileys will look in the direction of a Smiley you could swap for a match. I think that's a cute touch.

My matching mechanism is quite novel too, I think. Instead of simply matching a colour or shape, you need to match one of the facial features. So three Smileys with the same glasses will match up; so do three Smileys with the same hair, or three Smileys with the same moustache/beard.
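
In code, the rule is a small twist on the usual match test. A sketch with a hypothetical Smiley struct, not the actual game code:

// Sketch: three in a row match if they share at least one facial feature.
struct Smiley
{
    int glasses;   // which glasses variant
    int hair;      // which hair variant
    int beard;     // which moustache/beard variant
};

bool matches(const Smiley& a, const Smiley& b, const Smiley& c)
{
    return (a.glasses == b.glasses && b.glasses == c.glasses) ||
           (a.hair    == b.hair    && b.hair    == c.hair)    ||
           (a.beard   == b.beard   && b.beard   == c.beard);
}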

I found that this feature matching takes more mental effort than traditional colour matching. Is that a good thing or a bad thing? On one hand, it makes for less relaxing gameplay; on the other, it seems like more intense training for your mind. So it's a trade-off.

I am happy with the visual appeal of the board, though. A varied collection of Smileys makes for a happy sight.