So I have been developing a new game in which a rocket engine globally illuminates its environment. It started out as an AVX2 implementation, but since then I have been exploring other acceleration techniques. For instance, can I run the photon mapping algorithm on the GPU instead? Here are some initial results from the different approaches.

Technology: AVX2, single threaded
OS: Linux
Device: i5-4570 CPU @ 3.20GHz
Cores: 1
Frames per second: 51

Technology: GLSL - OpenGL 3.2
OS: Linux
Device: Intel® HD Graphics 4600 GPU
Cores: ?
Frames per second: 37

Technology: GLSL - OpenGL 3.2
OS: Linux
Device: ATI HD 5670 Graphics, open source Gallium driver
Cores: ?
Frames per second: 60

Technology: OpenCL 1.2
OS: Linux
Device: i5-4570 CPU @ 3.20GHz
Cores: 4
Frames per second: 38

Technology: GLSL - OpenGL ES 3.0
OS: Android
Device: Adreno(TM) 420
Cores: ?
Frames per second: 10-14
Remarks: Oscillates with a 3.5 s period. Thermal throttling?

Technology: OpenCL 1.2
OS: Mac OS X
Device: i5-4278U CPU @ 2.60GHz
Cores: 4
Frames per second: 11
Remarks: Could not handle work group sizes larger than 1.

Technology: OpenCL 1.2
OS: Mac OS X
Device: Intel Iris GPU
Cores: 40
Frames per second: -
Remarks: Apple's OpenCL compiler failed to build the OpenCL kernel source. It reported a 'Parse Error' on a perfectly fine source fragment, for no traceable reason.

Some conclusions I drew from this: OpenCL has not been worth the effort. With 4 CPU cores it reaches 38 fps, which is still well below the 51 fps of my hand-optimized AVX2 implementation running on a single core.
Apple's OpenCL seems to be in bad shape. I could not get it to run on the GPU, and running on the CPU was roughly 3.5 times slower than on Linux (11 fps versus 38 fps), although the two machines have different CPUs (i5-4278U versus i5-4570).
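For reference, the comparisons above follow directly from the measured frame rates. Here is a small sketch of the arithmetic; the variable and function names are my own, and the numbers are the ones from the results listed earlier.

```python
# Frame rates taken from the results above (frames per second).
fps = {
    "AVX2, 1 core, Linux": 51,
    "OpenCL, 4 cores, Linux": 38,
    "OpenCL, 4 cores, Mac OS X": 11,
}


def frame_time_ms(frames_per_second):
    """Convert a frame rate to the time spent on one frame, in milliseconds."""
    return 1000.0 / frames_per_second


# How much faster single-core AVX2 is than four-core OpenCL on the same CPU.
avx2_vs_opencl = fps["AVX2, 1 core, Linux"] / fps["OpenCL, 4 cores, Linux"]

# How much faster OpenCL ran on the Linux machine than on the Mac.
# The CPUs differ (i5-4570 vs i5-4278U), so this is not a pure driver comparison.
linux_vs_mac = fps["OpenCL, 4 cores, Linux"] / fps["OpenCL, 4 cores, Mac OS X"]

for name, rate in fps.items():
    print(f"{name}: {frame_time_ms(rate):.1f} ms per frame")
print(f"AVX2 (1 core) vs OpenCL (4 cores): {avx2_vs_opencl:.2f}x")
print(f"OpenCL on Linux vs Mac OS X: {linux_vs_mac:.2f}x")
```

The second ratio comes out a little under 3.5, which is where the "3.5 times slower" figure comes from.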
The GLSL implementation seems promising. A dependency on ES3.1 or OpenGL 3.2 is less of a barrier than the AVX2 dependency. With some temporal caching and reduced photon counts, it should be able to reach a solid 60 fps on integrated GPUs, and maybe even 60 fps on future mobile GPUs.
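To put the 60 fps target in perspective, here is a rough sketch of how much per-frame time would have to be saved on the slower GPUs measured above. The function name `required_saving` is my own, and reading the result as a photon count reduction assumes that frame time scales linearly with photon count, which I have not measured.

```python
TARGET_FPS = 60

# Measured GLSL frame rates from the results above (frames per second).
# The Adreno 420 oscillated between 10 and 14, so both ends are listed.
measured = {
    "Intel HD Graphics 4600": 37,
    "Adreno 420 (low)": 10,
    "Adreno 420 (high)": 14,
}


def required_saving(current_fps, target_fps=TARGET_FPS):
    """Fraction of the current frame time that must be removed to hit target_fps."""
    current_ms = 1000.0 / current_fps
    target_ms = 1000.0 / target_fps
    return 1.0 - target_ms / current_ms


savings = {name: required_saving(rate) for name, rate in measured.items()}
for name, saving in savings.items():
    print(f"{name}: cut frame time by {saving:.0%}")
```

Under that hypothetical linear-scaling assumption, the HD Graphics 4600 would need to shed roughly 38% of its per-frame work, while the Adreno 420 would need to shed somewhere between 77% and 83%.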