Wednesday, April 25, 2018

Leaving Track Prints

I am currently developing my still-unnamed indie game: a 2D top-down tank fight whose main gimmick is a 100% destructible world.

Nice destruction, if I say so myself. But notice that the tanks don't leave track prints. Today, I will write about implementing the track prints that the tanks leave behind on the terrain as they drive over it.

The first observation to be made here is that there are a lot of them for each tank. That means generating and rendering thousands of them, if not tens of thousands or even hundreds of thousands. This immediately tells me that they can't be rendered individually. I need to apply a technique called Instanced Rendering.

Rendering

In instanced rendering, all instances share the model vertex data and have some per-instance data to make them unique. This per-instance data is typically a transformation matrix, but can also include other things like colour if need be. In my case, the per-instance data can be particularly compact because I work in two dimensions.

All the prints will be identical, except for two things: their position in the world, and their orientation. So in theory, three values would be enough: an x and y coordinate, plus a rotation angle. But personally, I find that defining rotation with a vector, like Chipmunk2D does, is more elegant. Hence, I will feed OpenGL a 2D vector for position and a 2D vector for orientation.
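To make that concrete, here is a minimal sketch of what such a per-instance layout could look like in OpenGL. The attribute locations, buffer name and vertex count are illustrative assumptions, not necessarily my actual code:

/* Per-instance data is 4 floats per print: position (x,y) and orientation (cos,sin).
   Attribute 0 is assumed to hold the shared quad vertices; 1 and 2 are per-instance. */
glBindBuffer(GL_ARRAY_BUFFER, instance_vbo);

glEnableVertexAttribArray(1);                        /* instance position */
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(float), (void *) 0);
glVertexAttribDivisor(1, 1);                         /* advance once per instance */

glEnableVertexAttribArray(2);                        /* instance orientation */
glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 4 * sizeof(float),
                      (void *) (2 * sizeof(float)));
glVertexAttribDivisor(2, 1);

/* One call draws all live prints; the shared quad has 4 vertices. */
glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, live_print_count);

In the vertex shader, rotating a model-space point p by the unit orientation vector (c, s) is then just vec2(c*p.x - s*p.y, s*p.x + c*p.y), which is the elegance the vector representation buys you: no sin/cos at draw time.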

The next thing to consider is the lifetime of the prints. If we create a new print at frame N, then we will need to render it at frame N and at every frame after it, up until some frame M (with M much larger than N), where we need to evict this print to make space. After all, we don't want to run out of resources by creating arbitrarily many prints.

The fact that I progressively create the prints, and reclaim resources from the oldest ones, leads me to the convenient solution of a ring buffer. We create a Vertex Buffer Object to hold the shared model data plus N instance slots. When creating instance N+1, we reuse the slot at position 0. Each frame, we only write to the VBO at the slots that got new data that same frame.
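In code, the eviction then amounts to nothing more than a wrapping write index. A minimal sketch, under assumed names (MAX_PRINTS, instance_vbo and print_head are illustrative):

#define MAX_PRINTS 16384                 /* instance slots in the ring buffer */

static GLuint instance_vbo;              /* holds MAX_PRINTS * 4 floats */
static int    print_head = 0;            /* next slot to (re)use */

void add_print(float px, float py, float ox, float oy)
{
    const float inst[4] = { px, py, ox, oy };    /* position + orientation */
    glBindBuffer(GL_ARRAY_BUFFER, instance_vbo);
    glBufferSubData(GL_ARRAY_BUFFER,
                    print_head * sizeof(inst),   /* byte offset of this slot */
                    sizeof(inst), inst);
    print_head = (print_head + 1) % MAX_PRINTS;  /* wrap: evicts the oldest print */
}

If several prints are created in one frame, the glBufferSubData calls could also be batched into one update per frame, but the idea stays the same.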

Generating

Having the rendering covered leaves me with the problem of generating the prints. This problem is trickier. The tank has many track segments touching the ground at any time, all leaving a print. When the tank drives straight, those prints all superimpose, so you would really only need to generate one of them. But when the tank turns, this won't work, and it gets worse if it turns in place. See below what happens if you leave one print at each side of the tank: the tracks look fine, until the tank does a 180-degree spin.

And it looks particularly bad if the tank gets bumped hard and moves sideways. I haven't really cracked the problem of generating proper tracks yet. I think the root of the problem lies in the fact that the game's simulation has no concept of the track links. The tank itself is just four rigid bodies: one chassis, one turret, and two for the left/right tracks. The links of the tracks are just an animation effect.

So the generation of track prints needs some more work. I'll report back when and if I solve it.

Tuesday, February 20, 2018

Returning to iOS development.

It occurred to me that the new iPad Pro 120Hz display is a great motivation to update my Little Crane game for iOS. So after a long time, I returned to my iOS codebase. Here I report some random findings.

🔴 OS Support
Currently, Little Crane supports iOS 3.2 and up. But the current Xcode (9.2) does not support anything under iOS 8. Oh well, abandoning a few old devices then.

🔴 Launch Image
Also scrapped by iOS: Launch Images. If you want to support the iPad Pro, you now need newfangled Launch Screen storyboards. As more iOS devices got released, the launching process got more complex over time:

  • First, they were just specially named images in your bundle.
  • Then, they were images in an Asset Catalog.
  • Now, they are a storyboard with a whole lot of crap that comes with this. Oh boy.

🔴 Bloated AdMob
Scrapped a long time ago was the iAd product. So if you want to have ads in your app, you need to look elsewhere. I went with the other behemoth in advertisements: AdMob. When upgrading from AdMob SDK 7.6.0 to 7.28.0, I was unpleasantly surprised: I now need to link against a whole bunch of extra stuff. I think ads do 3D rendering now, as opposed to just playing a video? New dependencies in AdMob:

  • GLKit
  • CoreMotion
  • CoreVideo
  • CFNetwork
  • MobileCoreServices

🔴 GKLeaderboardViewControllerDelegate
Leaderboards with a delegate have been deprecated. It probably still works, so I am tempted to leave the old code in. I do get this weird runtime error message when closing a Game Center dialog, though: "yowza! restored status bar too many times!"

Tuesday, February 13, 2018

Flame Graphs and Data Wrangling.

In my pursuit of doing Real Time (60fps) Ray Tracing for a game, I have been doing a lot of profiling with perf. One way to quickly analyse the results of a perf record run is by making a FlameGraph. Here's a graph for my ray tracing system:


During my optimization effort, I've found that lining up all the data nicely for consumption by your algorithm works wonders. Have everything ready to go, and blast through it with your SIMD units. For ray tracing, this means having your intersection routines blast through the data, because ray tracing, at its core, is testing rays versus shapes. In my game, these shapes are all AABBs, and my intersection code tests 8 AABBs versus a single ray in one go. A big contributor to hitting 60fps ray tracing is the fact that my scenes use simple geometry: AABBs are almost as simple as spheres, but more practical for world building.
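To give an idea of what an 8-wide test looks like, here is a minimal slab-test sketch using AVX intrinsics, with the eight boxes stored in struct-of-arrays form. The struct layout and function signature are my illustration here, not necessarily how the real routine is written:

#include <immintrin.h>

/* Eight AABBs in struct-of-arrays layout: one SIMD lane per box. */
typedef struct
{
    __m256 minx, miny, minz;
    __m256 maxx, maxy, maxz;
} box8_t;

/* Slab test of one ray against eight boxes at once.
   Returns an 8-bit mask; bit i is set if the ray hits box i. */
static inline int ray_vs_box8_sketch(const box8_t *b,
                                     float ox, float oy, float oz,  /* ray origin  */
                                     float ix, float iy, float iz)  /* 1/direction */
{
    __m256 o = _mm256_set1_ps(ox), r = _mm256_set1_ps(ix);
    __m256 t1 = _mm256_mul_ps(_mm256_sub_ps(b->minx, o), r);
    __m256 t2 = _mm256_mul_ps(_mm256_sub_ps(b->maxx, o), r);
    __m256 tnear = _mm256_min_ps(t1, t2);
    __m256 tfar  = _mm256_max_ps(t1, t2);

    o = _mm256_set1_ps(oy); r = _mm256_set1_ps(iy);
    t1 = _mm256_mul_ps(_mm256_sub_ps(b->miny, o), r);
    t2 = _mm256_mul_ps(_mm256_sub_ps(b->maxy, o), r);
    tnear = _mm256_max_ps(tnear, _mm256_min_ps(t1, t2));
    tfar  = _mm256_min_ps(tfar,  _mm256_max_ps(t1, t2));

    o = _mm256_set1_ps(oz); r = _mm256_set1_ps(iz);
    t1 = _mm256_mul_ps(_mm256_sub_ps(b->minz, o), r);
    t2 = _mm256_mul_ps(_mm256_sub_ps(b->maxz, o), r);
    tnear = _mm256_max_ps(tnear, _mm256_min_ps(t1, t2));
    tfar  = _mm256_min_ps(tfar,  _mm256_max_ps(t1, t2));

    /* Hit if the exit distance is no earlier than the (non-negative) entry. */
    __m256 hit = _mm256_cmp_ps(tfar,
                               _mm256_max_ps(tnear, _mm256_setzero_ps()),
                               _CMP_GE_OQ);
    return _mm256_movemask_ps(hit);
}

The pay-off of the struct-of-arrays layout is that there is no shuffling at test time: the min/max corners of all eight boxes are already sitting in the right lanes.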

This is all fine and dandy, but it does expose a new problem: the CPU is busier wrangling the data than doing the actual computation. Even when I cache the paths that primary rays take (from the camera into the scene) for quick reuse, the administration around the intersection tests takes up more time than the tests themselves.

This is visible in the graph above, where the actual tests are in linesegment_vs_box8 (for shadow rays) and ray_vs_box8 (for primary rays). It seems to be a wall I am hitting, and I am having a hard time pushing through it for even more performance.

So my shadow rays are more costly than my primary rays. I have a fixed camera position, so the primary rays traverse the world grid in the same fashion each frame. This, I exploit. But shadow rays go all over the place, of course, and need to dynamically march through my grid.
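To sketch that exploit (all names here are hypothetical): since the camera never moves, the ordered list of grid cells that each primary ray visits can be recorded once, and replayed every frame:

#include <stdint.h>

typedef struct { uint16_t cx, cy, cz; } cell_t;     /* a world-grid cell index */
typedef struct { int valid; float t; } hit_t;       /* minimal hit record      */

typedef struct
{
    cell_t *cells;   /* grid cells in front-to-back visit order */
    int     count;
} ray_path_t;

/* One cached path per pixel, recorded at startup (the camera is fixed). */
static ray_path_t path_cache[400][800];

/* Tests the boxes in one cell against the pixel's primary ray (ray_vs_box8 inside). */
extern hit_t test_cell(cell_t c, int x, int y);

hit_t trace_primary(int x, int y)
{
    const ray_path_t *p = &path_cache[y][x];
    for (int i = 0; i < p->count; ++i)
    {
        const hit_t h = test_cell(p->cells[i], x, y);
        if (h.valid)
            return h;           /* front-to-back order: first hit is nearest */
    }
    return (hit_t) { 0, 0.0f };
}

Shadow rays get no such luxury: their start and end points change every frame, so they must do the full grid march.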

In order to alleviate the strain on the CPU a bit, I cut the number of shadow rays in half, by computing shadow only once every two frames for each pixel. So half of the shadow information lags by one frame.
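A checkerboard pattern is one way to select which half gets refreshed; my actual pixel-selection scheme may differ, but the idea is this:

/* Refresh shadow for half the pixels each frame, alternating checkerboard-style. */
for (int y = 0; y < 400; ++y)
    for (int x = 0; x < 800; ++x)
        if (((x + y) & 1) == (frame & 1))
            shadow[y * 800 + x] = trace_shadow_ray(x, y);
        /* the other half keeps last frame's value, lagging one frame behind */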

So to conclude: if you line up all your geometry beforehand, packed in sets of 8, then the actual intersection tests take almost no time at all. This makes it possible to do real-time ray tracing at an 800x400 resolution, at 60 frames per second, at 1.5 rays per pixel, on 4 cores equipped with AVX2. To go faster than that, I need to find a way to accelerate the data wrangling.

Friday, January 5, 2018

2017 Totals

So, The Little Crane That Could is waning. Here are the 2017 results (number of free downloads). It did manage to surpass 19M lifetime downloads.

           2017   2016   2015   2014   2013   2012   2011
iOS        191K   416K   630K  1300K  3199K  3454K  1550K
Android   1100K  1515K  1525K   825K  1579K  1656K      -
Mac         10K    20K    30K    53K    81K      -      -
OUYA          -      -     0K     4K    15K      -      -
Kindle       9K    48K    52K    46K    95K      -      -
Rasp Pi       -      -      ?      ?     6K      -      -

Friday, December 15, 2017

Too late for the modern Gold Rush?

Quick, name the one technology that is hotter than Virtual Reality or Machine Learning today. Yep... the hottest thing right now is cryptocurrency. But did you miss out on the gravy train?

Well, judging from the dizzying levels of the two heavyweights in the crypto world, Bitcoin and Ethereum, you could think so. At the time of writing, the price of bitcoin went from 24¢ to $17,666 in 8 years, which is roughly a factor of 74,000. It's nigh impossible for bitcoin to repeat that in the next eight years, as there is no room for a $1.3B bitcoin.

Bitcoin got its position as top dog by virtue of being first on the scene. Ethereum got there by the strength of its features: its blockchain offers a mechanism to write distributed applications and smart contracts. It is vastly more versatile and more utilitarian than Bitcoin will ever be. For instance, you can use it to implement a cross-breeding and trading platform for virtual kitties with guaranteed pedigree. Sure, it's silly, but it does allow for, among other things, creating cryptographically secured scarce virtual objects.

So Ethereum for the win, then? Well, maybe not. Because of the meteoric rise in the price of Ether (the coin of Ethereum), developers of distributed apps may think twice about running their code on the Ethereum Virtual Machine. When an app is running, it is consuming precious gas. The price of this gas will quickly become prohibitively expensive.

So if we discount Bitcoin and Ethereum as viable candidates for getting in late, what's left? With over a thousand to choose from, is there one with a bright future, capable of rising to the top and displacing both Bitcoin and Ethereum? Spoiler: yes, there is.

There is an interesting newcomer by the name "NEO." I've seen it described as "China's Ether." I came across it reading a thread about Ethereum killers.

From what I've been able to ascertain, the NEO coin is not mined. If you want NEO, you have to buy it. However, the nice thing about NEO is that as you hold it in your crypto wallet, it generates GAS. Yep, the GAS that is used for running distributed apps, much like how Ethereum executes apps. That is an interesting aspect right there: the gas gets generated by the coins, which means you do not have to spend your precious and rapidly appreciating crypto coin to use the Virtual Machine.

Another possible contender from the aforementioned thread is EOS, by the way. A commenter described it as: "If bitcoin is currency, and ethereum is gas, EOS is land". So that may be worth looking into.

So for the sake of argument, let's say we want to hedge our bets and get some of those dividend-yielding NEO coins. How would you purchase them? Well, they are best purchased using another crypto coin like Bitcoin or Ether. If you don't have those, I suggest you head over to the nearest bitcoin ATM in your city.

With bitcoins in your wallet, it is now time to purchase NEO on an exchange. I recommend Binance, which has been excellent for me. It has some amazing advantages that other exchanges do not have:

  • No verification needed for below 2btc withdrawals.
  • After signing up you can fund and trade immediately.
  • No fee for withdrawal of NEO.
  • Great trading interface.
  • Easy to use.
  • Based in stable Japan/Hong Kong, without much government interference.

I personally learned this too late: you do not want to end up with fractional NEO coins. Buying 10.0 or 10.01 NEO is fine. But if you end up with 10.99 NEO, then you can only transfer out the whole coins, and you are left with a less useful 0.99 NEO.

With the NEO coins in your Binance account, you can withdraw them for free to a wallet that you created yourself, on your own computer. I recommend the Neon Wallet. Before you withdraw from Binance to your own wallet, make absolutely sure you have printed out the private key of your newly created wallet, on paper, and store it in a safe place. Lose your key, and you will lose your coins.

Let me conclude with Binance's nifty graph, which shows you in real time the volume of buyers and sellers at all price points.

Sunday, November 5, 2017

Profiling on Linux

Speaking of profiling: I wrote an inline profiler for multi-threaded Linux apps called ThreadTracer. It's quite good, as it records both wall-clock time and CPU time. And on top of that, it also keeps track of pre-empted threads and voluntary context switches.

It's a fine tool, making use of Google Chrome's tracing capability to view the measurements. But for the really low-level, micro-sized measurements, we need an additional tool: the Linux perf tool.

Instead of using perf directly, I found using the ocperf.py wrapper from PMU-Tools to be a much better option. It has proven to be more reliable for me. You can make it sample your program to see where the cycles are spent. I use the following command lines:

$ ocperf.py record -e cpu-cycles:pp --call-graph=dwarf ./bench
$ ocperf.py report -g graph,0.25,caller

In addition to the perf wrapper, PMU-Tools also comes with a great overall analysis tool called toplev.py, which gives you a quick insight into potential issues. Start at level 1 (-l1) and drill your way down to more specific issues, using:

$ toplev.py --long-desc -l1 ./bench

Friday, October 27, 2017

Pitting Profilers against each other.

In the past, I have been using Remotery, an in-app profiler. I recently became aware of Minitrace, a similar tool. So I decided to compare their results.

The good news is that when my ray tracer works in single-threaded mode, the results are in agreement: 6ms or so is spent on uploading the image as a texture to OpenGL, and the rest of the time is spent rendering scanlines.

Minitrace:
Remotery:

I can also run my app in multi-threaded mode. The scanlines are then rendered in 100 work batches. The batches are processed by four worker threads that stay alive for the lifetime of the app.
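One common way to hand out such batches (my actual scheme may differ; the names below are illustrative) is an atomic work counter that the long-lived workers increment until the frame's batches run out:

#include <stdatomic.h>

#define NUM_BATCHES 100                 /* the frame is split into 100 batches */

static atomic_int next_batch;           /* reset to 0 at the start of each frame */

extern void render_batch(int b);        /* renders one batch of scanlines */

/* Each of the four workers runs this once per frame. */
static void render_frame_part(void)
{
    int b;
    while ((b = atomic_fetch_add(&next_batch, 1)) < NUM_BATCHES)
        render_batch(b);                /* grab the next unclaimed batch */
}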

Minitrace:
Remotery:

The Minitrace run shows that the worker threads were fully busy during the generation of the image. Sometimes I see a chunk that takes a lot more time (>10x) than normal, which made me doubt the measurement. This was the reason I decided to compare with Remotery. However, I no longer think this is a bad measurement: one of the worker threads probably got pre-empted by the OS or something.

The Remotery run, on the other hand, seems to be missing data. Could it be a race condition between worker threads trying to record events? I'll be creating a GitHub issue, but wrote this blog post first, so that the images are hosted.

OS: 64bit Ubuntu Linux.
CPU: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
Both Minitrace and Remotery were at their latest GitHub versions as of Oct 27, 2017.