Wednesday, January 16, 2019


Two days ago, I started using the Unity3D game development tool. Interestingly, Unity3D is the most pervasive tool in the Game Dev industry... it is everywhere. I've been making games since 1982, but have never used it before.

I'm one of those dinosaurs keeping to the old way of doing things, and making my own game engines. Think of a manual 3-gear transmission Willys Jeep instead of heated seats, 7 cup-holder, navsat SUV.

So far, I like it better than I thought I would. In this blog article, I will collect some gotchas, insights, surprises that I will undoubtedly encounter.

1. Project loads are slow. Once inside the editor, make sure you stop "Playback" before editing anything... all edits during playback will be lost.

2. Unity uses degrees, not radians for angles.

3. You can only use const for compile-time constants. Not for run-time constants, like C and C++.

4. Unity physics is a wrapper over PhysX. Neither uses ERP and CFM like OpenDE and Bullet Physics do.

5. Unity debug output prints vectors with low precision. So (0.0, 0.3, 1.0) can actually be unit length!

Saturday, January 12, 2019

Sprinkle, Sprinkle, Little Star.

I released a new software application. It is named "Sprinkle, Sprinkle, Little Star" and is available for free for Windows and Linux.

It is a 2D simulation of Gravity acting on a galaxy of stars. It is completely interactive, and you can paint your own galaxy onto a grid, and see the galaxy evolve under Newton's law for gravity.

I recorded a video of myself with a play-though.

I started it during the holiday break, and had fun implementing it. The so called N-Body problem is pretty hard to solve interactively for large numbers of stars, because of its O(N*N) nature: Each star is influenced by each other stars. This means calculating 100M distances if you have 10,000 stars. To do it efficiently, I had to aggregate stars at larger distances. In addition to this, I vectorized my code with AVX intrinsics, that calculates 8 forces in a single go using SIMD. Because my compiler did such a bad job on the scalar code, the speed up was even better than 8, I saw a x18 increase in framerate (which also included rendering, so the computation speed up was even a little more than 18x.) Below is the spatial structure I use to aggregate stars at large distances.

Wednesday, December 26, 2018

Designing a Programming Language: 10Lang.

I have been programming since 1982, I guess that shows my age. But in those 36 years, I have never designed nor implemented a programming language. With the current anti-C++ sentiments on Twitter, it made me ponder on what a better language would look like. So let's start the first tiny steps towards eliminating that "design a programming language" bucket-list item.

Let's start with people jokingly call the hardest part, but really is, the easiest part: the name. As I want my hypothetical language to succeed the prince of all programming languages, C, it only follows that I should name it as a successor the C. Seeing C++ and D are already taken, I see no other option that to call it 10. Or for better Googlability: 10Lang.

If you look at C, you can kind of see that C code is pretty closely modeled after the hardware. Things in the language often have a direct representation in transistors. And if they don't, it's at least not too remote from the silicon, as more complex and more abstract languages are.

What would a good programming model be for today's hardware? For that, let's just look at the main differences between a 1970s processor and a 2018 processor.

One big difference, is that today's CPU has a relatively slow memory. And by slow, I mean, very slow. The CPU has to wait an eternity for data that is not in a register or a cache. This speed discrepancy means that today's CPU can be crippled by things as simple as branches, virtual function calls, irregular data access. What could we do to lead the programmer's mind away from OOP or AoS thinking, and naturally guide her into the SoA or Structure-of-Arrays approach?

Design Decision ALPHA
Arrays come first, scalars second.
If we want our code to be efficient, we need to make sure the CPU can efficiently perform batch operations. The one-off operations can be slower, they are not important from a performance viewpoint. By default, operations should be performed on a range, and doing operations on a scalar should be the special case. Maybe by treating it as array of size 1, maybe by treating it with more verbose syntax, we'll see what works later.

The next big difference between that 1970s processor and today's are the SIMD units. It's probably one of the most distinctive features of a processor nowadays, and will dictate what the register file will look like. So, if we are going to model the programming language after the transistors, then there really are no two ways about it...

Design Decision BETA
SIMD is a first class citizen in the language.
I haven't figured out on how to approach this specifically, yet. First, there is the overlap with design decision ALPHA. SIMD really is kind of like an in-register array. Next, there is the consideration of whether to let the register-width seep through into the language. The C programming language was always pretty vague about how many bits there should be in a char, short or int value. I'm not sure if that was helpful in the long run. But there is int8_t int16_t int32_t now, of course. Would our 10Lang benefit from explicit SIMD widths in the code? I'm hesitant. Maybe yes, maybe no. If we concentrate on 32-bit float/integer values for now, x86_64 can pack 4, 8 or 16 of those 32b values in a 128, 256 or 512 bit SIMD register, using SSE, AVX or AVX-512. I don't believe in accommodating old crap like SSE, so that leaves us with 8 or 16 lanes of 32bits each. (For 64bit values, this would be 4 or 8, which complicates matters further, so let's ignore those for now.) One possibility would be to have native octafloat, octaint, hexafloat, hexaint types. Heck, in an extreme version of language design, we could even leave out the scalar float and int, so that the CPU would never have to switch values from the scalar register file to the SIMD register file, or vice versa. Do we need to accommodate byte access? Maybe not? Text character's haven't been 7 bit ASCII for a long time now. TBD.

Modern processors are complex monstrosities. As we don't want to stall on branching, CPUs make tremendous efforts in predicting branches. At what cost? At the cost of CPU vulnerabilities. Branches hinder efficiency. Instead of making them faster with prediction, why can't we just focus on reducing them instead?

Design Decision GAMMA
Conditional SIMD operations are explicit in code.
So, how do we reduce branch operations? Because memory is now so obscenely slow, often the fastest way to compute things conditionally, is not to branch, but to compute both the TRUE branch and the FALSE branch, and then conditionally select values on a per-SIMD lane basis. Processors have a construct for this. AVX calls this vector blend (as in VBLENDVPS), ARM NEON calls it vector bitwise select (as in VBSL.) The programmer using 10Lang should have explicit access to this construct, so that he may write guaranteed branch-free code if he so chooses. Writing branch-free SIMD code is how you end up with something that is crazy efficient, and just tears through the data and the computations.

With these three main design decisions, it should be possible to sketch out a programming language. So a C like language, but for arrays and SIMD hardware. And I wonder if it could be possible to implement a rudimentary prototype by just using a preprocessor. Just have the 10Lang code translated into plain C with the help of the immintrin.h header file.

But for now, it is feedback time. After reading this, it would be great if you could drop a note in the comments. A folly? An exercise worth pursuing? Let me know!

Thursday, June 14, 2018

Joystick sampling rate in games.

I investigated an interesting conundrum this morning: why was my game running so much differently on my iPad PRO? The tank was snappy, and turning aggressively on the iPad PRO, but not on Linux and Android.

The main difference between iPad PRO and other platforms is its higher display refresh. But I was certain I had this covered, as I step my simulation exactly the same on all platforms: with 1/120s steps. The only difference being, that I render after each step on iPad PRO and only render once after two sim steps on other platforms that have 1/60s display refresh.

First thing to do, is to rule out differences in the iOS port. When I force my iPad to render at 1/60s instead, the iPad behaviour reverts to the same as the Linux/Android ports. Confirmed: it is the display refresh rate that makes the difference, not the platform's architecture.

So why would these two scenarios have different outcome?
[ sim 1/120s, render ] [ sim 1/120s, render ]

[ sim 1/120s,       sim 1/120s,      render ]
|                     |                     |
0ms                 8.3ms                 16.6ms

A logical explanation would be that I somehow influence the simulation somewhere, as I render. But after examining the code, nothing showed up.

It dawned on me that in the high display refresh case, the faster rendering is not the only difference. In 120Hz mode, you not only get more rendering activity, you also get more frequent input events. Touches come in faster when you render 120Hz, as they do when you render at 60Hz. Joystick changes, and touch events are batched with display refresh.

To confirm this, I put in an artificial joystick value, that would simply rotate the joystick at a set pace. Then I adjusted how those joystick changes were relayed for a 60Hz display frame. The result is the video below.

On the right, I adjust the joystick angle with 0.10 radians before each sim step. On the left, I adjust the joystick angle only once for two steps, but at double the the radians.

At 120Hz stick sampling, I get a smoother joystick signal. Even though the joystick rotation speed is the same, the 60Hz sampling shows more jarring deltas. I hadn't expect the effect on the simulation outcome this big.

The reason for the dramatic difference is that the small difference is amplified by the PID controllers I use in my game. In the case of low stick sampling rate, the PID controller will always see a zero change during the second step, and a large change in the first step. The PID controller can react a lot more effectively if it gets a higher frequency signal.

Lesson learned: these two scenarios give different simulation outcomes:

[ read stick,     sim 1/120s,     sim 1/120s,     render ]

[ read stick, sim 1/120s, read stick, sim 1/120s, render ]
|                                                        |
0ms                                                    16.6ms

Although forcing the 120Hz stick signal down to 60Hz is simple to achieve, it will be hard to provide a 120Hz stick signal if you only get your events at 60Hz. So the sweet, reactive control on iPad PRO is hard to achieve on 60Hz devices, unless you interpolate or extrapolate the stick values.

Friday, May 18, 2018

Differential Steering.

I just did a fun little exercise to figure out the steering in my Flank That Tank! indiegame. Of course, the best way to steer a tank is by using two throttle levers, one for each track. This will let the tank driver directly control the differential steering of the tank. It also enables some pretty exciting and wild maneuvers.

So really, case closed. A gamepad typically has two analog joysticks, one joystick for each track. Done!

However, I want to be able to run this game on mobile platforms using touch. And I tried it, but the lack of physical stops really hampers the feel of driving. Two levers on a touch screen simply is no proper substitute. (Not to speak of controlling two levers and a fire button, which is even harder on a touch screen.)

So no levers on a touch screen. Could we perhaps do differential steering with a single touch? Or, quite similarly: do differential steering with a single joystick?

It's fun to figure it out.

  • The stick at 12 o'clock would mean full steam ahead, so the L and R tracks at +100% power.
  • The stick at 06 o'clock would mean full steam backwards, so the L and R tracks at -100% power.
  • The stick at 03 o'clock would mean a hard right turn, in place, so L at +100% and R at -100% power.
  • The stick at 09 o'clock would mean a hard left turn, in place, so L at -100% and R at +100% power.
For the intermediate joystick positions, interpolating these four settings is all that is required.

This ought to work nicely for touch screens. Left thumb to drive the tank, which leaves the right hand free for tapping the screen to shoot, and possibly aim the turret as well.

So I'll be implementing this scheme shortly. That leaves me to consider the issue of absolute/relative control. Some people can't steer a vehicle that drives towards the camera (or in 2D: to the bottom of the screen) as it reverses L/R from the driver's point of view. So I may implement an absolute system as well: the "12 o'clock position" will adapt to where the tank is pointing.

Friday, May 11, 2018

The curious case of FPS jitter.

I tried to record a video of my game this morning, and it bugged me that it wasn't 100% smooth. The game did report 60fps though, so let's find out what is going on.

First order of business, is to graph the delta-time for each frame, instead of just reporting the frames-per-second. And sure enough, I would see jitter in the signal: a slow frame followed by a fast frame.

What was really puzzling, was that I could induce this jitter by pressing a key on the keyboard. Even if this key has no game functionality behind it, the frame time would jitter: slow+fast, for each and every press. And also for the auto-repeat events.

In the picture above, the three red lines are at values 1/60, 1/30 and 1/20 seconds. The green marks are the measured frame times. The jitter shows up for every key I press.

So, perf and FlameGraph to the rescue.

To my surprise, I notice that SDL_IBus_UpdateTextRect() shows up in the profile. Why is SDL updating text rectangles? I'm not doing any text related things. I just render to an OpenGL window. Notice how a single key press leads to an avalanche of computation and communication, with a call depth of 34 functions deep no less!

Frogtoss told me to look into SDL's Text Input system. My code never started a text input cycle, but to be sure, I called SDL_IsTextInputActive() to check. And sure enough: Text Input is active by default! Adding a SDL_StopTextInput() fixed the jitter.

Judging from the flamegraph, a key press when Text Input is active, is incredibly costly, as it involves computation, communication with the X server, polling, waking up stuff, and more. An avalanche of IO happens for every press. So for games, it's best to turn it off as soon as you have initialized SDL.

Executive Summary: after launching your SDL2 based game, call SDL_StopTextInput() for a smoother frame rate.

 if ( SDL_IsTextInputActive() )

Post Sctript: I will try recording that video again. If it still isn't smooth, at least it is not because of this.

Test specs:
Ubuntu 18.04
SDL 2.0.8

Tuesday, May 8, 2018

GDPR and iOS developers.

Two of my mobile games on iOS use AdMob to serve ads. This makes me vulnerable to EU General Data Protection fines, as I have no clear view on what exactly is collected by AdMob. Google puts the responsibility of requesting user permission for the data that AdMob collects on me.

The safest option at this time, is for me to completely stop serving ads. And I may very well end up using that option. But I thought it may be interesting to examine other options.

Let's start with disabling all ads, but only for European customers. How feasible would this be?

Well, it starts with the ill-defined term "European customer." We need to identify exactly who the GDPR applies to. This is what the EU has to say about that:

It applies to all companies processing and holding the personal data of data subjects residing in the European Union, regardless of the company’s location.

Still imprecise, because it says nothing about the subject's location, other than residence. What about a EU citizen on holiday in the US? What about a US citizen on holiday in EU? For now, let's ignore this, and just try to determine residence.

One way, would be to check the user's locale. Typically, it would be set to the country of residence. So the use of NSLocale would be a good start. Better than the alternative of actually checking the user's location, as that would first throw up an annoying dialog requesting permission for checking location.

Is that 100% fool proof? What if user's have their locale setup incorrectly? Let me guess, the onus is on me? Hmm... completely disabling ads seems safer indeed.

Ok, disabling ads completely. Is it as simple as going to the AdMob portal, and stop the Ad servings? Unfortunately not, because AdMob would still be active on the mobile device, and after contacting the AdMob servers would learn that there is no Ad service. However, who's to say the user profile hasn't already been sent to AdMob servers anyway, before learning there are no Ads to show?

So nope, disabling ads can't be done without building and uploading new versions to the app store.

One final remark: those GDPR tools that AdMob talks about? Not there!