Tuesday, September 29, 2020

7 Years

Amazingly, the PC that I built has lasted me a whole 7 years. It is still a fine PC. It had one or two graphics card upgrades, and some extra RAM, but other than that, a Haswell CPU is still a perfectly fine workhorse.

I would have continued using that PC for even longer, if it were not for two things: One: I got curious about AVX-512 and what I could do with that. Two: my daughter showed interest in PC tech, so I thought this would be a good educational opportunity, so I let her help me build it.

Annoyingly, many, if not most, of Intel's new processors still do not come with AVX512. They keep introducing CPUs that don't have it. Why? However, there are some interesting routes to take, if you want AVX512. One of them is to get this obscure little appliance: Intel Crimson Canyon NUC. But going back to just 2 cores? Eh....

Another interesting route to take to AVX512 is to buy what corporate and pro users no longer want: old Xeons. Take, e.g a professional Mac user that bought the 2017 Mac Pro with 8 cores, starting at the $4999,- price. Some time later, this pro-user needs to have a faster Mac, so they upgrade the Xeon CPU. What happens to that old W-2140B Xeon that was replaced? It gets dumped on eBay for $200 or so! Similarly, corporate users dump high end Supermicro Xeon boards there too.

So why not snap those up, and build our own Xeon based workstation? That was the plan, and that was what happened. I am now the proud owner of a used PC, suited to replace my aging Haswell.

Some things I learned along the way:
  • Some fans (looking at you, Noctua) do not reliably report the RPM, so the motherboard can sometimes read a '0' value, causing it to panic, and put all fans on full blast. So out with the Noctua, in with an aging repurposed Corsair fan, yanked from a broken watercooling kit.
  • I bought a really cheap audio card from Amazon, thinking it would be fine. But the System Event Log (SEL) of Supermicro actually showed that there were parity errors on the PCIexpress bus. This went away after yanking out the sound card.
  • Initially this system ran with an antique GTX750Ti. As a developer I need to test with a variety of GPUs to improve my code, so I thought I should replace it with a Radeon. The quality of the AMD GPU PRO drivers being what it is: the resulting stress is just not worth it. That GPU went back to Amazon.
  • The InWin 301 MicroATX case has two fan-mounts on the front of the case, yet a sealed front with no air inlet. So you end up using the bottom fan mounts, blocking a PCIExpress slot with it. It doesn't seem to be a smart design.
  • Even though the CPU and Motherboard can be purchased at discounted prices on eBay, the ECC RAM that is needed is still full price.
  • I designed the system for low power usage. So a PSU rated at 550W and 80+ GOLD, seemed good enough. In hindsight, more head room would be better: I read that PSUs perform best at their 50% level or so.
  • It is quite interesting to be able to manage your system on a side-channel. With IPMI you can manage many aspects of your system.
  • To set sensor limits, use ipmitool sensor thresh FAN1 lower 100 200 300 and ipmitool sensor thresh FAN1 upper 3000 4000 5000.
  • Block the power management mode C6 in the BIOS. If I don't, my OS keeps getting small pauses in which the PC seems frozen: not even keyboard strokes make it through. UPDATE: This may have been incorrect. It looks like the pauses were caused by slow writes of the Crucial P1 NVMe drive.
  • My Western Digital NVMe storage (BLUE) is mostly not found by BIOS during boot. I need at least 3 reboots before it gets found once. After replacing with NVMe drive by Crucial, that problem went away. Faulty WD unit? Or just not compatible? It seems to be a common complaint by Amazon buyers of the WD NVMe storage.

Wednesday, September 23, 2020

Solving Collisions

I've been developing games for a long time now, but solving collisions remains a hairy subject. The Physics Engine I have used the most is OpenDE. It worked well for Little Crane. But the fact that OpenDE considers a triangle mesh as a generic 'triangle soup' as opposed to a closed surface, tends to get me into trouble. Consider the figure, below, where a wheel intersects the terrain.

Here, we expect triangles A and B to cause the wheel to be pushed out of the terrain. And we expect this correction in the direction of the triangle normals for A and B. Unfortunately, this is not always what happens. As OpenDE considers every triangle individually, it will report collisions on internal edges of the mesh, and tries to correct them with collision normals that lie in the triangle plane, as depicted below.

To make matters worse, other edges can cause collision constraints that push in the completely opposite direction, causing the wheel to get stuck in the terrain.

So what are the options here? One approach is to ignore all collision contacts with collision normals that do not align with the triangle normals. This can cause sinking into the terrain.

Another approach is to correct all reported collision normals, and override them with the triangle normals. This seems to work reasonably well, until you hit a 'thin' part of the terrain, where the wheel goes inside the terrain, and emerges from the other side, out of another triangle, as depicted below.

Blindly using the triangle normals as collision normals leads to bad things here: The wheel is simulateously restricted to go only up and to go only down, meaning the solver will freeze the wheel in place! To solve this, we need to somehow detect one collision as being the first, and ignore the other one.

At the end of the day, filtering the contact points that your physics engine gives you is a non-trivial problem. Limiting the maximum velocities, and using tiny timesteps go a long way, but even then you can get into trouble. If you can, building your terrain out of convex shapes only will save you a lot of troubles, as there is always a well defined inside and outside, making the collision resolution simpler. With generic triangle meshes, you have to be careful.

Saturday, September 19, 2020

OpenCL on a Radeon

So, the game I am currently developing was written around Procedural Generation of terrain, using the CUDA language. CUDA is great for doing GPGPU (General Purpose computing on a Graphics Processor.) It's well thought-out, and straightforward to use. It only has one drawback: it will only run on nVidia GPUs.

If I am to sell my game on Steam, it will have to run on AMD GPUs as well. So that means supporting something besides CUDA. The most portable way of doing GPGPU, is GLSL, but that is very cumbersome, as you need textures to get your data out, for starters. The next most portable way would be OpenCL.

At the time of porting from CUDA to OpenCL, I did not have an AMD GPU, so I did the OpenCL port using my nVidia GPU. OpenCL is a little rougher around the edges than CUDA, but the port did work fine, and ran at the same speed too. So it was time to test it on AMD hardware. As my freshly built Xeon Workstation reused a very aging GTX 750 Ti, it was upgrade time anyway, so out with the GTX 750 Ti, and in with the Radeon RX 5500 XT.

The last time I used a Radeon, the linux drivers for it were a mess, and worse, left your Ubuntu install in a mess too, by using it. In 2020, things are easier, and Ubuntu supports it out-of-the-box with an Open Source driver. However, that Open Source driver has limited capabilities. For starters, it comes without OpenCL, the sole reason why I purchased the Radeon.

So out with the Open Source driver, and in with the proprietary driver. These are the steps I had to take to install OpenCL support for AMD on Ubuntu:

  • Download the proprietary driver from AMD's website.
  • Unpack the archive.
  • The driver comes in two flavours: consumer and pro. You need the pro version.
  • Install as root with: ./amdgpu-pro-install
  • # dpkg -i opencl-amdgpu-pro-comgr_20.30-1109583_amd64.deb
  • # dpkg -i opencl-amdgpu-pro-icd_20.30-1109583_amd64.deb
And now, I can run my OpenCL code:
    OpenCL 2.1 AMD-APP (3143.9) AMD Accelerated Parallel Processing Advanced Micro Devices, Inc. has 1 devices:
    gfx1012 gfx1012 with [11 units] localmem=65536 globalmem=8573157376 dims=3(1024x1024x1024) max workgrp sz 256
I am not sure why it says [11 units] though, as Wikipedia lists the RX 5500 XT as having 22 cores. Hopefully I didn't get scammed with the hardware.

So on Linux, at least, my code now works both on nVidia and AMD, and I can use either CUDA or OpenCL to generate worlds from Open Simplex Noise, like shown below. TODO: Windows port.