Wednesday, November 2, 2016

Cache Friendly

In my upcoming title Children of Orc, the largest computational load comes from the GOAP action planner that the NPCs use. Simple plans are found in a millisecond, but larger plans really tax the CPU. Especially when no plan exists to reach the goal state, a lot of cycles are consumed: typically a few seconds on a mid-range Core i5 CPU. This means it would be prohibitively expensive on a mobile device, if I were to port the game to iOS or Android.

I profiled it with Linux command-line tools, as well as with Xcode's Instruments tool. Both showed that the cycles were spent in a linear search for the lowest-cost node. The search would iterate through up to 32K nodes to find the lowest 'f' value in an array of these structures:


struct astarnode
{
        worldstate_t ws;                //!< The state of the world at this node.
        int g;                          //!< The cost so far.
        int h;                          //!< The heuristic for remaining cost (don't overestimate!)
        int f;                          //!< g+h combined.
        const char* actionname;         //!< How did we get to this node?
        worldstate_t parentws;          //!< Where did we come from?
};

Examining the assembly output of the linear search, I could not find anything wrong with the compiler's code. The only bothersome issue was that the 'f' values being compared lay 160 bytes apart, one per structure.
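
To make the problem concrete, here is a minimal sketch of what the array-of-structures search boils down to (the function name is mine, not the planner's). Every comparison reads an 'f' value that sits one full struct astarnode further along, so successive loads keep landing on different cache lines:

// Illustrative only: a linear scan over the array of structures.
static int lowest_f_aos(const struct astarnode* nodes, int count)
{
        int best = 0;
        for (int i = 1; i < count; ++i)
                if (nodes[i].f < nodes[best].f)  // 'f' values are sizeof(struct astarnode) bytes apart.
                        best = i;
        return best;
}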

To give the CPU cache an easier time, I decided to store the data as a structure of arrays instead. So now the search is performed on an array of ints, tightly packed, and no longer on an array of structures.
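
As an illustration, the structure-of-arrays version could look roughly like this, assuming a fixed capacity (MAXNODES and the names are mine, not necessarily what the planner uses). The search itself is unchanged; it just walks a contiguous array of ints:

#define MAXNODES 32768                          // Illustrative capacity, matching the 32K nodes mentioned above.

struct astarnodes
{
        worldstate_t ws[MAXNODES];              //!< The state of the world at each node.
        int g[MAXNODES];                        //!< Cost so far, per node.
        int h[MAXNODES];                        //!< Heuristic for remaining cost, per node.
        int f[MAXNODES];                        //!< g+h combined, tightly packed for the search.
        const char* actionname[MAXNODES];       //!< How did we get to each node?
        worldstate_t parentws[MAXNODES];        //!< Where did we come from?
};

static int lowest_f_soa(const struct astarnodes* nodes, int count)
{
        // Same linear search, but now every comparison reads the next int in the same array.
        int best = 0;
        for (int i = 1; i < count; ++i)
                if (nodes->f[i] < nodes->f[best])
                        best = i;
        return best;
}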

The result of this little exercise? A speedup of 2.7×, which pleased me very much. If more speed is required in the future, I could store the 'f' values in a priority queue instead. But for now, this will do nicely.
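
Should that day come, the idea would be to keep the nodes ordered by 'f' in a binary min-heap, so the cheapest one pops off in O(log n) instead of a linear scan. The sketch below is purely illustrative (heapentry, minheap and the function names are mine, this is not in the planner today) and reuses the MAXNODES capacity from the sketch above:

struct heapentry { int f; int idx; };           // 'f' value plus the node's index.

struct minheap
{
        struct heapentry e[MAXNODES];
        int count;
};

static void heap_push(struct minheap* h, int f, int idx)
{
        // Sift the new entry up until its parent is no larger.
        int i = h->count++;
        while (i > 0 && h->e[(i - 1) / 2].f > f)
        {
                h->e[i] = h->e[(i - 1) / 2];
                i = (i - 1) / 2;
        }
        h->e[i].f = f;
        h->e[i].idx = idx;
}

static int heap_pop(struct minheap* h)
{
        // Return the index of the lowest-f node; assumes the heap is not empty.
        const int best = h->e[0].idx;
        const struct heapentry last = h->e[--h->count];
        int i = 0;
        for (;;)
        {
                int c = 2 * i + 1;                              // Left child.
                if (c >= h->count) break;
                if (c + 1 < h->count && h->e[c + 1].f < h->e[c].f) c++; // Pick the smaller child.
                if (h->e[c].f >= last.f) break;
                h->e[i] = h->e[c];                              // Move the child up.
                i = c;
        }
        h->e[i] = last;
        return best;
}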

Note that my GOAP implementation searches with A*, and the most common operations are tests for presence in the CLOSED and OPEN sets. Those were already accelerated with hash sets, which shifted the bottleneck of the search to finding the lowest-cost node.
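
Conceptually, those membership tests amount to hashing a world state and probing a set before expanding a node. The sketch below only shows the idea; the names are made up, and it assumes hypothetical ws_hash and ws_equal helpers for worldstate_t, so the actual hash sets in the planner differ in their details:

#include <stdbool.h>
#include <stdint.h>

// Hypothetical helpers: hash and compare a worldstate_t by value.
extern uint64_t ws_hash(worldstate_t ws);
extern bool ws_equal(worldstate_t a, worldstate_t b);

#define SETCAP 65536                            // Power of two, illustrative.

struct wsset
{
        worldstate_t slot[SETCAP];
        bool used[SETCAP];
};

static bool wsset_contains(const struct wsset* s, worldstate_t ws)
{
        // Open addressing with linear probing.
        uint64_t i = ws_hash(ws) & (SETCAP - 1);
        while (s->used[i])
        {
                if (ws_equal(s->slot[i], ws))
                        return true;
                i = (i + 1) & (SETCAP - 1);
        }
        return false;
}

static void wsset_insert(struct wsset* s, worldstate_t ws)
{
        // Probe until we find the state itself or an empty slot.
        uint64_t i = ws_hash(ws) & (SETCAP - 1);
        while (s->used[i] && !ws_equal(s->slot[i], ws))
                i = (i + 1) & (SETCAP - 1);
        s->slot[i] = ws;
        s->used[i] = true;
}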
