Pathfinding is often associated with AI, because the A* algorithm and many other pathfinding algorithms were developed by AI researchers. Several biology-inspired AI techniques are currently popular, and I receive questions about why I don’t use them. Neural Networks model a brain learning by example—given a set of right answers, it learns the general patterns. Reinforcement Learning models a brain learning by experience—given some set of actions and an eventual reward or punishment, it learns which actions are good or bad. Genetic Algorithms model evolution by natural selection—given some set of agents, let the better ones live and the worse ones die. Typically, genetic algorithms do not allow agents to learn during their lifetimes, while neural networks allow agents to learn only during their lifetimes. Reinforcement learning allows agents to learn during their lifetimes and share knowledge with other agents.
Neural networks are structures that can be “trained” to recognize patterns in inputs. They are a way to implement function approximation: given y1 = f(x1), y2 = f(x2), ..., yn = f(xn), construct a function f’ that approximates f. The approximate function f’ is typically smooth: for x’ close to x, we will expect that f’(x’) is close to f’(x). Function approximation serves two purposes:
- Size: the representation of the approximate function can be significantly smaller than the true function.
- Generalization: the approximate function can be used on inputs for which we do not know the value of the function.
Neural networks typically take a vector of input values and produce a vector of output values. Inside, they train weights of “neurons”. Neural networks use supervised learning, in which inputs and outputs are known and the goal is to build a representation of a function that will approximate the input to output mapping.
In pathfinding, the function is f(start, goal) = path. We do not already know the output paths. We could compute them in some way, perhaps by using A*. But if we are able to compute a path given (start, goal), then we already know the function f, so why bother approximating it? There is no use in generalizing f because we know it completely. The only potential benefit would be in reducing the size of the representation of f. The representation of f is a fairly simple algorithm, which takes little space, so I don’t think that’s useful either. In addition, neural networks produce a fixed-size output, whereas paths are variable sized.
Instead, function approximation may be useful to construct components of pathfinding. It may be that the movement cost function is unknown. For example, the cost of moving across an orc-filled forest may not be known without actually performing the movement and fighting the battles. Using function approximation, each time the forest is crossed, the movement cost f(number of orcs, size of forest) could be measured and fed into the neural network. For future pathfinding sessions, the new movement costs could be used to find better paths. Even when the function is unknown, function approximation is useful primarily when the function varies from game to game. If a single movement cost applies every time someone plays the game, the game developer can precompute it beforehand.
Another function that is could benefit from approximation is the heuristic. The heuristic function in A* should estimate the minimum cost of reaching the destination. If a unit is moving along path P = p1, p2, ..., pn, then after the path is traversed, we can feed n updates, g(pi, pn) = (actual cost of moving from i to n), to the approximation function h. As the heuristic gets better, A* will be able to run quicker.
Neural networks, although not useful for pathfinding itself, can be used for the functions used by A*. Both movement and the heuristic are functions that can be measured and therefore fed back into the function approximation.
Genetic Algorithms allow you to explore a space of parameters to find solutions that score well according to a “fitness function”. They are a way to implement function optimization: given a function g(x) (where x is typically a vector of parameter values), find the value of x that maximizes (or minimizes) g(x). This is an unsupervised learning problem—the right answer is not known beforehand. For pathfinding, given a starting position and a goal, x is the path between the two and g(x) is the cost of that path. Simple optimization approaches like hill-climbing will change x in ways that increase g(x). Unfortunately in some problems, you reach “local maxima”, values of x for which no nearby x has a greater value of g, but some faraway value of x is better. Genetic algorithms improve upon hill-climbing by maintaining multiple x, and using evolution-inspired approaches like mutation and cross-over to alter x. Both hill-climbing and genetic algorithms can be used to learn the best value of x. For pathfinding, however, we already have an algorithm (A*) to find the best x, so function optimization approaches are not needed.
Genetic Programming takes genetic algorithms a step further, and treats programs as the parameters. For example, you would breeding pathfinding algorithms instead of paths, and your fitness function would rate each algorithm based on how well it does. For pathfinding, we already have a good algorithm and we do not need to evolve a new one.
It may be that as with neural networks, genetic algorithms can be applied to some portion of the pathfinding problem. However, I do not know of any uses in this context. Instead, a more promising approach seems to be to use pathfinding, for which solutions are known, as one of many tools available to evolving agents.
Like genetic algorithms, Reinforcement Learning is an unsupervised learning problem. However, unlike genetic algorithms, agents can learn during their lifetimes; it’s not necessary to wait to see if they “live” or “die”. Also, it’s possible for multiple agents experiencing different things to share what they’ve learned. Reinforcement learning has some similarities to the core of A*. In A*, reaching the end goal is propagated back to mark all the choices that were made along the path; other choices are discarded. In reinforcement learning, every state can be evaluated and its reward (or punishment) is propagated back to mark all the choices that were made leading up to that state. The propagation is made using a value function, which is somewhat like the heuristic function in A*, except that it’s updated as the agents try new things and learn what works. One of the key advantages of reinforcement learning and genetic algorithms over simpler approaches is that there is a choice made between exploring new things and exploiting the information learned so far. In genetic algorithms, the exploration via mutation; in reinforcement learning, the exploration is via exlicitly allowing the probability of choosing new actions.
As with genetic algorithms, I don’t believe reinforcement learning should be used for the pathfinding problem itself, but instead as a guide for teaching agents how to behave in the game world.