Generalizing Centipede Game States for Reinforcement Learning
Reinforcement learning algorithms such as Q-learning typically find an optimal policy for a Markov decision process by storing and updating a table of values, one entry per state-action pair, which is then used to map each state to its optimal action [1]....
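A small example may make the tabular approach concrete. The sketch below is illustrative only, not the method used in this work. The environment is a hypothetical four-state chain in which the agent may "stop" (ending the episode with a reward equal to the state index) or "go" (advancing with zero reward). This loosely echoes a take/pass choice, but it is a single-agent simplification. The table `q` holds one value per (state, action) pair, and each step applies the standard Q-learning update Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)]. All names and parameter values (`N_STATES`, `step`, `alpha`, `gamma`, `epsilon`) are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (illustration only): states 0..N_STATES-1 form a chain.
# "stop" ends the episode with reward = state index; "go" moves one state
# forward with reward 0. Choosing "go" in the last state ends with reward 0.
N_STATES = 4
ACTIONS = ("stop", "go")


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == "stop":
        return None, float(state), True
    if state == N_STATES - 1:
        return None, 0.0, True
    return state + 1, 0.0, False


def q_learning(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # the table: (state, action) -> estimated value
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit the table, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of next state.
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Read the policy off the table: the highest-valued action per state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

With γ = 0.9, continuing until the last state is worth more than stopping early (for example, going from state 2 is worth 0.9 × 3 = 2.7 > 2), so the learned greedy policy should be "go" in states 0 to 2 and "stop" in state 3. The table grows with the number of distinct states, which is what makes generalizing across states attractive for larger games.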