Author: Denny Loevlie
Originally published in Towards AI.
Solving the Sutton and Barto racetrack problem with reinforcement learning.
This post presents a solution to, and an extension of, the racetrack problem from Chapter 5 of Reinforcement Learning: An Introduction by Sutton and Barto. If you would like to read the problem and try it yourself first, you can find it in the free online version of the book HERE. All of the code needed to reproduce the results in this post is available in this GitHub repository: https://github.com/loevlie/reinforcement_learning_tufts/tree/main/racetrack_monte_carlo.
Monte Carlo (MC) control methods are computationally expensive because they rely on extensive sampling. However, unlike dynamic programming (DP) methods, MC does not assume that the agent has perfect knowledge of the environment, which makes it more flexible in uncertain or complex scenarios. With MC methods, the agent completes an entire episode before updating its policy. This is appealing from a theoretical standpoint, because the expected sum of future discounted rewards can be computed exactly from the actual future rewards observed during that episode.
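To make the episode-based update concrete, here is a minimal first-visit Monte Carlo sketch in Python. It is only an illustration under assumed names (an episode as a list of (state, action, reward) tuples, with Q and returns as dictionaries), not the code from the linked repository.

```python
from collections import defaultdict

GAMMA = 1.0  # the racetrack task in Sutton and Barto is undiscounted

def first_visit_mc_update(episode, Q, returns):
    """Update action-value estimates from one completed episode.

    episode: list of (state, action, reward) tuples gathered by the policy.
    Q: dict mapping (state, action) -> current value estimate.
    returns: dict mapping (state, action) -> list of sampled returns.
    """
    # Record the first time step at which each (state, action) pair occurs.
    first_visit = {}
    for t, (s, a, _) in enumerate(episode):
        first_visit.setdefault((s, a), t)

    # Sweep backwards so G is the actual return observed from step t onward.
    G = 0.0
    for t in reversed(range(len(episode))):
        s, a, r = episode[t]
        G = GAMMA * G + r
        if first_visit[(s, a)] == t:
            returns[(s, a)].append(G)
            Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])

# Example containers; the policy would then be improved greedily from Q.
Q = defaultdict(float)
returns = defaultdict(list)
```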
The racetrack problem from Reinforcement Learning by Sutton and Barto motivates the agent to reach the finish line by giving a constant reward of -1 at every time step of the episode, and by sending the agent back to the start when it … Read the full blog for free on Medium.
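The sentence above is cut off in this excerpt, but in the book's specification the car is returned to a random position on the start line with zero velocity whenever it hits the track boundary. Below is a rough, self-contained sketch of that reward and reset structure; the function and argument names are assumptions for illustration, not the repository's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(track_cells, start_cells, finish_cells, pos, vel, accel):
    """Illustrative racetrack transition: every time step costs -1, crossing
    the finish line ends the episode, and leaving the track sends the car
    back to a random start cell with zero velocity."""
    vel = np.clip(vel + accel, 0, 4)   # velocity components stay in [0, 4]
    pos = pos + vel
    if tuple(pos) in finish_cells:
        return pos, vel, -1, True      # finish line reached, episode ends
    if tuple(pos) not in track_cells:
        pos = np.array(start_cells[rng.integers(len(start_cells))])
        vel = np.zeros(2, dtype=int)
        return pos, vel, -1, False     # off the track: back to the start line
    return pos, vel, -1, False         # ordinary step: constant -1 reward
```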
Published via Towards AI