Exercise 05

In this exercise we revisit the included racetrack_environment and use it to study temporal-difference (TD) learning algorithms.

Tasks:

  1. policy evaluation using TD learning
  2. on-policy epsilon-greedy control using TD learning
  3. off-policy control using TD learning with an epsilon-greedy behavior policy → Q-learning
  4. using double Q-learning in stochastic environments
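
As a rough orientation for the four tasks, the corresponding tabular update rules can be sketched as below. This is only a minimal sketch under stated assumptions, not the exercise solution: all function names are hypothetical, states and actions are assumed to be integer indices into NumPy arrays (racetrack states would first have to be mapped to such indices), and nothing here uses the actual racetrack_environment API. The on-policy update in task 2 is written in the form commonly known as SARSA.

```python
import numpy as np


def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))


def td0_update(V, s, r, s_next, done, alpha=0.1, gamma=1.0):
    """Task 1: TD(0) policy evaluation on a state-value table V."""
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=1.0):
    """Task 2: on-policy TD control; bootstraps from the action actually taken next."""
    target = r + (0.0 if done else gamma * Q[s_next, a_next])
    Q[s, a] += alpha * (target - Q[s, a])


def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=1.0):
    """Task 3: off-policy TD control; bootstraps from the greedy action value."""
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])


def double_q_update(Q1, Q2, s, a, r, s_next, done, rng, alpha=0.1, gamma=1.0):
    """Task 4: double Q-learning; one table selects the action, the other evaluates it."""
    # Randomly decide which of the two tables is updated in this step.
    if rng.random() < 0.5:
        Q_sel, Q_eval = Q1, Q2
    else:
        Q_sel, Q_eval = Q2, Q1
    a_star = int(np.argmax(Q_sel[s_next]))
    target = r + (0.0 if done else gamma * Q_eval[s_next, a_star])
    Q_sel[s, a] += alpha * (target - Q_sel[s, a])
```

The only difference between the SARSA and Q-learning sketches is the bootstrap term: `Q[s_next, a_next]` versus `Q[s_next].max()`. Double Q-learning decouples action selection from action evaluation, which is intended to reduce the overestimation that the `max` operator can cause when rewards or transitions are noisy.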