Hints

Intro

This page collects general hints for using neural networks as RL models.

Hints

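It is best to make the training samples as close to IID (independent and identically distributed) as possible. Consecutive transitions from the same trajectory are strongly correlated, which can bias the gradient updates.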
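One common way to get closer to IID samples is to store transitions in a buffer and train on uniformly random minibatches drawn from it. The sketch below is only an illustration of that idea; the `ReplayBuffer` class, its method names, and the transition layout are assumptions made for this example and are not part of this project's code.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size buffer of transitions (illustrative only)."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal
        # correlation between consecutive transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage: fill the buffer with dummy transitions, then draw a minibatch.
buf = ReplayBuffer(capacity=100, seed=0)
for t in range(150):
    buf.add(t, 0, 0.0, t + 1, False)
batch = buf.sample(32)
```

A larger buffer mixes transitions from more trajectories, at the cost of training on data collected by older policies.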

Gradient descent vs gradient TD Updates

Gradient descent works well for many applications, but it can cause problems when fitting an action value function. Direct gradient descent on the TD error moves the action value function Q(st, at) closer to rt+1 + gamma * Q(st+1, at+1), but it also moves gamma * Q(st+1, at+1) closer to Q(st, at) - rt+1. I have found that this can result in very poor behaviour during optimization: the loss landscape tends to get very flat, and optimization gets slower and slower.
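The difference between the two updates can be seen on a tiny linear example. This is a minimal sketch, assuming a linear action value function Q(s, a) = w · phi(s, a) and assuming that "gradient TD update" here means the common semi-gradient update, which treats the target as a constant. The function names and the feature vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def td_error(w, phi, reward, phi_next, gamma):
    # delta = r + gamma * Q(s', a') - Q(s, a), with Q(s, a) = w . phi(s, a)
    return reward + gamma * (w @ phi_next) - (w @ phi)


def full_gradient_step(w, phi, reward, phi_next, gamma, lr):
    # Gradient descent on 0.5 * delta^2 with respect to w.
    # The gradient flows through both Q(s, a) and the target Q(s', a').
    delta = td_error(w, phi, reward, phi_next, gamma)
    return w + lr * delta * (phi - gamma * phi_next)


def semi_gradient_step(w, phi, reward, phi_next, gamma, lr):
    # The target is treated as a constant, so only Q(s, a) is moved.
    delta = td_error(w, phi, reward, phi_next, gamma)
    return w + lr * delta * phi


# One-hot features: w[0] is Q(s, a) and w[1] is Q(s', a').
w0 = np.zeros(2)
phi = np.array([1.0, 0.0])
phi_next = np.array([0.0, 1.0])

w_full = full_gradient_step(w0, phi, 1.0, phi_next, gamma=0.9, lr=0.1)
w_semi = semi_gradient_step(w0, phi, 1.0, phi_next, gamma=0.9, lr=0.1)
print("full gradient:", w_full)
print("semi-gradient:", w_semi)
```

With these inputs, both updates raise Q(s, a) by the same amount, but only the full-gradient step also pushes Q(s', a') down, which is the second effect described above.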
