Currently, multi-step TD uses an incorrect step parameter (JuliaReinforcementLearning/ReinforcementLearning.jl#648).
ReinforcementLearningAnIntroduction.jl/notebooks/Chapter09_Random_Walk.jl
Lines 193 to 216 in e83f540
```julia
function run_once(n, α)
    env = StateTransformedEnv(
        RandomWalk1D(N=NS, actions=ACTIONS),
        state_mapping=GroupMapping(n=NS)
    )
    agent = Agent(
        policy=VBasedPolicy(
            learner=TDLearner(
                approximator=TabularVApproximator(;
                    n_state=n_groups+2,
                    opt=Descent(α)
                ),
                method=:SRS,
                n=n
            ),
            mapping=(env, V) -> rand(action_space(env))
        ),
        trajectory=VectorSARTTrajectory()
    )
    hook = RecordRMS()
    run(agent, env, StopAfterEpisode(10), hook)
    mean(hook.rms)
end
```
In this example, `n` is intended to be the number of time steps, but it currently corresponds to the number of time steps plus one. `run_once(1, α)` is therefore not TD(0), which has a step parameter of 1, but a 2-step TD method. Depending on how the upstream issue is resolved, an update may be needed here.
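For reference, the intended semantics of the step parameter can be sketched outside the package. The following is a minimal tabular n-step TD prediction on a 1-D random walk written in Python (not the Julia API above; all names here are hypothetical and not part of ReinforcementLearning.jl), following the standard n-step TD update. With `n=1` it reduces exactly to TD(0): each update uses one reward plus one bootstrapped value, which is the behavior `run_once(1, α)` should have.

```python
import random

def n_step_td(n, alpha, episodes=200, n_states=5, gamma=1.0, seed=0):
    """Tabular n-step TD prediction for a random policy on a 1-D random walk.
    Interior states are 1..n_states; states 0 and n_states+1 are terminal,
    with reward 1 on reaching the right terminal and 0 otherwise.
    With n=1 this is TD(0)."""
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)
    for _ in range(episodes):
        states = [n_states // 2 + 1]     # start in the middle state
        rewards = [0.0]                  # placeholder so rewards[t+1] is reward at step t
        T, t = float("inf"), 0
        while True:
            if t < T:
                s2 = states[t] + rng.choice((-1, 1))
                states.append(s2)
                rewards.append(1.0 if s2 == n_states + 1 else 0.0)
                if s2 == 0 or s2 == n_states + 1:
                    T = t + 1            # episode terminates at time T
            tau = t - n + 1              # time whose state estimate is updated
            if tau >= 0:
                # n-step return: up to n rewards, then a bootstrapped value
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, int(min(tau + n, T)) + 1))
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V

V = n_step_td(n=1, alpha=0.1)
```

The key point is the indexing: with step parameter `n=1`, `tau == t` and the return is `rewards[t+1] + gamma * V[states[t+1]]`, i.e. a one-step (TD(0)) target. A correct fix upstream should make `n=1` in `TDLearner` behave this way rather than as a 2-step method.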