Hi, thanks a lot for your great work!
I have a question, in the Double DQN, maybe the following code needs a stop_gradient?
target_q = rewards + (gamma*double_q * (1-terminal_flags))
The double_q comes from the target DQN. When updating the main DQN, the error will be back-propagated into the target DQN if we don't stop the flow, right? So do we need to stop the gradient as follows?
target_q = tf.stop_gradient(target_q)
Could you please give some advice? Thanks.
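For context, here is a minimal TF2 sketch of the concern. The variables `main_q` and `double_q` are hypothetical stand-ins for the two networks' outputs (in the real code they come from the main DQN and the target DQN); the tape shows that with `tf.stop_gradient` no gradient reaches `double_q`:

```python
import tensorflow as tf

# Hypothetical batch of 2 transitions.
rewards = tf.constant([1.0, 0.0])
terminal_flags = tf.constant([0.0, 1.0])
gamma = 0.99

# Stand-ins for network outputs; in the real code these come from the
# main DQN and the target DQN respectively.
main_q = tf.Variable([0.5, 0.8])    # Q(s, a) from the main network
double_q = tf.Variable([1.2, 0.7])  # Q_target(s', argmax_a Q(s', a))

with tf.GradientTape() as tape:
    # stop_gradient cuts the flow back into the target network's output.
    target_q = tf.stop_gradient(rewards + gamma * double_q * (1 - terminal_flags))
    loss = tf.reduce_mean(tf.square(target_q - main_q))

grads = tape.gradient(loss, [main_q, double_q])
print(grads[0] is not None)  # True: the loss still trains the main network
print(grads[1] is None)      # True: no gradient flows into double_q
```

Note that if the optimizer's `minimize`/`apply_gradients` call is restricted to the main DQN's variable list, the target network would not be updated even without `stop_gradient`, but stopping the gradient makes the intent explicit.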