
Conversation

taufeeque9 (Collaborator)

Description

This PR changes the adversarial training algorithm so that, in each iteration, the rollouts are collected first, then the discriminator is trained, followed by the generator. This modification matches Algorithm 1 in the AIRL paper.
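
For reference, a minimal sketch of the reordered loop is given below. The callables (`collect_rollouts`, `train_discriminator`, `train_generator`) are illustrative placeholders for this PR's description, not the actual imitation API:

```python
from typing import Any, Callable, Sequence


def adversarial_train(
    n_iterations: int,
    collect_rollouts: Callable[[], Sequence[Any]],
    train_discriminator: Callable[[Sequence[Any]], None],
    train_generator: Callable[[Sequence[Any]], None],
) -> None:
    """Reordered adversarial loop: rollouts, then discriminator, then generator."""
    for _ in range(n_iterations):
        # 1. Collect rollouts with the current generator policy.
        rollouts = collect_rollouts()
        # 2. Train the discriminator on expert data vs. the fresh rollouts.
        train_discriminator(rollouts)
        # 3. Train the generator (policy) against the updated discriminator/reward.
        train_generator(rollouts)
```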

Testing

The proposed change improves the returns obtained on many environments. The table below shows the imitation-to-expert return ratio of each algorithm on several environments. Hyperparameters were tuned separately for each environment; the tuned configuration was then evaluated on five distinct seeds, and the reported value is the average ratio of the imitation return to the expert's return (a sketch of this computation follows the table).

| Algo \ Env  | Ant   | Half Cheetah | Hopper | Swimmer | Walker |
|-------------|-------|--------------|--------|---------|--------|
| GAIL-PR     | 0.883 | 0.868        | 1.01   | 0.986   | 0.989  |
| AIRL-PR     | -0.04 | 0.993        | 1.01   | 0.926   | 0.270  |
| GAIL-Master | 0.864 | 0.981        | 1.004  | 0.945   | 0.893  |
| AIRL-Master | 0.259 | 0.447        | 1.008  | 0.663   | 0.176  |
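
As a rough illustration of the evaluation, here is a minimal sketch of the return-ratio computation; the function name and the example numbers are hypothetical:

```python
import statistics


def mean_return_ratio(imitation_returns: list[float], expert_return: float) -> float:
    """Average imitation-to-expert return ratio over evaluation seeds."""
    return statistics.mean(r / expert_return for r in imitation_returns)


# Example with made-up per-seed returns for five seeds:
# mean_return_ratio([310.0, 295.0, 305.0, 290.0, 300.0], expert_return=320.0) ≈ 0.94
```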

@ernestum (Collaborator)

Thanks a lot @taufeeque9 for adding this change. We will need it for #675!

For my understanding: does the table show comparisons to the previous version of the implementation?

@taufeeque9 (Collaborator, Author)

The table shows comparisons with the current version of the algorithm on the master branch, which hasn't been updated since I last computed the results. -PR indicates the modified algorithm implemented in this PR, and -Master indicates the algorithm currently implemented on the master branch.
