Code used for the paper *Learning from humans: combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task*.

Requirements:
- PyTorch
- scikit-learn
Note that this implementation does not support CUDA.
The folder `algorithms` contains our PyTorch implementations of the algorithms mentioned in the paper: TRPO, PPO, UATRPO, SAC, TD3, and Generative Adversarial Imitation Learning (GAIL).
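For reference, the clipped surrogate loss at the core of PPO looks roughly like the sketch below. This is a generic illustration, not the repo's code; the function and argument names are made up here.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Importance ratio between the updated policy and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate objective: keep the pessimistic (minimum) of the two terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```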
`BatchBW_HIL_torch.py` contains our implementation of MLE imitation learning, suitable also for Hierarchical Imitation Learning (https://arxiv.org/pdf/2103.12197.pdf).
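As a point of reference, the flat (single-option) maximum-likelihood objective reduces to behavioral cloning on expert state-action pairs, roughly as in the sketch below. This is a generic illustration assuming a categorical policy over discrete actions, not the repo's code, and the Baum-Welch style inference over latent options used in the hierarchical case is not shown.

```python
import torch.nn.functional as F

def mle_bc_loss(policy_net, expert_states, expert_actions):
    # Negative log-likelihood of the expert actions under the current policy
    # (assumes a discrete-action categorical policy parameterized by logits).
    logits = policy_net(expert_states)            # shape: (batch, num_actions)
    return F.cross_entropy(logits, expert_actions)
```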
`models.py` contains the neural network models used as parameterizations for the policies.
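Purely for illustration, a policy parameterization of this kind can be a small MLP, as sketched below; the layer widths and activations here are assumptions, not necessarily those used in `models.py`.

```python
import torch.nn as nn

class MLPPolicy(nn.Module):
    # Illustrative parameterization only; sizes and activations are assumptions.
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, num_actions),  # action logits
        )

    def forward(self, state):
        return self.net(state)
```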
`World.py` contains the environment.
`Plot.py` is used to draw the plots in the paper.
`main.py` supports the following modes (selected with the `--mode` flag):
- HIL_HRL: run imitation learning (IL) followed by reinforcement learning (RL)
- HIL_ablation_study
- HRL_ablation_study
- HIL_ablation_study_allocentric_only
- HRL_ablation_study_allocentric_only
The human trajectories (Fig. 9 in the Supplementary Material) are stored in the folder `Expert_Data`.
python main.py --mode "HIL_ablation_study" --seed $1
python main.py --number_options 1 --policy PPO --seed $1 --HIL --load_HIL_model
python main.py --mode HRL_ablation_study --policy PPO --seed $1 --HIL --load_HIL_model --load_HIL_model_expert_traj $2
python main.py --number_options 1 --policy PPO --seed $1 --HIL --load_HIL_model --load_HIL_model_expert_traj $2 --adv_reward
python main.py --number_options 1 --policy PPO --seed $1 --load_HIL_model_expert_traj $2 --adv_reward
python main.py --mode "HIL_ablation_study_allocentric_only" --seed $1
python main.py --mode HRL_ablation_study_allocentric_only --policy PPO --seed $1 --HIL --load_HIL_model --load_HIL_model_expert_traj $2
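If you prefer not to use a shell script, the `$1` (seed) and `$2` (expert-trajectory index) placeholders can also be filled in by a small Python wrapper such as the hypothetical sketch below; the loop ranges are placeholders, not the settings used in the paper.

```python
import subprocess

# Hypothetical convenience wrapper (not part of the repo): fills in the $1 (seed)
# and $2 (expert-trajectory index) placeholders of the commands above.
for seed in range(5):          # illustrative range, not the paper's setting
    for traj in range(3):      # illustrative range, not the paper's setting
        subprocess.run(
            ["python", "main.py",
             "--mode", "HRL_ablation_study",
             "--policy", "PPO",
             "--seed", str(seed),
             "--HIL", "--load_HIL_model",
             "--load_HIL_model_expert_traj", str(traj)],
            check=True,
        )
```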