Code used for the paper *Learning from humans: combining imitation and deep reinforcement learning to accomplish human-level performance on a virtual foraging task*.

Requirements:
- PyTorch
- scikit-learn
Note that this implementation does not support CUDA.
The folder `algorithms` contains our PyTorch implementations of the algorithms mentioned in the paper: TRPO, PPO, UATRPO, SAC, TD3, and Generative Adversarial Imitation Learning (GAIL).
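For reference, the clipped surrogate loss at the core of PPO looks roughly like the sketch below. This is a generic illustration, not the repo's code; the function and argument names are made up here.

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Importance ratio between the updated policy and the data-collecting policy.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipped surrogate objective: keep the pessimistic (minimum) of the two terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```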
`BatchBW_HIL_torch.py` contains our implementation of MLE imitation learning, suitable also for Hierarchical Imitation Learning (https://arxiv.org/pdf/2103.12197.pdf).
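As a point of reference, the flat (single-option) maximum-likelihood objective reduces to behavioral cloning on expert state-action pairs, roughly as in the sketch below. This is a generic illustration assuming a categorical policy over discrete actions, not the repo's code, and the Baum-Welch style inference over latent options used in the hierarchical case is not shown.

```python
import torch.nn.functional as F

def mle_bc_loss(policy_net, expert_states, expert_actions):
    # Negative log-likelihood of the expert actions under the current policy
    # (assumes a discrete-action categorical policy parameterized by logits).
    logits = policy_net(expert_states)            # shape: (batch, num_actions)
    return F.cross_entropy(logits, expert_actions)
```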
`models.py` contains the neural network models used as parameterizations for the policies.
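Purely for illustration, a policy parameterization of this kind can be a small MLP, as sketched below; the layer widths and activations here are assumptions, not necessarily those used in `models.py`.

```python
import torch.nn as nn

class MLPPolicy(nn.Module):
    # Illustrative parameterization only; sizes and activations are assumptions.
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, num_actions),  # action logits
        )

    def forward(self, state):
        return self.net(state)
```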
`World.py` contains the environment.
`Plot.py` is used to draw the plots in the paper.
`main.py` supports the following modes (selected with the `--mode` flag):
- HIL_HRL: run imitation learning (IL) followed by reinforcement learning (RL)
- HIL_ablation_study
- HRL_ablation_study
- HIL_ablation_study_allocentric_only
- HRL_ablation_study_allocentric_only
The human trajectories (Fig. 9 in the Supplementary Material) are stored in the folder `Expert_Data`.
python main.py --mode "HIL_ablation_study" --seed $1
python main.py --number_options 1 --policy PPO --seed $1 --HIL --load_HIL_model
python main.py --mode HRL_ablation_study --policy PPO --seed $1 --HIL --load_HIL_model --load_HIL_model_expert_traj $2
python main.py --number_options 1 --policy PPO --seed $1 --HIL --load_HIL_model --load_HIL_model_expert_traj $2 --adv_reward
python main.py --number_options 1 --policy PPO --seed $1 --load_HIL_model_expert_traj $2 --adv_reward
python main.py --mode "HIL_ablation_study_allocentric_only" --seed $1
python main.py --mode HRL_ablation_study_allocentric_only --policy PPO --seed $1 --HIL --load_HIL_model --load_HIL_model_expert_traj $2
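If you prefer not to use a shell script, the `$1` (seed) and `$2` (expert-trajectory index) placeholders can also be filled in by a small Python wrapper such as the hypothetical sketch below; the loop ranges are placeholders, not the settings used in the paper.

```python
import subprocess

# Hypothetical convenience wrapper (not part of the repo): fills in the $1 (seed)
# and $2 (expert-trajectory index) placeholders of the commands above.
for seed in range(5):          # illustrative range, not the paper's setting
    for traj in range(3):      # illustrative range, not the paper's setting
        subprocess.run(
            ["python", "main.py",
             "--mode", "HRL_ablation_study",
             "--policy", "PPO",
             "--seed", str(seed),
             "--HIL", "--load_HIL_model",
             "--load_HIL_model_expert_traj", str(traj)],
            check=True,
        )
```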