This is the code for the intention reading and generalization experiments from the paper
"Do deep reinforcement learning agents model intentions?".
It uses the simple_spread environment from the
Multi-Agent Particle Environments (MPE).
- To install, cd into the root directory and type pip install -e .
- Known dependencies: OpenAI gym, tensorflow, numpy, plus scikit-learn and matplotlib for plotting.
- Download and install the MPE code by following its README.
- To run the code, cd into the experiments directory and run:
  - for basic MADDPG agents:
    ./experiment.sh coop_navi_0
  - for MADDPG + shared scheme, all agents use shared model:
    ./experiment.sh coop_navi_shared_0 --shared
  - for MADDPG + shuffle scheme, agents are shuffled for each episode:
    ./experiment.sh coop_navi_shuffle_episode_0 --shuffle episode
  - for MADDPG + ensemble scheme, agents are sampled for each episode:
    ./experiment_ensemble.sh coop_navi_ensemble_episode_0 --ensemble-choice episode
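As a rough illustration of what "shuffled for each episode" and "sampled for each episode" could mean, here is a minimal sketch. It is not taken from train.py or ensemble.py: the function names, the idea of mapping environment slots to policies, and the ensemble size are all assumptions made for this example.

```python
import random


def shuffle_assignment(num_agents, rng):
    """Hypothetical helper: random permutation of policies over agent slots."""
    order = list(range(num_agents))
    rng.shuffle(order)
    return order


def sample_ensemble_assignment(num_agents, ensemble_size, rng):
    """Hypothetical helper: pick one ensemble member per agent slot."""
    return [rng.randrange(ensemble_size) for _ in range(num_agents)]


rng = random.Random(0)
for episode in range(3):
    # Assumption: a new assignment is drawn once at the start of every episode.
    slots = shuffle_assignment(3, rng)
    members = sample_ensemble_assignment(3, 3, rng)
    print(episode, slots, members)
```

In both cases the point is that an agent cannot rely on always facing the same partners in the same positions, which is presumably why these schemes matter for the generalization experiments.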
 
- train.py - basic training script, also used for evaluation
- ensemble.py - ensemble training script, also used for evaluation
- learning_curve.py - plots learning curve of an experiment
- statistics.py - collects basic benchmark data from evaluation
- prepare.py - simplifies evaluation data for further processing
- prepare_ensemble.py - simplifies evaluation data for further processing, for ensemble results
- accuracy.py - calculates per-timestep target prediction accuracies
- figure.py - plots target prediction accuracies for all agents
- sheldon.py - runs evaluation against Sheldon agents (agents with fixed targets)
- sheldon_ensemble.py - runs evaluation against Sheldon agents, for ensemble results
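To make "per-timestep target prediction accuracy" concrete, the following sketch computes it from label arrays. It is an assumption-laden example, not the logic of accuracy.py: the function name, the array layout (episodes by timesteps) and the toy data are all hypothetical.

```python
import numpy as np


def per_timestep_accuracy(predicted, actual):
    """Fraction of episodes with a correct target prediction at each timestep.

    predicted: shape (episodes, timesteps), predicted target index per step.
    actual: shape (episodes,) for a fixed target per episode, or
            (episodes, timesteps) if the target may change over time.
    """
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    if actual.ndim == 1:
        # Broadcast the fixed per-episode target over all timesteps.
        actual = actual[:, None]
    return (predicted == actual).mean(axis=0)


# Toy data: 2 episodes, 3 timesteps, true targets 1 and 2.
predicted = [[0, 1, 1],
             [2, 2, 2]]
actual = [1, 2]
acc = per_timestep_accuracy(predicted, actual)
print(acc)
```

Averaging over the episode axis gives one accuracy value per timestep, which is the kind of curve a plotting script could then draw for each agent.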
For usage details refer to experiment.sh, experiment_ensemble.sh and individual files.
If you used this code for your experiments or found it helpful, consider citing the following paper:
@article{matiisen2018do,
  title={Do deep reinforcement learning agents model intentions?},
  author={Matiisen, Tambet and Labash, Aqeel and Majoral, Daniel and Aru, Jaan and Vicente, Raul},
  journal={arXiv preprint arXiv:1805.06020},
  year={2018}
}
Thanks to OpenAI for the original paper and for releasing the code.