Two agents control rackets to bounce a ball over a net. The goal is for the agents to keep the ball in play, bouncing it between one another without dropping it or sending it out of bounds. An agent receives a reward of +0.1 if it hits the ball over the net, and a reward of -0.1 if it lets the ball hit the ground or hits it out of bounds.
The observation space consists of 8 variables: the position and velocity of the ball and of the racket. Two continuous actions are available to each agent: (1) movement toward or away from the net and (2) jumping.
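To make the action format concrete, here is a minimal sketch that samples one random action vector per agent using only the Python standard library. The clipping range of [-1, 1] and the names `NUM_AGENTS`, `ACTION_SIZE`, and `random_actions` are assumptions made for illustration; they are not taken from the environment's API.

```python
import random

NUM_AGENTS = 2   # two rackets, as described above
ACTION_SIZE = 2  # (1) move toward/away from the net, (2) jump


def random_actions(rng, low=-1.0, high=1.0):
    """Return one random continuous action vector per agent.

    The [low, high] bounds are an assumed action range, not a value
    confirmed by this README.
    """
    actions = []
    for _ in range(NUM_AGENTS):
        # Draw each action component from a standard normal distribution.
        raw = [rng.gauss(0.0, 1.0) for _ in range(ACTION_SIZE)]
        # Clip every component into the assumed valid range.
        actions.append([min(high, max(low, a)) for a in raw])
    return actions


sample = random_actions(random.Random(0))
print(sample)  # a 2 x 2 nested list of floats within [-1, 1]
```

A random policy like this is only useful as a smoke test that the action shape is right; a trained agent would replace `random_actions` with the output of its policy network.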
The environment is considered solved when the agents achieve an average reward of at least +0.5 over 100 episodes.
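The solved criterion can be checked with a sliding window over per-episode scores. The sketch below is one possible way to do this. It assumes that the score for an episode is the larger of the two agents' summed rewards, which this README does not state, and the function names `episode_score` and `first_solved_episode` are hypothetical.

```python
from collections import deque


def episode_score(rewards_a, rewards_b):
    """Combine both agents' rewards into one episode score.

    Assumption: the episode score is the maximum of the two agents'
    summed rewards. Adjust this if your setup scores episodes differently.
    """
    return max(sum(rewards_a), sum(rewards_b))


def first_solved_episode(episode_scores, target=0.5, window=100):
    """Return the 1-based episode at which the mean of the last `window`
    scores first reaches `target`, or None if that never happens."""
    recent = deque(maxlen=window)  # keeps only the latest `window` scores
    for episode, score in enumerate(episode_scores, start=1):
        recent.append(score)
        if len(recent) == window and sum(recent) / window >= target:
            return episode
    return None
```

Calling `first_solved_episode(scores)` after each training run (or inside the training loop) gives a simple stopping condition that matches the +0.5 over 100 episodes threshold.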
- Download the environment that matches your operating system:
  - Linux: click here
  - Mac OSX: click here
  - Windows (32-bit): click here
  - Windows (64-bit): click here
- Place the file in the DRLND GitHub repository (https://github.com/udacity/deep-reinforcement-learning), in the p3_collab-compet/ folder, and unzip (or decompress) the file.
- Open Tennis.ipynb and run cells 1 to 5. After training, a checkpoint.pth file is created containing all trained weights.
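The unzip step from the setup list above can also be scripted. This is a minimal sketch using Python's standard `zipfile` module. The helper name `unpack_environment` and the archive name `downloaded_env.zip` are placeholders, since the actual file name depends on which download you chose.

```python
import zipfile
from pathlib import Path


def unpack_environment(archive_path, target_dir):
    """Extract a downloaded environment archive into `target_dir`.

    Returns the list of member names contained in the archive.
    """
    target_dir = Path(target_dir)
    # Create the destination folder if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


# Example call (placeholder archive name, adjust to your download):
# unpack_environment("downloaded_env.zip", "p3_collab-compet")
```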