
RL Model

Vaibhav Vardhan edited this page Dec 5, 2019 · 2 revisions

A Deep Q-learning model is used to implement the RL agent.

Keras handles neural-network construction, training, and inference, with TensorFlow as the computation backend.
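A minimal sketch of what such a Keras Q-network could look like. The layer sizes, the 18-ray input (one per 20 degrees, per the Input section), and the optimizer are assumptions for illustration, not the project's actual configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

def build_dqn(n_inputs=18, n_actions=3):
    # Fully connected network: LiDAR readings in, one Q-value per action out.
    model = Sequential([
        Input(shape=(n_inputs,)),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(n_actions, activation="linear"),  # unbounded Q-values
    ])
    # Standard DQN setup: regress Q-values toward bootstrapped targets.
    model.compile(optimizer="adam", loss="mse")
    return model
```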

Input

Instead of using convolutional neural networks (CNNs), as in traditional RL implementations for games (e.g. Atari, AlphaGo), discrete-angle, continuous-time LiDAR data is fed into the model. This drastically reduces training time, since feature extraction from an image snapshot is no longer necessary: the LiDAR data itself encodes the bot's surroundings and automatically accounts for the symmetry of the arena.

The LiDAR calculation is performed discretely, once every 20 degrees, and returns information about the nearest buff and debuff zones, their classification, and obstacles. It is implemented in pymunk, based upon SONAR-style ray casting.
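The 20-degree sweep described above can be sketched as follows. `cast_ray` is a hypothetical stand-in for the project's pymunk ray query; the return format and the 100-unit max range are assumptions:

```python
import math

def lidar_sweep(cast_ray, max_range=100.0):
    """Cast one ray every 20 degrees (18 rays over a full circle)."""
    readings = []
    for angle_deg in range(0, 360, 20):
        angle = math.radians(angle_deg)
        direction = (math.cos(angle), math.sin(angle))
        # cast_ray is assumed to return (distance, object_type),
        # or None when nothing is hit within max_range.
        hit = cast_ray(direction, max_range)
        readings.append(hit if hit is not None else (max_range, "free"))
    return readings
```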

Output

The model outputs 3 values, one each for: no change in the angle of movement, a left angle change, and a right angle change. The maximum of the three (analogous to softmax selection) is taken as the decision for the current timestep.
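The decision step above amounts to an argmax over the three outputs. A small sketch, with the action names chosen here for illustration:

```python
# One label per network output, in the order described above (assumed).
ACTIONS = ["no_change", "turn_left", "turn_right"]

def choose_action(q_values):
    """Pick the action whose Q-value is largest (greedy selection)."""
    best_index = max(range(len(q_values)), key=q_values.__getitem__)
    return ACTIONS[best_index]
```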

Rewards

  • Enemy buff zone : -10000
  • Debuff zone : -10000
  • Buff zone : +4000 or +3000
  • Movement closer to goal : +400, else : -500
  • Hit obstacle : -1000

An additional reward penalty is applied based on time spent and time extensions.
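The reward terms above can be combined into a single function. This is a hedged sketch: the event names, the choice between +4000 and +3000 for buff zones, and the shape of the time penalty are assumptions, not taken from the project's code:

```python
# Per-event rewards from the table above; event names are hypothetical.
EVENT_REWARDS = {
    "enemy_buff_zone": -10000,
    "debuff_zone": -10000,
    "buff_zone": 4000,       # or +3000, depending on the zone (assumed)
    "hit_obstacle": -1000,
}

def compute_reward(event, moved_closer, time_penalty=0.0):
    """Sum the event reward, the goal-distance term, and a time penalty."""
    reward = EVENT_REWARDS.get(event, 0)
    reward += 400 if moved_closer else -500
    return reward - time_penalty
```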
