
RL Model

Vaibhav Vardhan edited this page Dec 5, 2019 · 2 revisions

A Deep Q-learning model is used to implement the RL agent.

Keras handles neural-network construction, training, and inference, with TensorFlow as the computation backend.
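A minimal sketch of what such a Keras Q-network could look like. The layer sizes, the 18-ray input (one per 20 degrees, per the Input section), and the optimizer are assumptions for illustration, not the project's actual configuration:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

def build_dqn(n_inputs=18, n_actions=3):
    # Fully connected network: LiDAR readings in, one Q-value per action out.
    model = Sequential([
        Input(shape=(n_inputs,)),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(n_actions, activation="linear"),  # unbounded Q-values
    ])
    # Standard DQN setup: regress Q-values toward bootstrapped targets.
    model.compile(optimizer="adam", loss="mse")
    return model
```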

Input

Instead of using convolutional neural networks (CNNs), as in traditional RL implementations for games (e.g. Atari, AlphaGo), discrete-angle, continuous-time LiDAR data is fed into the model. This drastically reduces training time, since feature extraction from an image snapshot is no longer necessary: the LiDAR data itself encodes the bot's surroundings and automatically accounts for the symmetry of the arena.

The LiDAR calculation is performed discretely, once every 20 degrees, and returns information about the nearest buff and debuff zones, their classification, and obstacles. It is implemented in pymunk, based upon SONAR-style ray casting.
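The 20-degree sweep described above can be sketched as follows. `cast_ray` is a hypothetical stand-in for the project's pymunk ray query; the return format and the 100-unit max range are assumptions:

```python
import math

def lidar_sweep(cast_ray, max_range=100.0):
    """Cast one ray every 20 degrees (18 rays over a full circle)."""
    readings = []
    for angle_deg in range(0, 360, 20):
        angle = math.radians(angle_deg)
        direction = (math.cos(angle), math.sin(angle))
        # cast_ray is assumed to return (distance, object_type),
        # or None when nothing is hit within max_range.
        hit = cast_ray(direction, max_range)
        readings.append(hit if hit is not None else (max_range, "free"))
    return readings
```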

Output

The model outputs 3 values, one each for: no change in the angle of movement, a left angle change, and a right angle change. The maximum of the three (analogous to softmax selection) is taken as the decision for the current timestep.
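The decision step above amounts to an argmax over the three outputs. A small sketch, with the action names chosen here for illustration:

```python
# One label per network output, in the order described above (assumed).
ACTIONS = ["no_change", "turn_left", "turn_right"]

def choose_action(q_values):
    """Pick the action whose Q-value is largest (greedy selection)."""
    best_index = max(range(len(q_values)), key=q_values.__getitem__)
    return ACTIONS[best_index]
```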

Rewards

  • Enemy buff zone : -10000
  • Debuff zone : -10000
  • Buff zone : +4000 or +3000
  • Movement closer to goal : +400, else : -500
  • Hit obstacle : -1000

An additional reward penalty is applied based on time spent and time extensions.
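The reward terms above can be combined into a single function. This is a hedged sketch: the event names, the choice between +4000 and +3000 for buff zones, and the shape of the time penalty are assumptions, not taken from the project's code:

```python
# Per-event rewards from the table above; event names are hypothetical.
EVENT_REWARDS = {
    "enemy_buff_zone": -10000,
    "debuff_zone": -10000,
    "buff_zone": 4000,       # or +3000, depending on the zone (assumed)
    "hit_obstacle": -1000,
}

def compute_reward(event, moved_closer, time_penalty=0.0):
    """Sum the event reward, the goal-distance term, and a time penalty."""
    reward = EVENT_REWARDS.get(event, 0)
    reward += 400 if moved_closer else -500
    return reward - time_penalty
```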
