RL Model
A deep Q-learning model is used for the RL implementation. Keras handles neural-network definition and training, with TensorFlow as the computation backend.
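For context, deep Q-learning trains the network to regress toward the Bellman target r + γ·max Q(s', a'). A minimal sketch of that target computation follows; the discount factor, batch shapes, and reward values are illustrative assumptions, not values from this repo:

```python
import numpy as np

def q_targets(rewards, next_q_values, dones, gamma=0.95):
    """Compute Bellman targets for a batch of transitions.

    rewards:       (batch,) immediate rewards
    next_q_values: (batch, n_actions) Q(s', .) from the network
    dones:         (batch,) 1.0 where the episode ended, else 0.0
    gamma:         discount factor (assumed value)
    """
    # Terminal states get no bootstrapped future value.
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

# Example: two transitions, three actions (straight / left / right).
r = np.array([400.0, -1000.0])
nq = np.array([[10.0, 5.0, 2.0], [3.0, 8.0, 1.0]])
d = np.array([0.0, 1.0])
targets = q_targets(r, nq, d)  # second transition is terminal: just its reward
```

The network is then fit so that its Q-value for the taken action matches these targets.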
Instead of using convolutional neural networks (CNNs), as in traditional RL implementations for games (e.g. Atari, AlphaGo), the model is fed discrete-angle LiDAR data sampled continuously in time. This drastically reduces training time: feature extraction from image snapshots is no longer necessary, and the LiDAR data directly encodes the bot's surroundings, which automatically accounts for the symmetry of the arena.
The LiDAR sweep is computed discretely at 20-degree intervals and returns, for each ray, the nearest buff and de-buff zones, their classification, and any obstacles. The implementation is based on SONAR-style queries in pymunk.
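The 20-degree discretization described above amounts to a fan of 18 rays per full sweep. A minimal sketch of generating those ray directions (the actual scan in this repo is done through pymunk queries; the return format here is an assumption for illustration):

```python
import math

def lidar_directions(step_deg=20):
    """Unit direction vectors for one full 360-degree sweep,
    one ray every `step_deg` degrees."""
    rays = []
    for deg in range(0, 360, step_deg):
        rad = math.radians(deg)
        rays.append((math.cos(rad), math.sin(rad)))
    return rays

rays = lidar_directions()
# 360 / 20 = 18 rays per sweep; each ray would then be cast into the
# physics space to find the nearest zone or obstacle along it.
```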
The model outputs three values: one each for no change in movement angle, a left turn, and a right turn. The maximum of the three (an argmax over the outputs, akin to softmax-then-max) is taken as the decision for the current time step. The reward values are:
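The three-way output and max-based selection described above can be sketched as follows; the epsilon-greedy exploration shown is a standard DQN addition assumed here, not something stated in the text:

```python
import random

# One network output per action (order is an assumption for illustration).
ACTIONS = ["straight", "left", "right"]

def select_action(q_values, epsilon=0.0):
    """Pick the index of the highest output (argmax), with optional
    epsilon-greedy exploration (assumed, standard for DQN training)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

idx = select_action([0.1, 0.7, 0.2])  # epsilon=0: pure argmax
```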
- Enemy buff zone : -10000
- Debuff zone : -10000
- Buff zone : +4000 or +3000
- Moving closer to the goal : +400, else : -500
- Hit obstacle : -1000
An additional reward penalty is applied based on time spent and time extensions.
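Putting the listed values together, a reward function consistent with the table above might look like this sketch. The event labels and the per-step time-penalty magnitude are assumptions for illustration; the source only states that a time-based penalty exists, not its size:

```python
def step_reward(event=None, closer_to_goal=False, elapsed_steps=0,
                time_penalty_per_step=1.0):
    """Per-step reward from the values listed above.

    event: one of 'enemy_buff', 'debuff', 'buff_major', 'buff_minor',
           'obstacle', or None (hypothetical labels for illustration).
    time_penalty_per_step: assumed magnitude for the time penalty.
    """
    zone_rewards = {
        "enemy_buff": -10000,   # enemy buff zone
        "debuff": -10000,       # de-buff zone
        "buff_major": 4000,     # buff zone (larger bonus)
        "buff_minor": 3000,     # buff zone (smaller bonus)
        "obstacle": -1000,      # hit an obstacle
    }
    reward = zone_rewards.get(event, 0)
    reward += 400 if closer_to_goal else -500
    reward -= time_penalty_per_step * elapsed_steps
    return reward

r = step_reward(event="buff_major", closer_to_goal=True, elapsed_steps=10)
```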