Hierarchical-Actor-Critic-Pytorch

Hierarchical Actor Critic (HAC) helps agents learn tasks more quickly by enabling them to break problems down into short sequences of actions. It combines three components (a rough sketch of how they fit together follows the list):

  1. DDPG (Lillicrap et al., 2016),
  2. Universal Value Function Approximators (UVFA) (Schaul et al., 2015), and
  3. Hindsight Experience Replay (HER) (Andrychowicz et al., 2017).
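
As a rough mental model, HAC stacks goal-conditioned (UVFA-style) actor-critic agents: each level proposes a subgoal (a state) for the level below, the lowest level outputs primitive actions, and HER relabels stored transitions with goals that were actually achieved. The sketch below is only a simplified illustration of that nested control loop; the names used here (`run_level`, `select_action`, `buffer.push`, `goal_reached`) are assumptions for illustration and are not this repository's API.

```python
import numpy as np

def goal_reached(state, goal, tol=0.05):
    # Illustrative success test: the achieved state is within `tol` of the goal.
    return np.linalg.norm(np.asarray(state) - np.asarray(goal)) < tol

def run_level(level, state, goal, env, agents, horizon=20):
    """Run one HAC level for up to `horizon` attempts toward `goal`.

    Levels above 0 treat their action as a subgoal for the level below;
    level 0 executes primitive actions in the environment.
    """
    for _ in range(horizon):
        # UVFA-style policy: the action is conditioned on (state, goal).
        action = agents[level].select_action(state, goal)

        if level > 0:
            # Recurse: the proposed subgoal becomes the lower level's goal.
            next_state = run_level(level - 1, state, action, env, agents)
        else:
            # Bottom level interacts with the environment directly.
            next_state, reward, done, info = env.step(action)

        # A full implementation would also add HER transitions here,
        # replacing `goal` with states that were actually reached.
        agents[level].buffer.push(state, action, next_state, goal)

        state = next_state
        if goal_reached(state, goal):
            break
    return state
```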

Deep Deterministic Policy Gradient (DDPG) is an actor-critic, model-free, off-policy algorithm for learning a policy over continuous action domains. It was proposed by Lillicrap et al. (2016) after the success of the Deep Q-Network (DQN) for discrete action domains (Mnih et al., 2015), and builds on the Deterministic Policy Gradient (DPG) actor-critic algorithm of Silver et al. (2014). DDPG adopts two innovations from DQN (see the sketch after this list):

  1. the network is trained off-policy with samples from a "replay buffer" to minimize correlations between samples;
  2. the network is trained with a separate "target Q network" to give consistent targets during temporal difference backups.
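
A minimal PyTorch sketch of those two ideas follows; the class and function names are illustrative assumptions and are not the components defined in this repository.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class ReplayBuffer:
    """Uniform replay buffer: sampling old transitions breaks up the
    temporal correlation present within a single trajectory."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005):
    """Polyak-average the target Q network toward the trained network,
    keeping the temporal-difference targets slowly moving and consistent."""
    with torch.no_grad():
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.copy_(tau * s_param + (1.0 - tau) * t_param)
```

In DDPG this soft update is applied to both the target actor and the target critic after each learning step, with a small tau (the original paper used 0.001).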

Use the following commands to train and test:

You can modify the training and testing configuration and the HAC and DDPG hyperparameters in the main.py file.

python3 main.py MountainCarContinuous-v1
python3 main.py Pendulum-v1
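
If you want to sanity-check an environment id before a long run, a short script like the one below (purely illustrative, not part of main.py) can confirm that gym resolves it and print the observation and action spaces:

```python
import sys

import gym

# Illustrative only: confirm the environment id passed on the command line
# is registered before starting a full training run.
env_id = sys.argv[1] if len(sys.argv) > 1 else "Pendulum-v1"
env = gym.make(env_id)
print("Environment:", env_id)
print("Observation space:", env.observation_space)
print("Action space:", env.action_space)
env.close()
```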

References:

  1. Andrew Levy et al., 2016
  2. Andrew Levy et al., 2018
  3. Lillicrap et al., 2016
  4. TensorFlow GitHub repository by Andrew
  5. PyTorch GitHub repository by Nikhil
  6. Blog by Andrew
