Hierarchical Actor-Critic (HAC) helps agents learn tasks more quickly by enabling them to break problems down into short sequences of actions. It builds on three components (a rough sketch of how they fit together follows this list):
- DDPG (Lillicrap et al. 2016),
- Universal Value Function Approximators (UVFA) (Schaul et al. 2015), and
- Hindsight Experience Replay (HER) (Andrychowicz et al. 2017).
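Concretely, each level of the hierarchy is a goal-conditioned agent: it receives a goal as part of its input (the UVFA idea), proposes subgoals for the level below or primitive actions at the lowest level, and relabels transitions with achieved states as goals (the HER idea). The sketch below is illustrative only, assuming a two-level setup, the classic Gym 4-tuple `step` API, and hypothetical class and method names; it is not the repository's actual code.

```python
import numpy as np

class HACLevel:
    """One level of a hypothetical HAC hierarchy (illustrative only).

    `policy` maps a (state, goal) pair to either a subgoal for the level
    below or, at the lowest level, a primitive action.
    """

    def __init__(self, policy, horizon, is_lowest, goal_tol=0.1):
        self.policy = policy
        self.horizon = horizon      # H: max attempts per goal at this level
        self.is_lowest = is_lowest
        self.goal_tol = goal_tol
        self.replay = []            # (state, action, reward, next_state, goal)

    def run(self, env, state, goal, lower=None):
        """Pursue `goal` for at most `horizon` attempts, delegating to `lower`."""
        for _ in range(self.horizon):
            action = self.policy(state, goal)
            if self.is_lowest:
                # Classic Gym 4-tuple step API assumed here.
                next_state, _, done, _ = env.step(action)
            else:
                # Higher levels treat their action as a subgoal for the level below.
                next_state = lower.run(env, state, action)
                done = False
            # Simplification: the goal is compared against the full state.
            achieved = np.allclose(next_state, goal, atol=self.goal_tol)
            reward = 0.0 if achieved else -1.0
            self.replay.append((state, action, reward, next_state, goal))
            state = next_state
            if achieved or done:
                break
        # HER-style relabeling: also store the last transition as if the
        # state actually reached had been the goal all along.
        s0, a0, _, s1, _ = self.replay[-1]
        self.replay.append((s0, a0, 0.0, s1, s1))
        return state
```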
Deep Deterministic Policy Gradient (DDPG) is an actor-critic, model-free, off-policy algorithm for learning a policy over continuous action domains. It was proposed by Lillicrap et al. 2016 after the success of the Deep Q-Network (DQN) for discrete action domains in Mnih et al. 2015. DDPG builds on the Deterministic Policy Gradient (DPG) actor-critic algorithm proposed by Silver et al. 2014 and adopts two key innovations from DQN:
- the network is trained off-policy with samples from a "replay buffer" to minimize correlations between samples;
- the network is trained with a separate "target Q network" to give consistent targets during temporal difference backups.
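A minimal sketch of those two ingredients, assuming a PyTorch implementation: a fixed-capacity replay buffer sampled uniformly, and a Polyak (soft) update that makes the target network slowly track the learned one. The capacity, batch size, and rate `tau` below are illustrative defaults, not values taken from this repository.

```python
import random
from collections import deque

import torch.nn as nn

class ReplayBuffer:
    """Fixed-capacity buffer; uniform sampling breaks temporal correlations."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Copy to a list so random.sample gets a plain sequence.
        return random.sample(list(self.buffer), batch_size)

def soft_update(target: nn.Module, source: nn.Module, tau: float = 0.005):
    """Polyak-average the learned weights into the target network."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_((1.0 - tau) * t_param.data + tau * s_param.data)
```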
You can modify the training and testing configuration and the parameters of HAC and DDPG in the main.py file.
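For orientation, the kinds of settings involved typically look like the dictionary below; the names and values are illustrative placeholders and may not match the variables actually used in main.py.

```python
# Illustrative placeholders only -- check main.py for the real names and defaults.
config = {
    "env_name": "Pendulum-v1",     # environment id, passed on the command line
    "num_levels": 2,               # number of HAC levels in the hierarchy
    "horizon": 10,                 # H: attempts each level gets per (sub)goal
    "subgoal_test_rate": 0.3,      # lambda: fraction of subgoals that are tested
    "gamma": 0.95,                 # DDPG discount factor
    "actor_lr": 1e-3,              # DDPG actor learning rate
    "critic_lr": 1e-3,             # DDPG critic learning rate
    "tau": 0.005,                  # target-network soft-update rate
    "batch_size": 64,
    "replay_capacity": 100_000,
    "train_episodes": 1000,
    "test_episodes": 10,
}
```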
To train or test on an environment, run for example:

python3 main.py 'MountainCarContinuous-v1'

python3 main.py 'Pendulum-v1'