v1.1.0 full roadmap algorithms and features

@kengz kengz released this 19 Jun 03:24

Canonical Algorithms and Components

This release is research-ready.

This release finishes the implementation of all canonical algorithms and components. The design is fully refactored, and components are reusable across algorithms where suitable. Read the updated doc

SLM Lab implements most of the recent canonical algorithms and various extensions. These serve as the base for research.

Algorithm

code: slm_lab/agent/algorithm

Several algorithms are extensions of simpler ones and are implemented as such, which keeps the code concise.

Policy Gradient:

  • REINFORCE
  • AC (Vanilla Actor-Critic)
    • shared or separate actor critic networks
    • plain TD
    • entropy term control
  • A2C (Advantage Actor-Critic)
    • extension of AC with advantage function
    • N-step returns as advantage
    • GAE (Generalized Advantage Estimate) as advantage
  • PPO (Proximal Policy Optimization)
    • extension of A2C with PPO loss function
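
To make the two advantage options listed for A2C concrete, here is a minimal sketch in plain Python. It is not taken from slm_lab/agent/algorithm; the function names `n_step_returns` and `gae` and their parameters are assumptions chosen for illustration. Both functions walk backward over one rollout and bootstrap from a critic estimate `next_value` for the state after the last step.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float], dones: Sequence[float],
                   next_value: float, gamma: float = 0.99) -> List[float]:
    """Discounted returns over a rollout, bootstrapped from next_value.

    dones[t] is 1.0 if the episode ended at step t, else 0.0.
    (Hypothetical helper, not the SLM Lab implementation.)
    """
    returns = [0.0] * len(rewards)
    future = next_value
    for t in reversed(range(len(rewards))):
        # A terminal step cuts off everything that follows it.
        future = rewards[t] + gamma * future * (1.0 - dones[t])
        returns[t] = future
    return returns


def gae(rewards: Sequence[float], values: Sequence[float],
        dones: Sequence[float], next_value: float,
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimate for each step of a rollout.

    values[t] is the critic estimate V(s_t).
    (Hypothetical helper, not the SLM Lab implementation.)
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # One-step TD error at step t.
        delta = rewards[t] + gamma * v_next * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages
```

With `lam=1.0` the GAE advantage reduces to the bootstrapped return minus `values[t]`, and with `lam=0.0` it reduces to the one-step TD error, which is the trade-off the `lam` parameter controls.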

Value-based:

  • SARSA
  • DQN (Deep Q Learning)
    • boltzmann or epsilon-greedy policy
  • DRQN (Recurrent DQN)
  • Double DQN
  • Double DRQN
  • Multitask DQN (multi-environment DQN)
  • Hydra DQN (multi-environment DQN)
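
As an illustration of how Double DQN differs from DQN, the sketch below computes the bootstrap target for a single transition in plain Python. The function names and the list-based Q-value arguments are assumptions for this example, not the code in slm_lab/agent/algorithm. DQN takes the maximum of the target network's Q-values, while Double DQN picks the action with the online network and evaluates it with the target network.

```python
from typing import Sequence


def dqn_target(reward: float, done: float,
               next_q_target: Sequence[float], gamma: float = 0.99) -> float:
    """DQN target: the target network both selects and evaluates the action."""
    return reward + gamma * (1.0 - done) * max(next_q_target)


def double_dqn_target(reward: float, done: float,
                      next_q_online: Sequence[float],
                      next_q_target: Sequence[float],
                      gamma: float = 0.99) -> float:
    """Double DQN target: online network selects, target network evaluates."""
    # Greedy action according to the online network.
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # Value of that action according to the target network.
    return reward + gamma * (1.0 - done) * next_q_target[best_action]
```

Decoupling selection from evaluation is what reduces the overestimation that a plain `max` over noisy Q-values tends to produce.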

Below are the modular building blocks for the algorithms. They are designed to be general, and are reused extensively.

Memory

code: slm_lab/agent/memory

For on-policy algorithms (policy gradient):

  • OnPolicyReplay
  • OnPolicySeqReplay
  • OnPolicyBatchReplay
  • OnPolicyBatchSeqReplay

For off-policy algorithms (value-based):

  • Replay
  • SeqReplay
  • StackReplay
  • AtariReplay
  • PrioritizedReplay
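
The difference between the two memory families can be sketched as follows. The classes `SimpleReplay` and `SimpleOnPolicyReplay` are hypothetical stand-ins written for this example and do not mirror the interfaces in slm_lab/agent/memory. The off-policy memory is a fixed-size ring buffer sampled uniformly at random, while the on-policy memory hands over everything it has collected and then clears itself, since old data no longer matches the updated policy.

```python
import random
from typing import Any, List, Optional, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Any, Any, float, Any, bool]


class SimpleReplay:
    """Off-policy memory: fixed-size ring buffer with uniform sampling."""

    def __init__(self, max_size: int, seed: Optional[int] = None) -> None:
        self.max_size = max_size
        self.data: List[Transition] = []
        self.head = 0  # index that the next overwrite will use
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done) -> None:
        transition = (state, action, reward, next_state, done)
        if len(self.data) < self.max_size:
            self.data.append(transition)
        else:
            # Buffer is full: overwrite the oldest transition.
            self.data[self.head] = transition
        self.head = (self.head + 1) % self.max_size

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement.
        return self.rng.sample(self.data, min(batch_size, len(self.data)))

    def __len__(self) -> int:
        return len(self.data)


class SimpleOnPolicyReplay:
    """On-policy memory: returns all collected data in order, then resets."""

    def __init__(self) -> None:
        self.data: List[Transition] = []

    def add(self, state, action, reward, next_state, done) -> None:
        self.data.append((state, action, reward, next_state, done))

    def sample(self) -> List[Transition]:
        batch, self.data = self.data, []
        return batch

    def __len__(self) -> int:
        return len(self.data)
```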

Neural Network

code: slm_lab/agent/net

These networks are usable for all algorithms.

  • MLPNet (Multi Layer Perceptron)
  • MLPHeterogenousTails (multi-tails)
  • HydraMLPNet (multi-heads, multi-tails)
  • RecurrentNet
  • ConvNet

Policy

code: slm_lab/agent/algorithm/policy_util.py

  • different probability distributions for sampling actions
  • default policy
  • Boltzmann policy
  • Epsilon-greedy policy
  • numerous rate decay methods
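
The Boltzmann and epsilon-greedy policies and a rate decay can be illustrated with the short sketch below. It uses only the Python standard library, and the function names and signatures are assumptions for this example rather than those in policy_util.py. Linear decay is shown as one possible decay method.

```python
import math
import random
from typing import List, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng=random) -> int:
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def boltzmann_probs(q_values: Sequence[float], tau: float = 1.0) -> List[float]:
    """Softmax over Q-values with temperature tau (higher tau explores more)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(q_values: Sequence[float], tau: float = 1.0, rng=random) -> int:
    """Sample an action index from the Boltzmann distribution."""
    probs = boltzmann_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs)[0]


def linear_decay(start_val: float, end_val: float,
                 start_step: int, end_step: int, step: int) -> float:
    """Linearly anneal a rate (e.g. epsilon or tau) between two steps."""
    if step <= start_step:
        return start_val
    if step >= end_step:
        return end_val
    frac = (step - start_step) / (end_step - start_step)
    return start_val + frac * (end_val - start_val)
```

A typical use is to feed the decayed value back into the policy at every step, for example `epsilon_greedy(q, linear_decay(1.0, 0.1, 0, 10000, step))`.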