Release v1.1.0 full roadmap algorithms and features · kengz/SLM-Lab

Canonical Algorithms and Components

This release is research-ready.

This release finishes the implementation of all canonical algorithms and components. The design has been fully refactored, and components can be reused across algorithms where suitable. Read the updated doc.

SLM Lab implements most of the recent canonical algorithms and various extensions. These serve as the base for research.

Algorithm

code: slm_lab/agent/algorithm

Many algorithms are extensions of simpler ones and are implemented as such, which keeps the code concise.

Policy Gradient:

REINFORCE
AC (Vanilla Actor-Critic)
- shared or separate actor critic networks
- plain TD
- entropy term control
A2C (Advantage Actor-Critic)
- extension of AC with advantage function
- N-step returns as advantage
- GAE (Generalized Advantage Estimate) as advantage
PPO (Proximal Policy Optimization)
- extension of A2C with PPO loss function
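
As an illustration of the advantage estimates and the PPO loss listed above, here is a minimal pure-Python sketch. It is not SLM Lab's implementation: the function names gae_advantages and ppo_clip_objective, their parameters, and the default values are assumptions chosen for this example, and the trajectory is assumed to contain no terminal state before its last step.

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    next_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """GAE advantages for one trajectory (hypothetical helper).

    values[t] is the critic estimate V(s_t); next_value is the estimate for
    the state reached after the last step (use 0.0 if the episode ended).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + gamma * v_next - values[t]  # one-step TD error
        running = delta + gamma * lam * running  # discounted sum of TD errors
        advantages[t] = running
    return advantages


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Clipped surrogate objective for a single sample (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)
```

With lam=0 the estimate reduces to the plain one-step TD error, and with lam=1 it becomes the discounted return minus the value baseline, so a single function covers both ends of the bias-variance trade-off.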

Value-based:

SARSA
DQN (Deep Q Learning)
- boltzmann or epsilon-greedy policy
DRQN (Recurrent DQN)
Double DQN
Double DRQN
Multitask DQN (multi-environment DQN)
Hydra DQN (multi-environment DQN)
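
The idea of implementing one algorithm as an extension of a simpler one can be sketched with the DQN and Double DQN targets. The class names DQNTarget and DoubleDQNTarget are hypothetical and do not come from SLM Lab; the sketch only shows how a subclass might override a single step while reusing the rest.

```python
from typing import Sequence


class DQNTarget:
    """Hypothetical helper: bootstrapped target for a single transition."""

    def __init__(self, gamma: float = 0.99) -> None:
        self.gamma = gamma

    def next_state_value(
        self, q_online_next: Sequence[float], q_target_next: Sequence[float]
    ) -> float:
        # Vanilla DQN: the target network both selects and evaluates the action.
        return max(q_target_next)

    def target(
        self,
        reward: float,
        done: bool,
        q_online_next: Sequence[float],
        q_target_next: Sequence[float],
    ) -> float:
        if done:
            return reward  # no bootstrapping past a terminal state
        return reward + self.gamma * self.next_state_value(q_online_next, q_target_next)


class DoubleDQNTarget(DQNTarget):
    """Overrides only the next-state value; everything else is inherited."""

    def next_state_value(
        self, q_online_next: Sequence[float], q_target_next: Sequence[float]
    ) -> float:
        # Double DQN: the online network selects, the target network evaluates.
        best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
        return q_target_next[best_action]
```

Only the action-selection step differs between the two classes, which is why the extension fits in a few lines.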

Below are the modular building blocks for the algorithms. They are designed to be general, and are reused extensively.

Memory

code: slm_lab/agent/memory

For on-policy algorithms (policy gradient):

OnPolicyReplay
OnPolicySeqReplay
OnPolicyBatchReplay
OnPolicyBatchSeqReplay

For off-policy algorithms (value-based):

Replay
SeqReplay
StackReplay
AtariReplay
PrioritizedReplay
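
For readers unfamiliar with off-policy memory, the following is a minimal sketch of a fixed-size replay buffer with uniform sampling. SimpleReplay and its methods are hypothetical names for illustration and are not the API of the classes listed above.

```python
import random
from typing import Any, List, Optional, Tuple

# (state, action, reward, next_state, done)
Transition = Tuple[Any, int, float, Any, bool]


class SimpleReplay:
    """Hypothetical ring buffer: overwrites the oldest transition when full."""

    def __init__(self, max_size: int, seed: Optional[int] = None) -> None:
        self.max_size = max_size
        self.buffer: List[Transition] = []
        self.head = 0  # slot that the next add() writes to once the buffer is full
        self.rng = random.Random(seed)

    def add(self, state: Any, action: int, reward: float, next_state: Any, done: bool) -> None:
        transition = (state, action, reward, next_state, done)
        if len(self.buffer) < self.max_size:
            self.buffer.append(transition)
        else:
            self.buffer[self.head] = transition
        self.head = (self.head + 1) % self.max_size

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the current size.
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self) -> int:
        return len(self.buffer)
```

An on-policy memory would differ mainly in that it is cleared after each training step, since stale transitions no longer match the current policy.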

Neural Network

code: slm_lab/agent/net

These networks are usable for all algorithms.

MLPNet (Multi Layer Perceptron)
MLPHeterogenousTails (multi-tails)
HydraMLPNet (multi-heads, multi-tails)
RecurrentNet
ConvNet

Policy

code: slm_lab/agent/algorithm/policy_util.py

different probability distributions for sampling actions
default policy
Boltzmann policy
Epsilon-greedy policy
numerous rate decay methods
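
The two exploration policies and a rate decay can be sketched in a few lines of standard-library Python. The function names and signatures below are assumptions made for this example and are not the contents of policy_util.py; linear decay is just one of many possible decay schedules.

```python
import math
import random
from typing import List, Sequence


def linear_decay(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly anneal a rate from start to end over total_steps, then hold."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values: Sequence[float], tau: float) -> List[float]:
    """Softmax over Q-values with temperature tau (max subtracted for stability)."""
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(q_values: Sequence[float], tau: float, rng: random.Random) -> int:
    """Sample an action in proportion to its Boltzmann probability."""
    actions = range(len(q_values))
    return rng.choices(actions, weights=boltzmann_probs(q_values, tau))[0]
```

In use, the decayed rate would be recomputed each step and passed in as epsilon or tau, for example epsilon_greedy(q, linear_decay(1.0, 0.1, step, 10000), rng).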
