Releases: kengz/SLM-Lab

Remove Data Space History, Optimize Memory

09 Sep 06:37
f577d78

This release optimizes RAM consumption and memory sampling speed after stress-testing with Atari. RAM growth is curbed, and replay memory RAM usage is now near the theoretical minimum.

Thanks to @mwcvitkovic for providing major help with this release.

Remove DataSpace history

#163

  • debug and fix memory growth (cause: DataSpace saving its full history)
  • remove history saving and MDP data altogether; remove AEB add_single. This changes the API.
  • create body.df to track data efficiently as the API replacement for the above (a sketch follows this list)
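
As a rough illustration of the replacement pattern (the Body class, column names, and track method below are illustrative assumptions, not the exact SLM-Lab API), per-body data can be tracked in a compact DataFrame instead of a full DataSpace history:

```python
import pandas as pd

class Body:
    '''Sketch of per-body data tracking; columns and method names are illustrative.'''

    def __init__(self):
        # one compact DataFrame per body replaces the removed DataSpace history
        self.df = pd.DataFrame(columns=['epi', 't', 'reward', 'loss'])

    def track(self, epi, t, reward, loss):
        # append a single summary row per step instead of storing full MDP data
        self.df.loc[len(self.df)] = [epi, t, reward, loss]
```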

Optimize Replay Memory RAM

#163 first optimization, halves replay RAM

  • make the memory state numpy storage float16 to accommodate large memory sizes: at half a million max_size, virtual memory drops from 200GB to 50GB
  • memory index sampling for training is very slow at large sizes; add a fast_uniform_sampling method to speed it up (see the sketch below)
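
A minimal sketch of both ideas, assuming a flat preallocated state buffer (the constructor arguments and buffer layout are assumptions; only the float16 downcast and the fast_uniform_sampling name come from this release):

```python
import numpy as np

class ReplaySketch:
    '''Illustrative replay storage showing the two optimizations.'''

    def __init__(self, max_size, state_dim):
        # float16 storage halves the RAM of a float32 state buffer
        self.states = np.zeros((max_size, state_dim), dtype=np.float16)
        self.size = 0

    def fast_uniform_sampling(self, batch_size):
        # np.random.randint is far faster than np.random.choice at large sizes
        # (uniform sampling with replacement)
        return np.random.randint(0, self.size, batch_size)
```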

#165 second optimization, halves replay RAM again to the theoretical minimum

  • do not save next_states in replay memories, since they are redundant with states
  • replace them with a sentinel self.latest_next_states resolved during sampling (see the sketch below)
  • 1 mil max_size for Atari replay now consumes 50GB instead of 100GB (it was 200GB before the float16 downcasting in #163)
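
A sketch of the idea, assuming a circular buffer indexed by head (the attribute is simplified to a singular latest_next_state here; the release names it self.latest_next_states):

```python
import numpy as np

class ReplaySketch:
    '''Replay that stores only states; next_states are recovered at sample time.'''

    def __init__(self, max_size, state_dim):
        self.states = np.zeros((max_size, state_dim), dtype=np.float16)
        self.max_size, self.head, self.size = max_size, -1, 0
        self.latest_next_state = None  # sentinel for the newest transition

    def add(self, state, next_state):
        self.head = (self.head + 1) % self.max_size
        self.states[self.head] = state
        self.latest_next_state = next_state  # only the newest next_state is kept
        self.size = min(self.size + 1, self.max_size)

    def sample_next_states(self, idxs):
        # next_state of transition i is the state stored at i + 1,
        # except for the newest transition, which uses the sentinel
        idxs = np.asarray(idxs)
        next_states = self.states[(idxs + 1) % self.max_size].astype(np.float32)
        next_states[idxs == self.head] = self.latest_next_state
        return next_states
```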

Add OnPolicyAtariReplay

#164

  • add an OnPolicyAtariReplay memory so that policy-based algorithms can be applied to the Atari suite

Misc

  • #157 allow usage as a python module via pip install -e . or python setup.py install
  • #160 guard lab default.json creation on first install
  • #161 fix agent save method, improve logging
  • #162 split logger by session for easier debugging
  • #164 fix N-step returns calculation
  • #166 fix a pandas casting issue that caused the process to hang
  • #167 uninstall the unused TensorFlow and TensorBoard that ship with Unity ML-Agents; rebuild the Docker image
  • #168 rebuild Docker and CI images

v2.0.0 Singleton Mode, CUDA Support, Distributed Training

03 Sep 20:28

This major v2.0.0 release addresses user feedback on usability and feature requests:

  • makes the singleton case (single-agent-env) default
  • adds CUDA GPU support for all algorithms (except for distributed)
  • adds distributed training to all algorithms (A3C-style)
  • optimizes compute, fixes some computation bugs

Note that this release is backward-incompatible with v1.x and earlier.

v2.0.0: make components independent of the framework so they can be used outside of SLM-Lab for development and production, and improve usability. Backward-incompatible with v1.x.

Singleton Mode as Default

#153

  • the singleton case (single-agent-env-body) is now the default; implementations need only handle the singleton case. It uses the Session class in the lab.
  • the space case (multi-agent-env-body) is now an extension of the singleton case: simply add a space_{method} to handle the space logic (see the sketch after this list). It uses the SpaceSession class in the lab.
  • make components more independent of the framework
  • major logic simplification to improve usability: simplify the AEB and init sequences, remove post_body_init()
  • make network update and grad norm check more robust
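
A rough sketch of the singleton-first pattern (the class, attributes, and method bodies below are illustrative assumptions; only the space_{method} naming convention comes from the release):

```python
class AlgorithmSketch:
    '''Singleton-first API sketch: implement the singleton case, extend for space.'''

    def __init__(self, agent):
        self.agent = agent

    def train(self):
        # singleton case (single agent-env-body): the only case implementations must handle
        batch = self.agent.body.memory.sample()
        return self.learn_from(batch)

    def space_train(self):
        # space case (multi agent-env-body): a space_{method} extension that
        # reuses the singleton logic per body
        return [self.learn_from(body.memory.sample()) for body in self.agent.bodies]

    def learn_from(self, batch):
        ...  # compute the loss and step the optimizer
        return 0.0
```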

CUDA support

#153

  • add a Net.cuda_id attribute for per-network device assignment, and auto-calculate the cuda_id from the trial and session indices to distribute jobs across GPUs (see the sketch below)
  • enable CUDA and add GPU support for all algorithms, except for the distributed ones (A3C, DPPO, etc.)
  • automatically assign tensors to CUDA depending on whether a GPU is available and desired
  • run unit tests on a machine with a GTX 1070
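
A minimal sketch of the device-assignment idea (the helper names and the round-robin formula are assumptions; SLM-Lab's actual calculation may differ):

```python
import torch

def auto_cuda_id(trial_index, session_index):
    # illustrative round-robin: spread sessions of a trial across available GPUs
    num_gpus = max(torch.cuda.device_count(), 1)
    return (trial_index + session_index) % num_gpus

def get_device(cuda_id):
    '''Resolve a per-network cuda_id to a torch device, falling back to CPU.'''
    if torch.cuda.is_available():
        return torch.device(f'cuda:{cuda_id}')
    return torch.device('cpu')

# usage: net.to(get_device(auto_cuda_id(trial_index, session_index)))
```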

Distributed Training

#153 #148

  • add distributed key to meta spec
  • enable distributed training using PyTorch multiprocessing; create a new DistSession class that acts as the worker
  • in distributed training, Trial creates the global networks for the agents, then passes them to the spawned DistSessions; effectively, the semantics of a session changes from a disjoint copy to a training worker (see the sketch after this list)
  • make distributed training usable for both the singleton (single-agent) and space (multi-agent) cases
  • add distributed cases to unit tests
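
A simplified A3C-style sketch of the mechanism, using a shared global network and torch.multiprocessing workers (the placeholder network, training loop, and worker function are assumptions; only the Trial-creates-global-nets and DistSession-as-worker semantics come from the release):

```python
import torch
import torch.multiprocessing as mp

def dist_session_worker(global_net, session_index):
    '''Worker process (cf. DistSession): train a local copy, push grads to the global net.'''
    local_net = torch.nn.Linear(4, 2)  # placeholder net with the same shape as global_net
    local_net.load_state_dict(global_net.state_dict())
    optimizer = torch.optim.Adam(global_net.parameters(), lr=1e-3)
    for _ in range(100):  # placeholder training loop
        loss = local_net(torch.randn(8, 4)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        # copy local gradients onto the shared global parameters, then sync back
        for lp, gp in zip(local_net.parameters(), global_net.parameters()):
            gp._grad = lp.grad
        optimizer.step()
        local_net.load_state_dict(global_net.state_dict())

if __name__ == '__main__':
    global_net = torch.nn.Linear(4, 2)
    global_net.share_memory()  # the Trial creates and shares the global network
    workers = [mp.Process(target=dist_session_worker, args=(global_net, i)) for i in range(4)]
    [w.start() for w in workers]
    [w.join() for w in workers]
```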

State Normalization

#155

  • add state normalization using a running mean and std: state = (state - mean) / std (see the sketch after this list)
  • apply to all algorithms
  • TODO: conduct a large-scale, systematic study of the effect of state normalization vs. no normalization
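
A sketch of one standard way to maintain the running statistics online (Welford's algorithm; the class and method names are illustrative, not SLM-Lab's exact implementation):

```python
import numpy as np

class RunningStateNormalizer:
    '''Online mean/std tracker used to normalize states as (state - mean) / std.'''

    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations
        self.count = 0
        self.eps = eps

    def update(self, state):
        # Welford's online update of mean and variance
        self.count += 1
        delta = state - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (state - self.mean)

    def normalize(self, state):
        std = np.sqrt(self.m2 / max(self.count, 1)) + self.eps
        return (state - self.mean) / std
```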

Bug Fixes and Improvements

#153

  • save() and load() now include network optimizers
  • refactor set_manual_seed to util
  • rename StackReplay to ConcatReplay for clarity
  • improve network training check of weights and grad norms
  • introduce BaseEnv as base class to OpenAIEnv and UnityEnv
  • optimize computations, major refactoring
  • update Dockerfile and release

Misc

  • #155 add state normalization using running mean and std
  • #154 fix A2C advantage calculation for N-step returns
  • #152 refactor SIL implementation using multi-inheritance
  • #151 refactor Memory module
  • #150 refactor Net module
  • #147 update grad clipping, norm check, multicategorical API
  • #156 fix multiprocessing on a CUDA-capable device when CUDA is not used
  • #156 fix multi-policy arguments to be consistent, and add missing state-append logic

PPOSIL, fix continuous actions and PPO

08 Aug 06:19
fb617ae

This release adds PPOSIL and fixes some small issues with continuous actions and the PPO ratio computation.

Implementations

#145 Implement PPOSIL. Improve debug logging
#143 add Arch installer thanks to @angel-ayala

Bug Fixes

#138 kill hanging Electron processes used for plotting
#145 fix a wrong PPO graph update sequence that caused the ratio to always be 1; fix continuous action output construction; add guards
#146 fix continuous actions and add full tests

add SIL, fix PG loss bug, add dueling networks

28 Jun 07:46
fbf482e

This release adds some new implementations and fixes some bugs found in the first benchmark runs.

Implementations

#127 Self-Imitation Learning
#128 Checkpointing for saving models
#129 Dueling Networks

Bug Fixes

#132 GPU test-run fixes
#133 fix the ActorCritic-family loss computation getting detached, fix Linux plotting issues, and add the SHA to generated specs

v1.1.0 full roadmap algorithms and features

19 Jun 03:24

Canonical Algorithms and Components

This release is research-ready.

Finish implementation of all canonical algorithms and components. The design is fully refactored, and components are reusable across the lab where suitable. Read the updated doc.

SLM Lab implements most of the recent canonical algorithms and various extensions. These serve as the basis for research.

Algorithm

code: slm_lab/agent/algorithm

Various algorithms are in fact extensions of some simpler ones, and they are implemented as such. This makes the code very concise.

Policy Gradient:

  • REINFORCE
  • AC (Vanilla Actor-Critic)
    • shared or separate actor critic networks
    • plain TD
    • entropy term control
  • A2C (Advantage Actor-Critic)
    • extension of AC with an advantage function
    • N-step returns as advantage
    • GAE (Generalized Advantage Estimation) as advantage (see the sketch after the algorithm list)
  • PPO (Proximal Policy Optimization)
    • extension of A2C with the PPO loss function

Value-based:

  • SARSA
  • DQN (Deep Q Learning)
    • Boltzmann or epsilon-greedy policy
  • DRQN (Recurrent DQN)
  • Double DQN
  • Double DRQN
  • Multitask DQN (multi-environment DQN)
  • Hydra DQN (multi-environment DQN)
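
As a reference for the advantage estimators listed under A2C above, here is a rough sketch of N-step returns and GAE (the function names and signatures are illustrative, not SLM-Lab's API):

```python
import numpy as np

def nstep_returns(rewards, dones, next_v, gamma):
    '''Bootstrapped return targets over a rollout: R_t = r_t + gamma * R_{t+1}, seeded with next_v.'''
    rets = np.zeros(len(rewards))
    future = next_v
    for t in reversed(range(len(rewards))):
        future = rewards[t] + gamma * future * (1 - dones[t])
        rets[t] = future
    return rets

def gae(rewards, dones, v_preds, next_v, gamma, lam):
    '''Generalized Advantage Estimation: A_t = sum_l (gamma * lam)^l * delta_{t+l}.'''
    advs = np.zeros(len(rewards))
    future = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 1 - dones[t]
        v_next = next_v if t == len(rewards) - 1 else v_preds[t + 1]
        delta = rewards[t] + gamma * v_next * not_done - v_preds[t]
        future = delta + gamma * lam * not_done * future
        advs[t] = future
    return advs
```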

Below are the modular building blocks for the algorithms. They are designed to be general, and are reused extensively.

Memory

code: slm_lab/agent/memory

For on-policy algorithms (policy gradient):

  • OnPolicyReplay
  • OnPolicySeqReplay
  • OnPolicyBatchReplay
  • OnPolicyBatchSeqReplay

For off-policy algorithms (value-based):

  • Replay
  • SeqReplay
  • StackReplay
  • AtariReplay
  • PrioritizedReplay

Neural Network

code: slm_lab/agent/net

These networks are usable for all algorithms.

  • MLPNet (Multi Layer Perceptron)
  • MLPHeterogenousTails (multi-tails)
  • HydraMLPNet (multi-heads, multi-tails)
  • RecurrentNet
  • ConvNet

Policy

code: slm_lab/agent/algorithm/policy_util.py

  • different probability distributions for sampling actions
  • default policy
  • Boltzmann policy
  • Epsilon-greedy policy
  • numerous rate decay methods (see the sketch after this list)
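
A minimal sketch of the epsilon-greedy and Boltzmann policies and a linear rate decay (a simplified illustration, not the exact contents of policy_util.py):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon):
    '''With probability epsilon take a random action, otherwise the greedy action.'''
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))

def boltzmann(q_values, tau):
    '''Sample an action from softmax(q / tau); a higher tau means more exploration.'''
    logits = np.asarray(q_values) / tau
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

def linear_decay(start_val, end_val, step, total_steps):
    '''One example of a rate decay schedule: linear anneal from start_val to end_val.'''
    frac = min(step / total_steps, 1.0)
    return start_val + frac * (end_val - start_val)
```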

Atari, Dockerfile, PPO

16 May 15:37
99f54b4

New features and improvements

  • some code cleanup to prepare for the next version
  • DQN Atari working, not optimized yet
  • Dockerfile finished, ready to run the lab at scale on a server
  • implemented PPO in TensorFlow, adapted from OpenAI, along with the utils

v1.0.2 Evolutionary Search

04 Mar 17:16
6f03300

New features and improvements

  • add EvolutionarySearch for hyperparameter search
  • rewrite and simplify the underlying Ray logic
  • fix categorical error in A2C
  • improve experiment graph: wider, add opacity

v1.0.1: fitness, analysis, tune A2C and Reinforce

17 Feb 02:19
04e8048

New features and improvements

  • improve fitness computation after usage
  • add retro analysis script, via yarn analyze <dir>
  • improve plotly renderings
  • improve CNN and RNN architectures and bring them to REINFORCE
  • fine-tune A2C and REINFORCE specs

v1.0.0: First stable release with full lab features

04 Feb 23:09
c4538fc

This is the first stable release of the lab, with the core API and features finalized.

Refer to the docs:
Github Repo | Lab Documentation | Experiment Log Book

Features

All the crucial features of the lab are stable and tested:

  • baseline algorithms
  • OpenAI gym, Unity environments
  • modular reusable components
  • multi-agents, multi-environments
  • scalable hyperparameter search with Ray
  • useful graphs and analytics
  • fitness vector for universal benchmarking of agents, environments

Baselines

The first release includes the following algorithms, with more to come later.

  • DQN
  • Double DQN
  • REINFORCE
    • Option to add entropy to encourage exploration
  • Actor-Critic
    • Batch or episodic training
    • Shared or separate actor and critic params
    • Advantage calculated using n-step returns or generalized advantage estimation
    • Option to add entropy to encourage exploration