Releases · kengz/SLM-Lab

09 Sep 06:37

kengz

v2.1.0

f577d78

Remove Data Space History, Optimize Memory

This release optimizes the RAM consumption and memory sampling speed after stress-testing with Atari. RAM growth is curbed, and replay memory RAM usage is now near theoretical optimality.

Thanks to @mwcvitkovic for providing major help with this release.

Remove DataSpace history

#163

debug and fix memory growth (cause: data space saving history)
remove history saving altogether, and mdp data. remove aeb add_single. This changes the API.
create body.df to track data efficiently as a replacement. This is the API replacement for above.

Optimize Replay Memory RAM

#163 first optimization, halves replay RAM

store memory states as float16 numpy arrays to accommodate large memory sizes. With a max_size of half a million, virtual memory goes from 200GB to 50GB
index sampling for training is very slow when the memory is large. add a method fast_uniform_sampling to speed it up
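A minimal sketch of the two ideas above, not the lab's actual code: halving the state buffer by storing float16 instead of float32, and drawing batch indices in time that does not depend on memory size. The state shape, the helper names `state_buffer_nbytes` and `sample_idxs_with_replacement`, and the choice of sampling with replacement are all assumptions for illustration; the real `fast_uniform_sampling` may work differently, and the numbers here are not meant to reproduce the figures quoted above.

```python
import numpy as np


def state_buffer_nbytes(max_size, state_shape, dtype):
    """Bytes needed to preallocate a state buffer of the given dtype."""
    return max_size * int(np.prod(state_shape)) * np.dtype(dtype).itemsize


# Hypothetical Atari-like state: 4 stacked 84x84 frames.
state_shape = (4, 84, 84)
max_size = 500_000
bytes_f32 = state_buffer_nbytes(max_size, state_shape, np.float32)
bytes_f16 = state_buffer_nbytes(max_size, state_shape, np.float16)


def sample_idxs_with_replacement(size, batch_size, rng):
    """Draw batch indices in O(batch_size), independent of memory size.

    Assumption: duplicates within a batch are acceptable, which is
    usually true when size is much larger than batch_size.
    """
    return rng.integers(0, size, size=batch_size)


rng = np.random.default_rng(0)
batch_idxs = sample_idxs_with_replacement(max_size, 32, rng)
```

Sampling without replacement over a large index range typically requires work proportional to the memory size, which is why a with-replacement draw is one plausible way to speed things up.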

#165 second optimization, halves replay RAM again to the theoretical minimum

do not save next_states for replay memories due to redundancy
replace with sentinel self.latest_next_states during sampling
1 million max_size for Atari replay now consumes 50GB instead of 100GB (was 200GB before float16 downcasting in #163)
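The idea of dropping next_states can be sketched as follows. This is a toy ring buffer under stated assumptions, not the lab's implementation: the class name `NoNextStateReplay` and its methods are hypothetical, and only the `latest_next_state` idea comes from the notes above. Each state is stored once; the next state of transition `i` is read from slot `i + 1`, except for the most recent transition, whose next state has not been stored yet and is served from a separately kept copy.

```python
import numpy as np


class NoNextStateReplay:
    """Toy ring buffer that stores each state once (hypothetical sketch)."""

    def __init__(self, max_size, state_shape):
        self.max_size = max_size
        self.states = np.zeros((max_size, *state_shape), dtype=np.float16)
        self.dones = np.zeros(max_size, dtype=bool)
        self.head = -1  # index of the most recent transition
        self.size = 0
        self.latest_next_state = None

    def add(self, state, done, next_state):
        self.head = (self.head + 1) % self.max_size
        self.states[self.head] = state
        self.dones[self.head] = done
        # Keep only the newest next_state; older ones are recoverable
        # from the following slot in the buffer.
        self.latest_next_state = np.asarray(next_state, dtype=np.float16)
        self.size = min(self.size + 1, self.max_size)

    def next_states(self, idxs):
        idxs = np.asarray(idxs)
        # For terminal transitions the following slot holds the first state
        # of the next episode; this is harmless only if the caller masks
        # the bootstrap term with dones.
        out = self.states[(idxs + 1) % self.max_size]
        out[idxs == self.head] = self.latest_next_state
        return out


mem = NoNextStateReplay(max_size=4, state_shape=(2,))
for t in range(3):
    mem.add(state=[t, t], done=False, next_state=[t + 1, t + 1])
recovered = mem.next_states([0, 1, 2])
```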

Add OnPolicyAtariReplay

#164

add OnPolicyAtariReplay memory so that policy-based algorithms can be applied to the Atari suite.

Misc

#157 allow usage as a python module via pip install -e . or python setup.py install
#160 guard lab default.json creation on first install
#161 fix agent save method, improve logging
#162 split logger by session for easier debugging
#164 fix N-Step-returns calculation
#166 fix a pandas casting issue that caused the process to hang
#167 uninstall unused tensorflow and tensorboard that come with Unity ML-Agents. rebuild Docker image.
#168 rebuild Docker and CI images


03 Sep 20:28

kengz

v2.0.0

8862b01

v2.0.0 Singleton Mode, CUDA Support, Distributed Training

This major v2.0.0 release addresses user feedback on usability and feature requests:

makes the singleton case (single-agent-env) default
adds CUDA GPU support for all algorithms (except for distributed)
adds distributed training to all algorithms (à la A3C)
optimizes compute, fixes some computation bugs

Note that this release is backward-incompatible with v1.x and earlier.

v2.0.0: make components independent of the framework so it can be used outside of SLM-Lab for development and production, and improve usability. Backward-incompatible with v1.x.

Singleton Mode as Default

#153

singleton case (single-agent-env-body) is now the default. Implementations need only handle the singleton case. Uses the Session in lab.
space case (multi-agent-env-body) is now an extension from singleton case. Simply add space_{method} to handle the space logic. Uses the SpaceSession in lab.
make components more independent from framework
major logic simplification to improve usability. Simplify the AEB and init sequences. remove post_body_init()
make network update and grad norm check more robust

CUDA support

#153

add attribute Net.cuda_id for device assignment (per network basis), and auto-calculate the cuda_id by trial and session index to distribute jobs
enable CUDA and add GPU support for all algorithms, except for distributed (A3C, DPPO etc.)
automatically assign tensors to CUDA depending on whether a GPU is available and desired
run unit tests on machine with GTX 1070
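One plausible way to derive a device id from the trial and session indices is a round-robin over the available GPUs. The sketch below is an assumption for illustration only: the function `assign_cuda_id`, its parameters, and the exact formula are hypothetical and are not taken from the lab's code.

```python
def assign_cuda_id(trial_index, session_index, max_session, num_gpus):
    """Round-robin a (trial, session) job onto one of num_gpus devices.

    Hypothetical helper: returns None when no GPU is available so the
    caller can fall back to CPU.
    """
    if num_gpus <= 0:
        return None
    # Flatten (trial, session) into a single job index, then wrap around.
    job_index = trial_index * max_session + session_index
    return job_index % num_gpus


# With 4 sessions per trial and 2 GPUs, consecutive sessions alternate devices.
ids = [assign_cuda_id(0, s, max_session=4, num_gpus=2) for s in range(4)]
```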

Distributed Training

#153 #148

add distributed key to meta spec
enable distributed training using pytorch multiprocessing. Create new DistSession class which acts as the worker.
In distributed training, Trial creates the global networks for agents, then passes to and spawns DistSession. Effectively, the semantics of a session changes from being a disjoint copy to being a training worker.
make distributed usable for both singleton (single agent) and space (multiagent) cases.
add distributed cases to unit tests

State Normalization

#155

add state normalization using running mean and std: state = (state - mean) / std
apply to all algorithms
TODO conduct a large-scale systematic study of the effect of state normalization versus no normalization
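A minimal sketch of the formula above, state = (state - mean) / std, with the mean and std maintained online. The class name `RunningNormalizer`, the use of Welford's update, and the small `eps` guard against division by zero are assumptions made for this example rather than details from the release.

```python
import numpy as np


class RunningNormalizer:
    """Track a running mean and std per state dimension (Welford's method)."""

    def __init__(self, shape, eps=1e-8):
        self.count = 0
        self.mean = np.zeros(shape, dtype=np.float64)
        self.m2 = np.zeros(shape, dtype=np.float64)  # sum of squared deviations
        self.eps = eps

    def update(self, state):
        state = np.asarray(state, dtype=np.float64)
        self.count += 1
        delta = state - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (state - self.mean)

    @property
    def std(self):
        # Until two samples are seen, fall back to unit std.
        if self.count < 2:
            return np.ones_like(self.mean)
        return np.sqrt(self.m2 / self.count)

    def normalize(self, state):
        state = np.asarray(state, dtype=np.float64)
        return (state - self.mean) / (self.std + self.eps)


normalizer = RunningNormalizer(shape=(1,))
for x in [1.0, 2.0, 3.0, 4.0]:
    normalizer.update([x])
centered = normalizer.normalize([2.5])
```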

Bug Fixes and Improvements

#153

save() and load() now include network optimizers
refactor set_manual_seed to util
rename StackReplay to ConcatReplay for clarity
improve network training check of weights and grad norms
introduce BaseEnv as base class to OpenAIEnv and UnityEnv
optimize computations, major refactoring
update Dockerfile and release

Misc

#155 add state normalization using running mean and std
#154 fix A2C advantage calculation for Nstep returns
#152 refactor SIL implementation using multi-inheritance
#151 refactor Memory module
#150 refactor Net module
#147 update grad clipping, norm check, multicategorical API
#156 fix multiprocessing on devices that have CUDA but are not using it
#156 fix multi policy arguments to be consistent, and add missing state append logic


08 Aug 06:19

kengz

v1.1.2

fb617ae

PPOSIL, fix continuous actions and PPO

This release adds PPOSIL, fixes some small issues with continuous actions, and PPO ratio computation.

Implementations

#145 Implement PPOSIL. Improve debug logging
#143 add Arch installer thanks to @angel-ayala

Bug Fixes

#138 kill hanging processes of Electron for plotting
#145 fix PPO wrong graph update sequence causing ratio to be 1. Fix continuous action output construction. add guards.
#146 fix continuous actions and add full tests
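To illustrate why a ratio stuck at 1 is a problem, here is a sketch of the standard PPO clipped surrogate objective in numpy. It is not the lab's implementation; the function name and arguments are chosen for this example. If the "old" log-probabilities are effectively computed from the same parameters as the "new" ones, every ratio equals 1 and the clipping never takes effect.

```python
import numpy as np


def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Return the mean clipped surrogate objective and the per-sample ratios."""
    ratios = np.exp(new_log_probs - old_log_probs)
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps)
    objective = np.minimum(ratios * advantages, clipped * advantages).mean()
    return objective, ratios


old_lp = np.log(np.array([0.2, 0.5]))
advs = np.array([1.0, 1.0])

# Degenerate case: old and new policies are identical, so all ratios are 1.
_, same_ratios = ppo_clip_objective(old_lp, old_lp, advs)

# Updated policy: probabilities grew by 1.5x, so the ratio is clipped to 1.2.
new_lp = old_lp + np.log(1.5)
obj, ratios = ppo_clip_objective(new_lp, old_lp, advs)
```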


28 Jun 07:46

kengz

v1.1.1

fbf482e

add SIL, fix PG loss bug, add dueling networks

This release adds some new implementations, and fixes some bugs from first benchmark runs.

Implementations

#127 Self-Imitation Learning
#128 Checkpointing for saving models
#129 Dueling Networks
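For reference, the usual dueling aggregation combines a state-value stream and an advantage stream as Q = V + A - mean(A). The numpy sketch below shows only that aggregation step, with a hypothetical function name; it says nothing about how the lab's network layers are built.

```python
import numpy as np


def dueling_q_values(state_value, advantages):
    """Combine V(s) of shape (batch, 1) and A(s, a) of shape (batch, n_actions)."""
    state_value = np.asarray(state_value, dtype=np.float64)
    advantages = np.asarray(advantages, dtype=np.float64)
    # Subtracting the mean advantage makes the V/A decomposition identifiable.
    return state_value + advantages - advantages.mean(axis=-1, keepdims=True)


q = dueling_q_values([[1.0]], [[1.0, 2.0, 3.0]])
```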

Bug Fixes

#132 GPU test-run fixes
#133 fix ActorCritic family loss compute getting detached, and linux plotting issues, add SHA to generated specs


19 Jun 03:24

kengz

v1.1.0

6a1bf27

v1.1.0 full roadmap algorithms and features

Canonical Algorithms and Components

This release is research-ready.

Finish implementation of all canonical algorithms and components. The design is fully refactored, and components are reusable across algorithms where suitable. Read the updated doc.

SLM Lab implements most of the recent canonical algorithms and various extensions. These are used as the base of research.

Algorithm

code: slm_lab/agent/algorithm

Various algorithms are in fact extensions of some simpler ones, and they are implemented as such. This makes the code very concise.

Policy Gradient:

REINFORCE
AC (Vanilla Actor-Critic)
- shared or separate actor critic networks
- plain TD
- entropy term control
A2C (Advantage Actor-Critic)
- extension of AC with advantage function
- N-step returns as advantage
- GAE (Generalized Advantage Estimate) as advantage
PPO (Proximal Policy Optimization)
- extension of A2C with PPO loss function
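The two advantage options listed under A2C above, N-step returns and GAE, can be sketched in numpy as below. These are textbook formulations written for this example; the function names, argument order, and default `gamma`/`lam` values are assumptions and may not match the lab's own utilities.

```python
import numpy as np


def calc_nstep_returns(rewards, dones, next_value, gamma=0.99):
    """Discounted returns over a rollout, bootstrapped from next_value."""
    rets = np.zeros(len(rewards))
    future_ret = next_value
    for t in reversed(range(len(rewards))):
        # A done flag cuts off bootstrapping across episode boundaries.
        future_ret = rewards[t] + gamma * future_ret * (1.0 - dones[t])
        rets[t] = future_ret
    return rets


def calc_gae(rewards, dones, values, next_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimate for each step of a rollout."""
    advs = np.zeros(len(rewards))
    future_adv = 0.0
    next_v = next_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_v * not_done - values[t]
        future_adv = delta + gamma * lam * not_done * future_adv
        advs[t] = future_adv
        next_v = values[t]
    return advs


rewards = np.array([1.0, 1.0, 1.0])
dones = np.array([0.0, 0.0, 1.0])
values = np.array([0.5, 0.5, 0.5])
rets = calc_nstep_returns(rewards, dones, next_value=10.0, gamma=0.5)
advs = calc_gae(rewards, dones, values, next_value=10.0, gamma=0.5, lam=1.0)
```

With `lam=1.0`, GAE reduces to the bootstrapped return minus the value estimate, which is a convenient sanity check when debugging either function.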

Value-based:

SARSA
DQN (Deep Q Learning)
- boltzmann or epsilon-greedy policy
DRQN (Recurrent DQN)
Double DQN
Double DRQN
Multitask DQN (multi-environment DQN)
Hydra DQN (multi-environment DQN)

Below are the modular building blocks for the algorithms. They are designed to be general, and are reused extensively.

Memory

code: slm_lab/agent/memory

For on-policy algorithms (policy gradient):

OnPolicyReplay
OnPolicySeqReplay
OnPolicyBatchReplay
OnPolicyBatchSeqReplay

For off-policy algorithms (value-based):

Replay
SeqReplay
StackReplay
AtariReplay
PrioritizedReplay

Neural Network

code: slm_lab/agent/net

These networks are usable for all algorithms.

MLPNet (Multi Layer Perceptron)
MLPHeterogenousTails (multi-tails)
HydraMLPNet (multi-heads, multi-tails)
RecurrentNet
ConvNet

Policy

code: slm_lab/agent/algorithm/policy_util.py

different probability distributions for sampling actions
default policy
Boltzmann policy
Epsilon-greedy policy
numerous rate decay methods
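As an illustration of the policies named above, the following numpy sketch shows Boltzmann action probabilities, epsilon-greedy selection, and a linear decay schedule. The function names and signatures are hypothetical and chosen for this example; they are not the API of policy_util.py, and linear decay is only one of the possible decay methods.

```python
import numpy as np


def boltzmann_probs(q_values, tau=1.0):
    """Softmax over Q-values with temperature tau (higher tau explores more)."""
    logits = np.asarray(q_values, dtype=np.float64) / tau
    logits -= logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def linear_decay(start, end, step, total_steps):
    """Linearly anneal a rate from start to end over total_steps."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


rng = np.random.default_rng(0)
q_values = [1.0, 3.0, 2.0]
probs = boltzmann_probs(q_values, tau=0.5)
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
eps_mid = linear_decay(1.0, 0.1, step=50, total_steps=100)
```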


16 May 15:37

kengz

v1.0.3

99f54b4

Atari, Dockerfile, PPO

New features and improvements

some code cleanup to prepare for the next version
DQN Atari working, not optimized yet
Dockerfile finished, ready to run lab at scale on server
implemented PPO in tensorflow from OpenAI, along with the utils


04 Mar 17:16

kengz

v1.0.2

6f03300

v1.0.2 Evolutionary Search

New features and improvements

add EvolutionarySearch for hyperparameter search
rewrite and simplify the underlying Ray logic
fix categorical error in a2c
improve experiment graph: wider, add opacity


17 Feb 02:19

kengz

v1.0.1

04e8048

v1.0.1: fitness, analysis, tune A2C and Reinforce

New features and improvements

improve fitness computation after usage
add retro analysis script, via yarn analyze <dir>
improve plotly renderings
improve CNN and RNN architectures, bring to Reinforce
fine tune A2C and Reinforce specs


04 Feb 23:09

kengz

v1.0.0

c4538fc

v1.0.0: First stable release with full lab features

This is the first stable release of the lab, with the core API and features finalized.

Refer to the docs:
Github Repo | Lab Documentation | Experiment Log Book

Features

All the crucial features of the lab are stable and tested:

baseline algorithms
OpenAI gym, Unity environments
modular reusable components
multi-agents, multi-environments
scalable hyperparameter search with ray
useful graphs and analytics
fitness vector for universal benchmarking of agents, environments

Baselines

The first release includes the following algorithms, with more to come later.

DQN
Double DQN
REINFORCE
- Option to add entropy to encourage exploration
Actor-Critic
- Batch or episodic training
- Shared or separate actor and critic params
- Advantage calculated using n-step returns or generalized advantage estimation
- Option to add entropy to encourage exploration

