Behavioral Cloning using Seq-2-seq models

Direct policy learning for several classical control tasks using sequential machine learning frameworks

After learning about probabilistic and neural approaches to sequential machine learning, I decided to apply what I had learned to the Imitation Learning (IL) domain. Since many reinforcement learning tasks can be modeled as Markov Decision Processes (MDPs), which are sequential in nature, I thought it would be interesting to see whether neural networks and Conditional Random Fields (CRFs) could be used for Behavioral Cloning (BC). I tried solving several of the classic control tasks (mountain car, acrobot, etc.) using a CRF and a bidirectional Long Short-Term Memory (LSTM) network, and then compared my results with a well-established method, Generative Adversarial Imitation Learning (GAIL), and a baseline (logistic regression).
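To make the setup concrete, here is a minimal sketch of the logistic-regression baseline: behavioral cloning reduces to supervised classification, where the expert's demonstrations are flattened into (state, action) pairs and a classifier is fit to predict the expert's action from the current state. The data below is a synthetic placeholder; in practice the states and actions would come from expert rollouts in the environment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder expert data; in practice these come from expert rollouts
# (e.g. MountainCar state vectors and the expert's discrete actions).
rng = np.random.default_rng(0)
expert_states = rng.normal(size=(500, 2))      # (num_steps, state_dim)
expert_actions = rng.integers(0, 3, size=500)  # (num_steps,) discrete actions

# Behavioral cloning as classification: predict the expert's action
# from the current state.
policy = LogisticRegression(max_iter=1000).fit(expert_states, expert_actions)

def act(state):
    """Query the cloned policy for a single environment state."""
    return int(policy.predict(np.asarray(state).reshape(1, -1))[0])
```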

For the LSTM I chose a bidirectional architecture because it captured the relationship between the action and the next state more accurately than a unidirectional LSTM could. While this violates the Markov assumption of an MDP, it worked well in practice. Another problem with the LSTM was that, when the model was run in the actual environment, it could not accurately predict actions at the beginning of an episode. I solved this by augmenting the training data: each demonstration was turned into a number of demonstrations equal to the length of the original, one for each prefix. In doing this, the LSTM was able to learn the starting behavior better. For example, if my original training data had been a single demonstration of length 100, my augmented training data would be 100 demonstrations of lengths 1, 2, ..., 100.
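A minimal sketch of both ideas, written in PyTorch as an assumption (the repository's actual code may use a different framework): a bidirectional LSTM that emits per-step action logits, and the prefix augmentation described above. `BiLSTMPolicy` and `augment_with_prefixes` are illustrative names, not the repository's own.

```python
import torch
import torch.nn as nn

class BiLSTMPolicy(nn.Module):
    """Bidirectional LSTM mapping a state sequence to per-step action logits."""
    def __init__(self, state_dim, hidden_dim, num_actions):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Forward and backward hidden states are concatenated: 2 * hidden_dim.
        self.head = nn.Linear(2 * hidden_dim, num_actions)

    def forward(self, states):
        # states: (batch, seq_len, state_dim)
        outputs, _ = self.lstm(states)
        return self.head(outputs)  # (batch, seq_len, num_actions)

def augment_with_prefixes(demonstrations):
    """Turn each demonstration of length N into N demonstrations of
    lengths 1, 2, ..., N, so the model sees early-episode behavior."""
    return [demo[:end]
            for demo in demonstrations
            for end in range(1, len(demo) + 1)]

# One demonstration of length 3 yields three prefix demonstrations.
demo = [("s0", "a0"), ("s1", "a1"), ("s2", "a2")]
assert [len(d) for d in augment_with_prefixes([demo])] == [1, 2, 3]
```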

For the CRF I used a simple linear-chain CRF and added a dependency on the previous state. This again violates the Markov assumption, but worked very well: the CRF achieved performance close to that of GAIL and the LSTM while taking a fraction of the time to train.
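Here is a minimal sketch of that idea using the sklearn-crfsuite package, which is my assumption rather than the repository's confirmed dependency: crfsuite labels sequences of feature dictionaries, so each timestep's features include the current state and, going beyond the standard linear chain, the previous state as well.

```python
import sklearn_crfsuite

# Toy demonstrations: (list of state vectors, list of discrete actions).
train_demos = [
    ([[0.1, 0.0], [0.2, 0.1], [0.3, 0.1]], [0, 1, 1]),
    ([[0.0, 0.0], [0.1, -0.1]], [0, 2]),
]

def timestep_features(states, t):
    """Features for timestep t: the current state plus, beyond the
    standard linear chain, a dependency on the previous state."""
    feats = {f"s{i}": float(v) for i, v in enumerate(states[t])}
    if t > 0:
        # The extra previous-state dependency discussed above.
        feats.update({f"prev_s{i}": float(v)
                      for i, v in enumerate(states[t - 1])})
    else:
        feats["episode_start"] = 1.0
    return feats

# crfsuite expects feature-dict sequences and string labels per episode.
X = [[timestep_features(states, t) for t in range(len(states))]
     for states, _ in train_demos]
y = [[str(a) for a in actions] for _, actions in train_demos]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, y)
```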

A more in-depth discussion of the results can be found in the paper and presentation. These write-ups, as well as the code, can be found on GitHub.
