- Tensorflow-gpu > 1.13 with eager execution, or tensorflow 2.x
- Tensorflow-probability 0.6.0
- OpenAI baselines
- OpenAI Gym
- Bootstrapped EBU (the basic algorithm underlying BEBU-UCB, BEBU-IDS, and OB2I)
- Bootstrapped DQN
- EBU (we provide a clean implementation at https://github.com/Baichenjia/EBU)
- UCB-Bonus (see the sketch after this list)
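For orientation, one common form of UCB-style action selection over an ensemble of bootstrapped Q heads is to pick the action maximizing the ensemble mean plus a scaled ensemble standard deviation. The sketch below illustrates only that general idea; the function name `ucb_action`, the `ucb_coef` default, and the `(num_heads, num_actions)` array layout are assumptions for the example, not this repository's API, and the exact bonus used here may differ.

```python
import numpy as np

def ucb_action(q_values_per_head, ucb_coef=0.1):
    """Pick an action via an upper-confidence bound over bootstrapped Q heads.

    q_values_per_head: array of shape (num_heads, num_actions) holding each
    head's Q-value estimates for the current state.
    """
    q = np.asarray(q_values_per_head, dtype=np.float64)
    mean_q = q.mean(axis=0)   # ensemble mean per action
    std_q = q.std(axis=0)     # ensemble disagreement per action (uncertainty proxy)
    return int(np.argmax(mean_q + ucb_coef * std_q))

# Toy usage: 4 heads, 3 actions.
print(ucb_action(np.random.randn(4, 3), ucb_coef=0.5))
```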
The following command should train an agent on "Breakout".
python run_atari.py --env BreakoutNoFrameskip-v4 --reward-type ucb --ebu
The following commands should train agents on "Breakout" with the other baseline methods.
python run_atari.py --env BreakoutNoFrameskip-v4 --ebu
python run_atari.py --env BreakoutNoFrameskip-v4 --action-selection ucb --ebu
python run_atari.py --env BreakoutNoFrameskip-v4 --action-selection ids --ebu
(vote is used for evaluation)
python run_atari.py --env BreakoutNoFrameskip-v4 --action-selection vote --ebu
python run_atari.py --env BreakoutNoFrameskip-v4
python run_atari.py --env BreakoutNoFrameskip-v4 --action-selection vote
python run_atari.py --env BreakoutNoFrameskip-v4 --action-selection ucb
python run_atari.py --env BreakoutNoFrameskip-v4 --action-selection ids
Any method can be combined with the Randomized Prior Function by adding the `--prior` flag.
For example, run Bootstrapped DQN + Randomized Prior Function as follows:
python run_atari.py --env BreakoutNoFrameskip-v4 --prior
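For background, a Randomized Prior Function adds the output of a fixed, randomly initialized prior network to a trainable network, so each ensemble member keeps a distinct prior throughout training. Below is a minimal, hypothetical tf.keras sketch of that idea; the class name, layer sizes, and `prior_scale` parameter are illustrative assumptions and do not correspond to the code in this repository (see `models.py` for the actual implementation).

```python
import tensorflow as tf

class PriorQNetwork(tf.keras.Model):
    """Sketch: Q(s) = q_trainable(s) + prior_scale * q_prior(s),
    where the prior network stays fixed at its random initialization."""

    def __init__(self, num_actions, prior_scale=1.0):
        super().__init__()
        self.prior_scale = prior_scale
        self.trainable_net = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(num_actions),
        ])
        self.prior_net = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(num_actions),
        ])
        self.prior_net.trainable = False  # the prior is never updated during training

    def call(self, obs):
        # Gradients flow only through the trainable network.
        return self.trainable_net(obs) + self.prior_scale * tf.stop_gradient(self.prior_net(obs))
```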
- `deepq.py` contains stepping the environment, storing experience, and saving models.
- `deepq_learner.py` contains the action-selection methods, the bonus, and bootstrapped DQN/EBU training.
- `replay_buffer.py` contains two replay buffer classes, for BDQN and BEBU respectively. The memory consumption has been highly optimized.
- `models.py` contains the Q-network, the bootstrapped Q-network with multiple heads, and the bootstrapped Q-network with the Randomized Prior Function.
- `run_atari.py` contains the hyper-parameter settings. Running this file starts training.
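As a rough, self-contained sketch of what a bootstrapped replay buffer does (each transition also stores a Bernoulli mask so that every head trains only on its own sub-sample of the data), consider the toy buffer below. It is an illustration under stated assumptions, not the memory-optimized implementation in `replay_buffer.py`; the class name and the `mask_prob` default are made up for the example.

```python
import numpy as np

class BootstrapReplayBuffer:
    """Toy ring buffer that stores a per-transition bootstrap mask over heads."""

    def __init__(self, capacity, num_heads, mask_prob=0.5):
        self.capacity = capacity
        self.num_heads = num_heads
        self.mask_prob = mask_prob
        self.storage = []
        self.next_idx = 0

    def add(self, obs, action, reward, next_obs, done):
        # Each head sees this transition only if its mask entry is 1.
        mask = np.random.binomial(1, self.mask_prob, size=self.num_heads)
        data = (obs, action, reward, next_obs, done, mask)
        if self.next_idx >= len(self.storage):
            self.storage.append(data)
        else:
            self.storage[self.next_idx] = data
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size):
        idxs = np.random.randint(0, len(self.storage), size=batch_size)
        cols = list(zip(*(self.storage[i] for i in idxs)))
        # Returns: obs, actions, rewards, next_obs, dones, masks (each as an array).
        return [np.array(c) for c in cols]
```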
The data for each run is stored on disk under the result directory, in a sub-directory named
`<env-id>-<algorithm>-<date>-<time>`. Each run directory contains:
- `log.txt` records the episode, exploration rate, episodic rewards in training (after normalization, as used for training), episodic scores (raw scores), current timesteps, and percentage completed.
- `monitor.csv` is the environment monitor file written by the `logger` from OpenAI Baselines.
- `parameters.txt` lists all hyper-parameters used in training.
- `progress.csv` contains the same data as `log.txt` but in csv format.
- `evaluate scores.txt` records the evaluation of the policy for 108000 frames every 1e5 training steps, with 30 no-op evaluation.
- `model_10M.h5`, `model_20M.h5`, `model_best_10M.h5`, `model_best_20M.h5` are the saved policy files.
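As a small, hypothetical example of consuming these outputs, a run's learning curve can be plotted from `progress.csv` with pandas. The column names `"steps"` and `"episode_score"` below are assumptions; check the header row of your own `progress.csv` for the actual names.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Point this at an actual run directory, e.g. result/<env-id>-<algorithm>-<date>-<time>.
run_dir = "result/<env-id>-<algorithm>-<date>-<time>"

# "steps" and "episode_score" are assumed column names, not necessarily the real headers.
df = pd.read_csv(f"{run_dir}/progress.csv")
plt.plot(df["steps"], df["episode_score"])
plt.xlabel("training steps")
plt.ylabel("episodic score")
plt.show()
```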