This repo contains a PyTorch implementation and the datasets for our paper "Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning", published in the journal *Artificial Intelligence*. This is the paper Link.
We collect a Multi-Task Offline Dataset based on DeepMind Control Suite (DMC).
- Download the Dataset to `./collect` before you start training.
- Users can collect new datasets with `collect_data.py`. The supported tasks include standard tasks from DMC and custom tasks from `./custom_dmc_tasks/`.
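As a rough sketch of how a collected episode might be read back, the snippet below assumes each episode is stored as a `.npz` file with keys such as `observation`, `action`, and `reward` (a common convention for DMC-based datasets); the actual on-disk format of `./collect` may differ, and `load_episode` is a hypothetical helper, not part of the repo's API.

```python
import numpy as np

def load_episode(path):
    # Hypothetical loader: assumes one .npz file per episode whose arrays
    # follow a DMC-style convention ('observation', 'action', 'reward', ...).
    with open(path, "rb") as f:
        episode = np.load(f)
        # Materialize every stored array into a plain dict.
        return {key: episode[key] for key in episode.files}
```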
Our dataset covers 3 domains (Walker, Quadruped, and Jaco Arm); the available tasks in each domain are listed below.
| Domain | Available task names |
|---|---|
| Walker | walker_stand, walker_walk, walker_run, walker_flip |
| Quadruped | quadruped_jump, quadruped_roll_fast |
| Jaco Arm | jaco_reach_top_left, jaco_reach_top_right, jaco_reach_bottom_left, jaco_reach_bottom_right |
For each task, we run TD3 to collect five types of datasets:
- `random`: data generated by a random agent.
- `medium`: data generated by a medium-level TD3 agent.
- `medium-replay`: all experiences collected while training a medium-level TD3 agent.
- `medium-expert`: all experiences collected while training an expert-level TD3 agent.
- `expert`: data generated by an expert-level TD3 agent.
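The difference between the replay-style and snapshot-style datasets can be sketched as follows; the function names are purely illustrative and do not correspond to the repo's collection code.

```python
def collect_replay(train_steps, step_fn):
    # Replay-style datasets ('medium-replay', 'medium-expert'): keep every
    # transition observed over the course of training.
    buffer = []
    for t in range(train_steps):
        buffer.append(step_fn(t))
    return buffer

def collect_snapshot(policy_fn, num_steps):
    # Snapshot-style datasets ('random', 'medium', 'expert'): roll out a
    # single fixed policy and record only those transitions.
    return [policy_fn(t) for t in range(num_steps)]
```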
Install MuJoCo:
- Download the MuJoCo binaries here.
- Unzip the downloaded archive into `~/.mujoco/`.
- Append the MuJoCo `bin` subdirectory path to the `LD_LIBRARY_PATH` environment variable.
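For example, assuming the archive unpacked to `~/.mujoco/mujoco210` (the exact version and directory name depend on the release you downloaded), the last step would look like:

```shell
# Adjust 'mujoco210' to match the directory your archive actually unpacked to.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${HOME}/.mujoco/mujoco210/bin"
```

Add the line to your `~/.bashrc` (or equivalent) so it persists across shells.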
Install the following libraries:

```shell
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip
```

Install dependencies:

```shell
conda env create -f conda_env.yml
conda activate utds
```

We provide several algorithms to train the single-agent and multi-task data-sharing agent.
- For single-agent training, we provide the following algorithms.
| Algorithm | Name | Paper |
|---|---|---|
| Behavior Cloning | `bc` | paper |
| CQL | `cql` | paper |
| TD3-BC | `td3_bc` | paper |
| CRR | `crr` | paper |
| PBRL | `ddpg` | paper |
- For multi-task data sharing, we support the following algorithms.
| Algorithm | Name | Paper |
|---|---|---|
| Direct Sharing | `cql` | paper |
| CDS | `cql_cds` | paper |
| Unlabeled-CDS | `cql_cdsz` | paper |
| UTDS | `pbrl` | our paper |
To train the CDS agent on the quadruped_jump (random) task with data shared from the quadruped_roll_fast (replay) dataset, run:

```shell
python train_offline_cds.py task=quadruped_jump "+share_task=[quadruped_jump, quadruped_roll_fast]" "+data_type=[random, replay]"
```
To train the other data-sharing agents in the same setting, use `train_offline_share.py` with the same overrides:

```shell
python train_offline_share.py task=quadruped_jump "+share_task=[quadruped_jump, quadruped_roll_fast]" "+data_type=[random, replay]"
```
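The two override lists line up positionally: the i-th entry of `data_type` is the dataset used for the i-th entry of `share_task`. A minimal illustration of that pairing (the real pairing happens inside the training scripts):

```python
# Positional pairing of tasks and dataset types from the command above.
share_task = ["quadruped_jump", "quadruped_roll_fast"]
data_type = ["random", "replay"]

# Each shared task is loaded with its corresponding dataset type.
datasets = {task: dtype for task, dtype in zip(share_task, data_type)}
# → {'quadruped_jump': 'random', 'quadruped_roll_fast': 'replay'}
```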
We support wandb logging: set `wandb: True` in the `config*.yaml` files.
```bibtex
@article{UTDS2023,
  title = {Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning},
  journal = {Artificial Intelligence},
  author = {Chenjia Bai and Lingxiao Wang and Jianye Hao and Zhuoran Yang and Bin Zhao and Zhen Wang and Xuelong Li},
  pages = {104048},
  year = {2023},
  issn = {0004-3702},
  doi = {10.1016/j.artint.2023.104048},
  url = {https://www.sciencedirect.com/science/article/pii/S0004370223001947},
}
```
This project is released under the MIT license.
