This repo contains a PyTorch implementation and the datasets for our paper "Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning", published in the journal *Artificial Intelligence*. This is the paper Link.
We collect a Multi-Task Offline Dataset based on DeepMind Control Suite (DMC).
- Download the Dataset to `./collect` before you start training.
- Users can collect new datasets with `collect_data.py`. The supported tasks include standard tasks from DMC and custom tasks from `./custom_dmc_tasks/`.
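As a rough sketch of how a collected episode might be read back, the snippet below assumes each episode is stored as a `.npz` file with keys such as `observation`, `action`, and `reward` (a common convention for DMC-based datasets); the actual on-disk format of `./collect` may differ, and `load_episode` is a hypothetical helper, not part of the repo's API.

```python
import numpy as np

def load_episode(path):
    # Hypothetical loader: assumes one .npz file per episode whose arrays
    # follow a DMC-style convention ('observation', 'action', 'reward', ...).
    with open(path, "rb") as f:
        episode = np.load(f)
        # Materialize every stored array into a plain dict.
        return {key: episode[key] for key in episode.files}
```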
Our dataset covers 3 domains (Walker, Quadruped, and Jaco Arm); the available tasks in each domain are listed below.
| Domain | Available task names |
|---|---|
| Walker | walker_stand, walker_walk, walker_run, walker_flip |
| Quadruped | quadruped_jump, quadruped_roll_fast |
| Jaco Arm | jaco_reach_top_left, jaco_reach_top_right, jaco_reach_bottom_left, jaco_reach_bottom_right |
For each task, we run TD3 to collect five types of datasets:
- `random`: data generated by a random agent.
- `medium`: data generated by a medium-level TD3 agent.
- `medium-replay`: all experiences collected while training a medium-level TD3 agent.
- `medium-expert`: all experiences collected while training an expert-level TD3 agent.
- `expert`: data generated by an expert-level TD3 agent.
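The difference between the replay-style and snapshot-style datasets can be sketched as follows; the function names are purely illustrative and do not correspond to the repo's collection code.

```python
def collect_replay(train_steps, step_fn):
    # Replay-style datasets ('medium-replay', 'medium-expert'): keep every
    # transition observed over the course of training.
    buffer = []
    for t in range(train_steps):
        buffer.append(step_fn(t))
    return buffer

def collect_snapshot(policy_fn, num_steps):
    # Snapshot-style datasets ('random', 'medium', 'expert'): roll out a
    # single fixed policy and record only those transitions.
    return [policy_fn(t) for t in range(num_steps)]
```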
Install MuJoCo:
- Download the MuJoCo binaries here.
- Unzip the downloaded archive into `~/.mujoco/`.
- Append the MuJoCo `bin` subdirectory path to the `LD_LIBRARY_PATH` environment variable.
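For example, assuming the archive unpacked to `~/.mujoco/mujoco210` (the exact version and directory name depend on the release you downloaded), the last step would look like:

```shell
# Adjust 'mujoco210' to match the directory your archive actually unpacked to.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${HOME}/.mujoco/mujoco210/bin"
```

Add the line to your `~/.bashrc` (or equivalent) so it persists across shells.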
Install the following libraries:

```shell
sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip
```

Install dependencies:

```shell
conda env create -f conda_env.yml
conda activate utds
```

We provide several algorithms to train the single-agent and multi-task data-sharing agent.
- For single-agent training, we provide the following algorithms.
| Algorithm | Name | Paper |
|---|---|---|
| Behavior Cloning | `bc` | paper |
| CQL | `cql` | paper |
| TD3-BC | `td3_bc` | paper |
| CRR | `crr` | paper |
| PBRL | `ddpg` | paper |
- For multi-task data sharing, we support the following algorithms.
| Algorithm | Name | Paper |
|---|---|---|
| Direct Sharing | `cql` | paper |
| CDS | `cql_cds` | paper |
| Unlabeled-CDS | `cql_cdsz` | paper |
| UTDS | `pbrl` | our paper |
To train the CDS agent on the quadruped_jump (random) task with data shared from the quadruped_roll_fast (replay) dataset, run:

```shell
python train_offline_cds.py task=quadruped_jump "+share_task=[quadruped_jump, quadruped_roll_fast]" "+data_type=[random, replay]"
```
To train the other data-sharing agents in the same setting, use `train_offline_share.py` with the same overrides:

```shell
python train_offline_share.py task=quadruped_jump "+share_task=[quadruped_jump, quadruped_roll_fast]" "+data_type=[random, replay]"
```
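The two override lists line up positionally: the i-th entry of `data_type` is the dataset used for the i-th entry of `share_task`. A minimal illustration of that pairing (the real pairing happens inside the training scripts):

```python
# Positional pairing of tasks and dataset types from the command above.
share_task = ["quadruped_jump", "quadruped_roll_fast"]
data_type = ["random", "replay"]

# Each shared task is loaded with its corresponding dataset type.
datasets = {task: dtype for task, dtype in zip(share_task, data_type)}
# → {'quadruped_jump': 'random', 'quadruped_roll_fast': 'replay'}
```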
We support wandb logging: set `wandb: True` in the `config*.yaml` files.
```bibtex
@article{UTDS2023,
  title = {Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning},
  journal = {Artificial Intelligence},
  author = {Chenjia Bai and Lingxiao Wang and Jianye Hao and Zhuoran Yang and Bin Zhao and Zhen Wang and Xuelong Li},
  pages = {104048},
  year = {2023},
  issn = {0004-3702},
  doi = {10.1016/j.artint.2023.104048},
  url = {https://www.sciencedirect.com/science/article/pii/S0004370223001947},
}
```
This project is released under the MIT license.
