The official code for "Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization" (IJCAI 2024).
The training dependencies can be installed with conda:

```bash
conda env create -f environment.yml
```

Note that we pin the same package versions used in our training, so the installed CUDA build of PyTorch may not be compatible with your GPU. In that case, you can manually reinstall PyTorch only.
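For example, a PyTorch reinstall for a specific CUDA build might look like the following; the `cu118` wheel here is only an illustration, pick the build that matches your driver:

```bash
# Run inside the created conda environment; replace cu118 with the CUDA build
# matching your GPU driver (e.g. cu117, cu121).
pip install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu118
```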
To install the D4RL benchmark, try the following commands:

```bash
git clone https://github.com/Farama-Foundation/D4RL.git
cd D4RL
pip install -e .
```
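As an optional sanity check (not part of the original instructions), you can verify that D4RL imports and its datasets load; the environment name below is just one of the tasks used in this repo:

```bash
# Downloads the dataset on first use; halfcheetah-medium-v2 is only an example.
python -c "import gym, d4rl; env = gym.make('halfcheetah-medium-v2'); print(env.get_dataset()['observations'].shape)"
```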
If you do not want to use wandb for tracking, run the following command in your terminal:

```bash
wandb offline
```

Otherwise, fill in your wandb account settings in scripts/config.sh:
```bash
export PYTHONPATH=".":$PYTHONPATH
wandb_online="False"
entity=""
if [ ${wandb_online} == "True" ]; then
    export WANDB_API_KEY=""
    export WANDB_MODE="online"
else
    wandb offline
fi
```
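For online tracking, you would flip `wandb_online` and fill in your own account details; the values below are placeholders for illustration, not values shipped with the repo:

```bash
# Example edits to scripts/config.sh -- all values here are placeholders.
wandb_online="True"
entity="your-entity"               # your wandb team or username
export WANDB_API_KEY="your-key"    # your personal wandb API key
```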
Run the following script to run the offline experiments:

```bash
bash ./script/run_td3bc_offline.sh $task $quality $name $seed --device $device_id
```

Values for the arguments:
- task: halfcheetah, hopper, walker2d, all
- quality: medium, medium-replay, medium-expert, random
- name: original (paper args), corl (CORL args, recommended)
- seed: random seed
- device_id: cuda device ID
One example command is:

```bash
bash ./script/run_td3bc_offline.sh halfcheetah medium corl 0 --device "cuda:0"
```

Notice: online training is only possible after the corresponding offline training checkpoint is produced.
Run the following script to reproduce the online experiments:

```bash
bash ./script/run_cpr_online.sh $task $quality original $seed --device $device_id
```

Values for the arguments:
- task: halfcheetah, hopper, walker2d, all
- quality: medium, medium-replay, medium-expert, random
- seed: random seed
- device_id: cuda device ID
One example command is:

```bash
bash ./script/run_cpr_online.sh halfcheetah medium original 0 --device "cuda:0"
```

The logs and models are stored in the "./out" folder.
You can monitor training with TensorBoard:

```bash
tensorboard --logdir="./out"
```

We thank the following repos for their help:
- OfflineRL-Lib provides the framework and implementation of most baselines.
- CORL provides finetuned hyper-parameters.
If you find this work useful for your research, you can cite it with the following BibTeX entry:
```bibtex
@inproceedings{cpr,
  title={Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization},
  author={Rui Kong and Chenyang Wu and Chen-Xiao Gao and Zongzhang Zhang and Ming Li},
  booktitle={Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, {IJCAI} 2024},
  year={2024},
}
```