TinyZero


TinyZero is a reproduction of DeepSeek R1 Zero on the countdown and multiplication tasks. It is built on top of veRL.

Through RL, the 3B base LM develops self-verification and search abilities entirely on its own.

You can experience the "aha" moment yourself for under $30.

Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655

Full experiment log: https://wandb.ai/jiayipan/TinyZero

The paper is on its way!

Installation

conda create -n zero python=3.9
# install torch [or skip this step and let vLLM install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # versions 0.5.4, 0.4.2, and 0.3.1 also work
pip3 install ray

# verl
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
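
Optionally, sanity-check the installation. These one-liners are illustrative (not part of the repo); each import should succeed:

# optional: verify that core dependencies import cleanly
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import vllm; print(vllm.__version__)"
python -c "import flash_attn, ray"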

Countdown task

Data Preparation

conda activate zero
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
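
To verify the output, you can peek at the first example. A minimal check, assuming the script writes train.parquet (and test.parquet) under --local_dir as veRL's preprocess scripts do:

# optional: inspect one preprocessed row (file name is an assumption)
python -c "import pandas as pd; print(pd.read_parquet('{path_to_your_dataset}/train.parquet').iloc[0])"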

Run Training

conda activate zero

For the following runs, if you hit out-of-VRAM errors, try adding critic.model.enable_gradient_checkpointing=True to the training script, as sketched below.
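
A minimal sketch of passing the override, assuming scripts/train_tiny_zero.sh forwards extra arguments to the veRL trainer via "$@" (if it does not, append the flag to the python3 -m verl.trainer.main_ppo invocation inside the script instead):

# assumption: the script forwards "$@" to the trainer
bash ./scripts/train_tiny_zero.sh critic.model.enable_gradient_checkpointing=True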

Single GPU

Works for models <= 1.5B. In our experience, the Qwen2.5-0.5B base model fails to learn reasoning.

export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

3B+ model

In this case, the base model is able to develop sophisticated reasoning skills.

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

Instruct Ablation

We also experiment with Qwen2.5-3B-Instruct.

Data Preparation

To follow the chat template, we need to reprocess the data:

conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
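
To confirm the chat template was applied, you can print one prompt from the reprocessed data. This assumes prompts are stored in a prompt column, as in veRL's data format:

# optional: check that the qwen-instruct template was applied (column name is an assumption)
python -c "import pandas as pd; print(pd.read_parquet('{path_to_your_dataset}/train.parquet').iloc[0]['prompt'])"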

Training

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

Acknowledgements

  • Our experiments are built on veRL.
  • We use the Qwen2.5 series of base models.

Citation

@misc{tinyzero,
author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan and Hao Peng and Alane Suhr},
title        = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note         = {Accessed: 2025-01-24},
year         = {2025}
}
