Releases · sail-sg/oat · GitHub

23 Dec 05:15

lkevinzc

v0.2.4 Latest


What's Changed

chore: minor updates on logging and resource allocation by @lkevinzc in #73

Full Changelog: v0.2.3...v0.2.4

Contributors

lkevinzc


31 Oct 01:08

lkevinzc

v0.2.3

What's Changed

chore: update lora and add metrics by @lkevinzc in #66
Fix incorrect state indexing in PPOMultiTurnLearner critic training by @MozerWang in #67
fix micro batch training issue in DPO training by @hmhuy0 in #68
feat: add fp16 training by @lkevinzc in #70

New Contributors

@MozerWang made their first contribution in #67
@hmhuy0 made their first contribution in #68

Full Changelog: v0.2.2...v0.2.3

Contributors

lkevinzc, hmhuy0, and MozerWang


02 Oct 02:43

lkevinzc

v0.2.2

What's Changed

feat: support turn-level ppo for general agentic rl by @lkevinzc in #63
feat: support LoRA RL training by @lkevinzc in #64

Full Changelog: v0.2.1...v0.2.2

Contributors

lkevinzc


24 Aug 06:25

lkevinzc

v0.2.1

What's Changed

fix: use semantic version comparison for vLLM compatibility with 0.10.0+ by @simonucl in #60
chore: updates for online preference learning by @lkevinzc in #61
fix: truncated importance sampling to handle precision mismatch by @lkevinzc in #62
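As a note on the semantic version fix in #60: comparing version strings character by character goes wrong once a component reaches two digits, which is presumably why vLLM 0.10.0+ needed special handling. The sketch below is illustrative only and is not the code from that PR; `parse_version` and `at_least` are hypothetical helper names, and pre-release or post-release suffixes are simply ignored.

```python
import re


def parse_version(text: str) -> tuple[int, ...]:
    """Read the leading numeric components of a version string.

    '0.10.0' -> (0, 10, 0). Suffixes such as 'rc1' are ignored, and parsing
    stops at the first component that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed: str, required: str) -> bool:
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


# Plain string comparison treats '1' < '8', so 0.10.0 looks older than 0.8.4.
print("0.10.0" >= "0.8.4")          # False
# Comparing integer tuples gives the intended ordering.
print(at_least("0.10.0", "0.8.4"))  # True
```

In real code, `packaging.version.Version` is the more robust choice, since it also orders pre-release and post-release tags according to PEP 440.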

New Contributors

@simonucl made their first contribution in #60

Full Changelog: v0.2.0...v0.2.1

Contributors

lkevinzc and simonucl


24 Jul 15:00

lkevinzc

v0.2.0

What's Changed

Fix tensor slicing in SFTLearner when batch_size=1 by @longxudou in #57
feat: refactor SFT to support multi turn chat data by @lkevinzc in #59

New Contributors

@longxudou made their first contribution in #57

Full Changelog: v0.1.4...v0.2.0

Contributors

longxudou and lkevinzc


09 Jul 02:44

lkevinzc

v0.1.4

What's Changed

fix broken examples by @lkevinzc in #44
feat: add math rl examples and data by @lkevinzc in #45
Supporting TP for vLLM, and distributed training by @ufotalent in #46
chore: update docs and python version by @lkevinzc in #47
Fix: Precisely Remove BOS Token Prefix from Prompts by @cameron-chen in #48
fix: resolve logger warnings by @emmanuel-ferdman in #51
Support offloading activations by @ufotalent in #50
chore: properly set port number for single-host training by @lkevinzc in #52
fix: python version, collector, clean math rl codes by @lkevinzc in #54
feat: reduce vram footprint by @lkevinzc in #56

New Contributors

@ufotalent made their first contribution in #46
@emmanuel-ferdman made their first contribution in #51

Full Changelog: v0.1.2...v0.1.4

Contributors

ufotalent, emmanuel-ferdman, and 2 other contributors


28 Jun 12:18

lkevinzc

v0.1.3.post2

What's Changed

fix broken examples by @lkevinzc in #44
feat: add math rl examples and data by @lkevinzc in #45
Supporting TP for vLLM, and distributed training by @ufotalent in #46
chore: update docs and python version by @lkevinzc in #47
Fix: Precisely Remove BOS Token Prefix from Prompts by @cameron-chen in #48
fix: resolve logger warnings by @emmanuel-ferdman in #51
Support offloading activations by @ufotalent in #50
chore: properly set port number for single-host training by @lkevinzc in #52
fix: python version, collector, clean math rl codes by @lkevinzc in #54

New Contributors

@ufotalent made their first contribution in #46
@emmanuel-ferdman made their first contribution in #51

Full Changelog: v0.1.2...v0.1.3.post2

Contributors

ufotalent, emmanuel-ferdman, and 2 other contributors


06 May 08:17

lkevinzc

v0.1.2

What's Changed

Minor refactor by @lkevinzc in #39

Full Changelog: v0.1.0...v0.1.2

Contributors

lkevinzc


18 Apr 03:34

lkevinzc

v0.1.0

What's Changed

Changes for Dr. GRPO by @lkevinzc in #35
Improve logging by @lkevinzc in #37
Upgrade to vllm V1 (0.8.4) and use actor api init() by @lkevinzc in #38

Full Changelog: v0.0.9...v0.1.0

Contributors

lkevinzc


21 Mar 09:42

lkevinzc

v0.0.9

What's Changed

add grpo's critic estimation by @lkevinzc in #26
minor fix for offline sft by @lkevinzc in #27
Use a toy task to test R1-zero like training behaviors by @lkevinzc in #28
Update README.md by @lkevinzc in #29
chore: update deepspeed.py by @eltociear in #30
add sft script by @lkevinzc in #32
Fix the wrong batch indices for computing ppo advantages by @qlan3 in #33
Upgrade vllm for more efficient collocation by @lkevinzc in #34

New Contributors

@eltociear made their first contribution in #30
@qlan3 made their first contribution in #33

Full Changelog: v0.0.6...v0.0.9

Contributors

eltociear, lkevinzc, and qlan3
