Releases: sail-sg/oat
v0.2.4
v0.2.3
What's Changed
- chore: update lora and add metrics by @lkevinzc in #66
- Fix incorrect state indexing in PPOMultiTurnLearner critic training by @MozerWang in #67
- fix micro batch training issue in DPO training by @hmhuy0 in #68
- feat: add fp16 training by @lkevinzc in #70
New Contributors
- @MozerWang made their first contribution in #67
- @hmhuy0 made their first contribution in #68
Full Changelog: v0.2.2...v0.2.3
v0.2.2
v0.2.1
What's Changed
- fix: use semantic version comparison for vLLM compatibility with 0.10.0+ by @simonucl in #60
- chore: updates for online preference learning by @lkevinzc in #61
- fix: truncated importance sampling to handle precision mismatch by @lkevinzc in #62
New Contributors
Full Changelog: v0.2.0...v0.2.1
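The vLLM compatibility fix in #60 is about comparing version numbers by their numeric components rather than as strings. The sketch below is illustrative only and is not the code from that PR: `parse_version` and `at_least` are hypothetical helpers, and pre-release or post-release suffixes are simply ignored. In practice a library such as `packaging.version.Version` may be the better choice.

```python
import re


def parse_version(version: str) -> tuple:
    """Return the leading numeric release components as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_least(installed: str, minimum: str) -> bool:
    """True if `installed` is at or above `minimum`, compared numerically."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.10" and "0.10.0" compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


# Plain string comparison orders "0.10.0" before "0.9.0" because "1" < "9".
string_says_newer = "0.10.0" >= "0.9.0"
numeric_says_newer = at_least("0.10.0", "0.9.0")
```

Here `string_says_newer` is `False` while `numeric_says_newer` is `True`, which is the kind of mismatch that appears once a minor version reaches two digits.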
v0.2.0
What's Changed
- Fix tensor slicing in SFTLearner when batch_size=1 by @longxudou in #57
- feat: refactor SFT to support multi turn chat data by @lkevinzc in #59
New Contributors
- @longxudou made their first contribution in #57
Full Changelog: v0.1.4...v0.2.0
v0.1.4
What's Changed
- fix broken examples by @lkevinzc in #44
- feat: add math rl examples and data by @lkevinzc in #45
- Supporting TP for vLLM, and distributed training by @ufotalent in #46
- chore: update docs and python version by @lkevinzc in #47
- Fix: Precisely Remove BOS Token Prefix from Prompts by @cameron-chen in #48
- fix: resolve logger warnings by @emmanuel-ferdman in #51
- Support offloading activations by @ufotalent in #50
- chore: properly set port number for single-host training by @lkevinzc in #52
- fix: python version, collector, clean math rl codes by @lkevinzc in #54
- feat: reduce vram footprint by @lkevinzc in #56
New Contributors
- @ufotalent made their first contribution in #46
- @emmanuel-ferdman made their first contribution in #51
Full Changelog: v0.1.2...v0.1.4
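The BOS prefix fix in #48 concerns removing a leading beginning-of-sequence token from a prompt without touching anything else. The release notes do not say how the PR does this, so the following is only a sketch of the general idea: `strip_bos_prefix` is a hypothetical helper and `<s>` is an assumed BOS string.

```python
def strip_bos_prefix(prompt: str, bos_token: str) -> str:
    """Remove exactly one leading BOS token, leaving the rest untouched."""
    if bos_token and prompt.startswith(bos_token):
        return prompt[len(bos_token):]
    return prompt


prompt = "<s>sum <s> tokens"

# Exact prefix removal keeps the later "<s>" and the leading "s" of "sum".
exact = strip_bos_prefix(prompt, "<s>")

# str.lstrip treats its argument as a set of characters, so it also eats
# the "s" that starts the word "sum".
char_stripped = prompt.lstrip("<s>")

# str.replace removes every occurrence, not only the prefix.
all_removed = prompt.replace("<s>", "")
```

Only `exact` preserves the prompt body (`"sum <s> tokens"`); the other two approaches silently alter the text after the prefix.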
v0.1.3.post2
What's Changed
- fix broken examples by @lkevinzc in #44
- feat: add math rl examples and data by @lkevinzc in #45
- Supporting TP for vLLM, and distributed training by @ufotalent in #46
- chore: update docs and python version by @lkevinzc in #47
- Fix: Precisely Remove BOS Token Prefix from Prompts by @cameron-chen in #48
- fix: resolve logger warnings by @emmanuel-ferdman in #51
- Support offloading activations by @ufotalent in #50
- chore: properly set port number for single-host training by @lkevinzc in #52
- fix: python version, collector, clean math rl codes by @lkevinzc in #54
New Contributors
- @ufotalent made their first contribution in #46
- @emmanuel-ferdman made their first contribution in #51
Full Changelog: v0.1.2...v0.1.3.post2
v0.1.2
v0.1.0
v0.0.9
What's Changed
- add grpo's critic estimation by @lkevinzc in #26
- minor fix for offline sft by @lkevinzc in #27
- Use a toy task to test R1-zero like training behaviors by @lkevinzc in #28
- Update README.md by @lkevinzc in #29
- chore: update deepspeed.py by @eltociear in #30
- add sft script by @lkevinzc in #32
- Fix the wrong batch indices for computing ppo advantages by @qlan3 in #33
- Upgrade vllm for more efficient collocation by @lkevinzc in #34
New Contributors
- @eltociear made their first contribution in #30
- @qlan3 made their first contribution in #33
Full Changelog: v0.0.6...v0.0.9
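For readers unfamiliar with the GRPO-style estimate mentioned in #26, the usual idea is to score each sampled response relative to the other responses for the same prompt instead of using a learned critic. The sketch below shows that group normalization in plain Python. It is not oat's implementation: `group_relative_advantages` is a hypothetical name, and the use of the population standard deviation and the `eps` value are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards of responses sampled for one prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to the same prompt, two correct (1.0) and two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct responses receive a positive advantage and incorrect ones a negative advantage of equal magnitude. When every response in the group earns the same reward, all advantages are zero and the group contributes no learning signal.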