What's New
Features
- **Pretraining Data Processing API** (#672)
  - Added a new API for processing pretraining-style datasets
  - Documents are now chunked by a configurable `block_size`
  - Chunks are treated as independent, fully-unmasked samples (see the sketch after this list)
  - Updated the training loop to ingest pretraining-style datasets
  - Includes comprehensive test coverage (`test_pretraining_data_process.py`, `test_pretraining_mode.py`, `test_pretraining_sampler.py`)
- **AdamW Optimizer Configuration** (#674)
  - Exposed the `weight_decay`, `betas`, and `eps` parameters in `TrainingArgs`
  - Users can now tune AdamW hyperparameters through the `run_training()` API (see the sketch after this list)
  - Provides more control over optimizer behavior
- **Granite 4 Model Support** (#669)
  - Added support for Granite 4 models as Mixture of Experts (MoE) models in training
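To make the chunking behavior concrete, here is a minimal sketch of the idea under stated assumptions: the function name `chunk_document`, the sample dict keys, and the handling of the trailing partial block are illustrative and are not taken from the library's actual `data_process.py`.

```python
# Minimal sketch of pretraining-style chunking (illustrative names, not the
# library's API): a tokenized document is split into fixed-size blocks, and
# every block becomes an independent, fully-unmasked sample (labels == inputs).
from typing import Dict, List


def chunk_document(token_ids: List[int], block_size: int) -> List[Dict[str, List[int]]]:
    """Split one tokenized document into independent pretraining samples."""
    samples = []
    for start in range(0, len(token_ids), block_size):
        block = token_ids[start : start + block_size]
        samples.append({
            "input_ids": block,
            # Fully unmasked: loss is computed on every token in the block.
            "labels": list(block),
        })
    return samples


if __name__ == "__main__":
    doc = list(range(10))  # stand-in for a tokenized document
    for sample in chunk_document(doc, block_size=4):
        print(sample)
```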
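For the optimizer change, a hedged sketch of what the three exposed knobs control. The `torch.optim.AdamW` call is standard PyTorch; how the values travel from `TrainingArgs` through `run_training()` to the optimizer, and any field names beyond the three listed above, are assumptions to verify against the library.

```python
# Sketch of the three AdamW hyperparameters exposed in TrainingArgs and how
# such values are typically consumed by torch.optim.AdamW. Everything other
# than the parameter names weight_decay / betas / eps is illustrative.
import torch

weight_decay = 0.0    # decoupled weight decay strength
betas = (0.9, 0.999)  # exponential decay rates for the moment estimates
eps = 1e-8            # numerical stability term in the denominator

model = torch.nn.Linear(16, 16)  # stand-in for the real model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-5,
    betas=betas,
    eps=eps,
    weight_decay=weight_decay,
)
print(optimizer.defaults)
```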
Bug Fixes
- **Process Timing Fix** (#675)
  - Fixed a race condition where a process had not completed by the time its result was read (see the sketch after this list)
- **Variable Access Fix** (#668)
  - Fixed a stray invalid variable access bug
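The process-timing fix addresses a common pattern: reading a child process's result before the process has produced it. The sketch below is a generic illustration using Python's `multiprocessing`, not the project's actual code; which process API the project uses is an assumption.

```python
# Generic illustration of the race class fixed in #675 (not the project's code):
# reading a worker's result without waiting can observe missing output; blocking
# on the result and joining the process removes the race.
import multiprocessing as mp


def worker(queue):
    queue.put("done")


if __name__ == "__main__":
    queue = mp.Queue()
    proc = mp.Process(target=worker, args=(queue,))
    proc.start()

    # Buggy pattern: queue.get_nowait() here can raise queue.Empty because the
    # child may not have finished producing its result yet.

    result = queue.get()  # blocks until the worker has actually put its result
    proc.join()           # then wait for the process itself to complete
    print(result)
```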
Dependencies
- **Build Dependency Update** (#670)
  - Updated the hynek build dependency
Files Changed
17 files changed with 1,642 insertions and 52 deletions:
- Core training modules: `data_process.py`, `main_ds.py`, `sampler.py`, `model.py`, `config.py`
- New test suites for pretraining functionality
- Updated README with new capabilities
Full Changelog
All Changes:
- 574f946 Exposes API for processing pretraining data (#672)
- 638a753 fixes bug where process isn't completed by the time the process gets read (#675)
- c495035 Expose AdamW optimizer parameters in training API (#674)
- 3d05302 Handle granite 4 as MoE models in training (#669)
- 781c36f fixes stray invalid variable access bug (#668)
- 529c2f7 bumps hynek build dep (#670)
Full Diff: v0.12.1...v0.13.0