v0.13.0 - Pretraining Support & Optimizer Configuration

@RobotSail released this 08 Jan 19:48 · 574f946

What's New

Features

  • Pretraining Data Processing API (#672)

    • Added new API for processing pretraining-style datasets
    • Documents are now chunked into blocks of a configurable block_size
    • Chunks are treated as independent, fully-unmasked samples (see the sketch after this feature list)
    • Updated training loop to ingest pretraining-style datasets
    • Includes comprehensive test coverage (test_pretraining_data_process.py, test_pretraining_mode.py, test_pretraining_sampler.py)
  • AdamW Optimizer Configuration (#674)

    • Exposed weight_decay, betas, and eps parameters in TrainingArgs
    • Users can now tune AdamW hyperparameters through the run_training() API (see the example after this feature list)
    • Provides more control over optimizer behavior
  • Granite 4 Model Support (#669)

    • Added support for Granite 4 models, which are handled as Mixture of Experts (MoE) models during training
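
The following is a minimal sketch of the pretraining-style chunking described above. It is illustrative only: the function and field names here are hypothetical, and the release's actual implementation lives in data_process.py and sampler.py.

```python
# Hypothetical sketch of block_size chunking for pretraining data.
# Each chunk becomes an independent, fully-unmasked sample: the labels
# equal the input ids, so every token contributes to the loss.

def chunk_document(token_ids: list[int], block_size: int) -> list[dict]:
    samples = []
    for start in range(0, len(token_ids), block_size):
        chunk = token_ids[start : start + block_size]
        samples.append(
            {
                "input_ids": chunk,
                "labels": list(chunk),  # fully unmasked, unlike instruction-tuning samples
            }
        )
    return samples


if __name__ == "__main__":
    doc = list(range(10))  # stand-in for a tokenized document
    for sample in chunk_document(doc, block_size=4):
        print(sample)
```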
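
The newly exposed optimizer knobs correspond to the standard AdamW arguments. The torch.optim.AdamW call below is ordinary PyTorch and only illustrates what each parameter controls; the TrainingArgs field names at the end are assumptions, so check config.py in this release for the exact spellings and for the other required arguments, which are omitted here.

```python
import torch

model = torch.nn.Linear(16, 16)  # stand-in model

# What the exposed hyperparameters mean in AdamW itself:
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.999),   # decay rates for the first and second moment estimates
    eps=1e-8,             # numerical-stability term added to the denominator
    weight_decay=0.01,    # decoupled weight-decay strength
)

# Hypothetical usage through the training API (field names assumed):
# train_args = TrainingArgs(..., weight_decay=0.01, betas=(0.9, 0.999), eps=1e-8)
# run_training(torch_args=torch_args, train_args=train_args)
```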

Bug Fixes

  • Process Timing Fix (#675)

    • Fixed a race condition where the process was read before it had completed (see the illustration after this list)
  • Variable Access Fix (#668)

    • Fixed a stray invalid variable access bug
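
For context on the race condition above, here is a generic illustration of the failure mode, not the library's actual code: reading a child process before it has exited can return partial or empty results, while waiting for completion first removes the race.

```python
import subprocess

# Launch a child process with its stdout captured.
proc = subprocess.Popen(
    ["echo", "done"],
    stdout=subprocess.PIPE,
    text=True,
)

# communicate() waits for the process to finish and then returns its output,
# so the read can never happen before the process has completed.
stdout, _ = proc.communicate()
print(stdout.strip())
```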

Dependencies

  • Build Dependency Update (#670)
    • Updated hynek build dependency

Files Changed

17 files changed with 1,642 insertions and 52 deletions:

  • Core training modules: data_process.py, main_ds.py, sampler.py, model.py, config.py
  • New test suites for pretraining functionality
  • Updated README with new capabilities

Full Changelog

All Changes:

  • 574f946 Exposes API for processing pretraining data (#672)
  • 638a753 fixes bug where process isn't completed by the time the process gets read (#675)
  • c495035 Expose AdamW optimizer parameters in training API (#674)
  • 3d05302 Handle granite 4 as MoE models in training (#669)
  • 781c36f fixes stray invalid variable access bug (#668)
  • 529c2f7 bumps hynek build dep (#670)

Full Diff: v0.12.1...v0.13.0