Support frozen weights #185

Open · jlamypoirier wants to merge 16 commits into main
Conversation

@jlamypoirier (Collaborator) commented Mar 11, 2025

✨ Description

Fixes #183

  • Extract the FSDP logic from `Stage` so frozen weights can get a separate FSDP instance of their own. (811739a)
  • Separate the shards so they can have different sizes, i.e., no gradient or optimizer shards for frozen parameters. Shards are now stored as a dict `{shard_name: shard}` instead of a single tensor of shape `(num_shards, shard_size)`; see the sketch after this list. (b9b017f)
  • Remove unnecessary buffers and shards for frozen weights by setting their size to zero. (b9b017f)
  • Add a test for frozen weights and make it pass. (e878656)
  • Train a small model (no frozen weights) to check for regressions.
  • Try loading an older checkpoint in distributed format to verify backward compatibility.
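For illustration, here is a minimal sketch of the new shard layout in PyTorch. The shard names, sizes, and Adam-style optimizer states are assumptions for this example, not the actual Fast-LLM internals:

```python
import torch

# Illustrative sizes only; in Fast-LLM these would come from the stage's parameters.
trainable_numel = 640
frozen_numel = 384

# Before: a single tensor of shape (num_shards, shard_size) forced every
# shard (weights, grads, optimizer states) to have the same size.
shards_old = torch.empty(4, trainable_numel + frozen_numel)

# After: a dict {shard_name: shard}, so each shard sizes itself independently.
# Frozen parameters need no gradient or optimizer state, so those shards
# cover only the trainable parameters; for a fully frozen stage their size
# would simply be zero.
shards_new = {
    "weights": torch.empty(trainable_numel + frozen_numel),
    "grads": torch.empty(trainable_numel),
    "exp_avg": torch.empty(trainable_numel),     # assumed Adam first moment
    "exp_avg_sq": torch.empty(trainable_numel),  # assumed Adam second moment
}
```

The runs below demonstrate the effect: the first trains a small GPT model normally, while the second presumably freezes the MLP weights by setting their learning-rate scale to zero.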
$ fast-llm train gpt
[...]
2025-03-18 03:02:53,421 >>> Allocating 14 weight buffers (692.54 MiB)
2025-03-18 03:02:53,952 >>> Allocating 14 grad buffers (692.54 MiB)
2025-03-18 03:02:53,952 >>> Allocating 4 shards (2,770.14 MiB)
2025-03-18 03:02:53,953 Total allocated: 4,155.21 MiB
[...]
$ fast-llm train gpt model.base_model.transformer.mlp_lr_scale=[0]
[...]
2025-03-18 03:01:30,496 >>> Allocating 14 weight buffers (692.54 MiB)
2025-03-18 03:01:31,326 >>> Allocating 14 grad buffers (308.30 MiB)
2025-03-18 03:01:31,327 >>> Allocating 4 shards (1,617.44 MiB)
2025-03-18 03:01:31,327 Total allocated: 2,618.27 MiB
[...]
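The logged sizes are internally consistent, assuming the four shards are the weights, the gradients, and two Adam moment estimates: in the baseline run every shard is weight-sized (4 × 692.54 ≈ 2,770.14 MiB), while with the MLP frozen only the weight shard keeps its full size and the other three shrink to the trainable 308.30 MiB:

```python
# Sanity check on the logged sizes (MiB); the shard roles are an assumption.
weights, trainable = 692.54, 308.30

print(4 * weights)                  # 2770.16, ≈ logged 2770.14 (rounding)
print(weights + 3 * trainable)      # 1617.44, matches the logged shard total
print(6 * weights)                  # 4155.24, ≈ logged 4155.21 total
print(2 * weights + 4 * trainable)  # 2618.28, ≈ logged 2618.27 total
```

In this configuration, freezing the MLP saves roughly 37% of the total allocation.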

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

@jlamypoirier jlamypoirier marked this pull request as ready for review March 18, 2025 22:20
@jlamypoirier jlamypoirier requested a review from tscholak March 18, 2025 22:22
@jlamypoirier (Collaborator, Author) commented:

@tscholak The PR is ready for review; feel free to volunteer another lucky reviewer. It's probably too big for a full review, but the general structure should be fine.
I still have some one-off tests to do, but they aren't strictly part of the PR, so they can be done in parallel with the review.
