[WIP] Implement llama4 HF format to DCP converter #1104
Why do we need this?
There have been many asks to make HF checkpoints work with TorchTitan. Workarounds for this already exist, but the converted DCP checkpoints generally result in slow loading when they are used for subsequent training.
This PR addresses the problem by resharding the full weights into the exact sharding that will be used for training. In other words, we perform the resharding offline, at conversion time, to avoid the long loading time of resharding online, at training time.
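To make the idea concrete, here is a minimal sketch of what offline resharding can look like with DCP and DTensor. This is illustrative rather than the PR's actual code: it assumes the training model is already built with its final parallelism, and `convert_hf_to_dcp` / `hf_to_dcp_name` (a HF-to-DCP key mapping) are hypothetical names.

```python
import torch.distributed.checkpoint as dcp
from torch.distributed.tensor import DTensor, distribute_tensor
from safetensors.torch import load_file


def convert_hf_to_dcp(hf_files, output_dir, model):
    # The target state dict already carries the DTensor sharding that
    # training will use, so no resharding is needed at load time.
    state_dict = model.state_dict()
    for path in hf_files:
        for hf_name, full_tensor in load_file(path).items():
            dcp_name = hf_to_dcp_name(hf_name)  # hypothetical name mapping
            param = state_dict[dcp_name]
            if isinstance(param, DTensor):
                # Carve the full HF tensor into this rank's local shard,
                # matching the training-time placement.
                state_dict[dcp_name] = distribute_tensor(
                    full_tensor, param.device_mesh, param.placements
                )
            else:
                state_dict[dcp_name] = full_tensor
    # Each rank writes only its local shards to the DCP checkpoint.
    dcp.save(state_dict, checkpoint_id=output_dir)
```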
The converter also performs concurrent file loads across multiple trainers.
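For example, the concurrent loading could assign a disjoint subset of the HF safetensors files to each trainer rank. This is a hedged sketch of that idea; the PR's actual distribution strategy may differ.

```python
import torch.distributed as dist


def files_for_rank(all_files):
    # Round-robin assignment: rank r loads files r, r + world_size, ...
    # so file I/O happens concurrently across ranks. Tensors loaded by
    # one rank would still need to be made visible to the ranks that
    # own shards of them (e.g., via broadcast).
    rank, world_size = dist.get_rank(), dist.get_world_size()
    return all_files[rank::world_size]
```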
While this PR should perform reasonably well, the converter requires exactly the same machines/GPUs and sharding as the subsequent training run. The main blocker to converting on CPU machines is that torchtitan currently cannot run on CPU-only machines.
An alternative is to run the conversion on fewer machines than training uses. This works, but an additional resharding then happens when the training job loads the checkpoint, which may not perform well depending on the resharding pattern.
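For reference, the online resharding that this alternative falls back to is just a plain DCP load into the training-time state dict; DCP redistributes shards on the fly, which is where the extra load-time cost comes from (sketch, with a hypothetical checkpoint path):

```python
import torch.distributed.checkpoint as dcp

# Loading a checkpoint saved under a different sharding: dcp.load
# reshards on the fly ("online resharding"), trading load time for
# flexibility in the number of conversion machines.
state_dict = model.state_dict()  # training-time sharding
dcp.load(state_dict, checkpoint_id="outputs/converted_ckpt")  # hypothetical path
```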
Future extensions