Make the model config override the pretrained config #170

jlamypoirier · 2025-03-06T03:54:01Z

🎯 Goal (What & Why)

Currently, a pretrained config overrides an arbitrary part of the user-specified config. This causes several problems:

We typically override the architecture parameters only, which complicates things when we also want to throw in some non-architecture parameters (see "Missing configuration when converting from HF model config json", #166).
Some default values are set before loading the pretrained config and end up being wrong (see "[bug] Inconsistent init_method_std in test_load_distributed_checkpoint_dp2", #88).
Values set in the config are silently ignored, which can be confusing. For example, the actual hidden size may end up being 4096 when the config explicitly says 2048.

I suggest flipping things around so the specified model config overrides the pretrained config. This should give us the behaviour we want in most cases:

Pretrained config, no base model config: All architecture parameters are imported, and so are relevant non-architecture parameters (e.g. window_size). Other non-architecture parameters take the Fast-LLM default.
Pretrained config, base model config with non-architecture parameters: Parameters explicitly specified in the base model config are taken, others are as above.
Pretrained config, base model config with architecture parameters: We probably want to enforce matching values, and raise an error for any mismatch. (This would be an improvement because right now wrong values are silently ignored.)
No pretrained config: Same as before.
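
The four cases above can be sketched with plain dictionaries. This is only an illustration of the intended precedence, not Fast-LLM code: `resolve_config`, `ARCHITECTURE_KEYS`, `DEFAULTS` and the specific parameter values are all hypothetical names and numbers chosen for the example.

```python
from typing import Optional

# Hypothetical stand-ins; the real split between architecture and
# non-architecture parameters lives in Fast-LLM's config classes.
ARCHITECTURE_KEYS = {"hidden_size", "num_layers"}
DEFAULTS = {
    "hidden_size": 1024,
    "num_layers": 12,
    "window_size": None,
    "init_method_std": 0.02,
}


def resolve_config(user: dict, pretrained: Optional[dict] = None) -> dict:
    """Layer defaults, then pretrained values, then explicitly set user values."""
    resolved = dict(DEFAULTS)
    if pretrained is None:
        # No pretrained config: same as before, the user config sits on the defaults.
        resolved.update(user)
        return resolved

    # Pretrained values replace the defaults.
    resolved.update(pretrained)
    for key, value in user.items():
        # Architecture parameters must match the pretrained model.
        if key in ARCHITECTURE_KEYS and key in pretrained and pretrained[key] != value:
            raise ValueError(
                f"{key}: config says {value!r}, pretrained config says {pretrained[key]!r}"
            )
        # Explicitly specified parameters win over pretrained ones.
        resolved[key] = value
    return resolved
```

With this sketch, a user config that sets `hidden_size` to 2048 against a pretrained config with 4096 raises an error instead of being silently ignored, while a user-specified `init_method_std` overrides whatever the pretrained config or the defaults provide.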

🚀 Execution Plan

We can use Fast-LLM's override mechanism as in #168.
However, we'll also need to adapt the update mechanism to get the behaviour we want for nested configs.
It could also be difficult to achieve backward compatibility.
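
To illustrate why nested configs need special handling: a shallow update would replace a whole nested section as soon as one of its fields is specified, discarding the pretrained values next to it. A generic recursive merge avoids that. The sketch below is a plain-dictionary illustration under that assumption; `update_nested` is a hypothetical helper, not Fast-LLM's actual update mechanism.

```python
def update_nested(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` merged in, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections: merge field by field.
            merged[key] = update_nested(merged[key], value)
        else:
            # Leaf value (or type mismatch): the override wins.
            merged[key] = value
    return merged
```

For example, overriding only `window_size` inside a nested `transformer` section would keep the pretrained `hidden_size` in that same section, and the input dictionaries are left unmodified.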

📌 Acceptance Criteria (Must-Haves for Completion)

Things should work as described above

🛠️ Project Management

Assign the project to the Fast-LLM project.
Set the Estimate field (in days) in the GitHub project.
Use the Size field to categorize the PR size (Small/Medium/Large).
Assign an owner when opening the issue.

jlamypoirier added the enhancement New feature or request label Mar 6, 2025

jlamypoirier self-assigned this Mar 6, 2025

This was referenced Mar 6, 2025

[Prototype] Make the model config override the pretrained config #171

Closed

Missing configuration when converting from HF model config json #166

Open

jlamypoirier mentioned this issue Mar 26, 2025

Config update mechanism, keep track of explicitly set config parameters #205

Merged

jlamypoirier mentioned this issue Apr 5, 2025

Make the specified config parameters update the pretrained config #211

Merged

tscholak closed this as completed in #211 Apr 17, 2025
