Missing configuration when converting from HF model config json #166


Open
tscholak opened this issue Feb 27, 2025 · 5 comments

@tscholak
Collaborator

🐞 Describe the Bug

When converting models, config options not included in the architecture config are not imported from the Hugging Face model's config.json.

This creates an unexpected and undocumented requirement for manual configuration, which can lead to costly mistakes.

The following critical options are affected:

  • window_size and max_window_layers for models trained with windowed attention (e.g. Qwen2); see Qwen2 converter #163.
  • router_aux_loss_coef for MoEs such as Mixtral.

Currently, the load-from-HF-model feature suggests seamless integration, but this bug prevents complete and accurate model conversion. Users are likely to assume the conversion will "just work" and may unknowingly train models with incorrect configurations.

🔄 Steps to Reproduce

  1. Load an HF model using Fast-LLM:
    Use an HF model that requires non-architecture-specific parameters (e.g., window_size for sliding window attention).

  2. Observe missing configurations:
    Check the output model configuration. Notice that parameters not included in the architecture config are missing or set to default values, potentially breaking the model (a quick check of the source config is sketched below).
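
A quick way to confirm the affected fields are present in the source model's configuration (this uses the standard transformers API; the model id is just an example):

```python
# Sanity check on the HF side: the source config carries the sliding-window
# fields, so anything missing after conversion was dropped by the importer,
# not absent from the checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")  # example model
print(config.use_sliding_window, config.sliding_window, config.max_window_layers)
```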

🎯 Expected Behavior

Fast-LLM should correctly import all relevant configuration options from the Hugging Face config.json, not just those in the architecture configuration. This ensures that models are fully converted and behave as expected, without requiring manual intervention or hidden knowledge about configuration quirks.

@tscholak tscholak added the bug Something isn't working label Feb 27, 2025
@tscholak tscholak mentioned this issue Mar 3, 2025
@bigximik
Contributor

bigximik commented Mar 4, 2025

I have found that, when importing metadata, the configuration is first imported as architecture-only:
https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/external.py#L206

However, it is later reassigned to the full configuration class if the child class sets it up that way (and the converter handlers do):
https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/external.py#L211

I am not sure why this approach is used. I have tried importing the config as the full class, and the checkpoint tests pass, but I am unsure whether this could trigger any side effects.

Perhaps we could check the type of cls._model_class and decide whether to import the configuration as architecture-only or not?
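
Something like this, perhaps (a sketch only; apart from `_model_class`, every name below is a placeholder and not the actual code in external.py):

```python
# Sketch of the idea above, with placeholder types standing in for the real
# Fast-LLM config classes. The handler would pick the config class to import
# into based on its model class, instead of always starting architecture-only.
class ArchitectureConfig: ...                 # placeholder
class FullConfig(ArchitectureConfig): ...     # placeholder

def config_class_for_import(model_class) -> type:
    config_class = model_class.config_class   # placeholder attribute on the model class
    if issubclass(config_class, FullConfig):
        return config_class                   # import the full config directly
    return ArchitectureConfig                 # fall back to today's architecture-only behaviour
```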

@jlamypoirier what do you think?

@jlamypoirier jlamypoirier added enhancement New feature or request and removed bug Something isn't working labels Mar 6, 2025
@jlamypoirier jlamypoirier changed the title Missing configuration when converting from HF model config json Add option to convert non-architecture parameters Mar 6, 2025
@jlamypoirier
Collaborator

This is not a bug; Fast-LLM is designed that way. The architecture/non-architecture split is good enough for most situations, but to get the behaviour described here we need to rethink this principle. Forcing a full import of the model config isn't really a good option, because it would prevent setting any other model parameter. (It would also require major changes to the conversion mechanism, which only supports architecture parameters.)

I can think of one option that would almost always give us what we want. Instead of having a base model config that may or may not be overridden in a confusing way, we could directly create a full base model config (as @bigximik suggests), then treat the provided base model config as a set of explicit updates to that imported config. This wasn't really possible before, but now we can use the override method introduced in #168. A rough sketch of the resulting merge semantics follows the list below.

This should work as expected:

  • Pretrained config, no base model config: All architecture parameters are imported, and so are relevant non-architecture parameters (ex. window_size). Other non-architecture parameters take the Fast-LLM default.
  • Pretrained config, base model config with non-architecture parameters: Parameters explicitly specified in the base model config are taken, others are as above.
  • Pretrained config, base model config with architecture parameters: We probably want to enforce matching values, and raise an error for any mismatch. (This would be an improvement because right now wrong values are silently ignored.)
  • No pretrained config: Same as before.
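
Illustrative only (none of these names are actual Fast-LLM APIs; the real implementation would go through the update/override mechanism from #168):

```python
# Illustrative precedence:
#   explicit base-model config  >  imported pretrained config  >  Fast-LLM defaults,
# with architecture parameters required to match the pretrained checkpoint.
ARCHITECTURE_KEYS = {"hidden_size", "num_layers", "num_attention_heads"}  # placeholder set

def resolve_base_model_config(defaults: dict, imported: dict | None, explicit: dict) -> dict:
    config = dict(defaults)                       # Fast-LLM defaults
    if imported is not None:                      # pretrained config present
        config.update(imported)                   # architecture + relevant non-architecture params
    for key, value in explicit.items():
        if imported is not None and key in ARCHITECTURE_KEYS and imported.get(key) != value:
            raise ValueError(f"{key!r} conflicts with the pretrained checkpoint")
        config[key] = value                       # explicit updates always win
    return config
```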

@tscholak
Collaborator Author

tscholak commented Mar 7, 2025

Hey @jlamypoirier, regardless of whether we call this a bug or a design flaw, the outcome is the same: the current behaviour is incorrect and leads to broken model conversions.

That said, @bigximik already outlined a clear and simple path forward, and I had mentioned Tuesday that any deviations from this plan need to be brought to the team before investing time in them.

If this takes more than a day or two to fix, it should be postponed. Right now, LoRA is the priority. We need to align on that.

@tscholak tscholak changed the title Add option to convert non-architecture parameters Missing configuration when converting from HF model config json Mar 7, 2025
@tscholak tscholak added bug Something isn't working and removed enhancement New feature or request labels Mar 7, 2025
@jlamypoirier
Collaborator

@tscholak For the reasons I stated above, I do not believe the solution proposed by @bigximik could work, even as a stopgap. But I can't just complain that things don't work; I have to propose an alternative to the team, which I did through proposal #170 and the short POC #171.

This issue has been raised repeatedly and is considered a "bug" by many, so I consider it a priority. (And I was assuming you thought the same, since I was assigned to it and it was put in the "ready" section.)

@tscholak
Collaborator Author

tscholak commented Mar 7, 2025

@jlamypoirier, the "ready" column does not mean top priority.

We discussed in the sync that LoRA is the immediate focus. This issue here was marked ready under the assumption that a quick fix was possible. It's not an urgent blocker. If Denis' approach is not feasible, then that should have been brought to the team first, not worked around in parallel.

Again, this issue is not a priority right now. For now, please shift back to LoRA.
