Missing configuration when converting from HF model config json #166


Open
tscholak opened this issue Feb 27, 2025 · 5 comments

@tscholak
Collaborator

🐞 Describe the Bug

When converting models, config options not included in the architecture config are not imported from the Hugging Face model's config.json.

This creates an unexpected and undocumented requirement for manual configuration, which can lead to costly mistakes.

The following critical options are affected:

  • window_size and max_window_layers for models trained with windowed attention (e.g. Qwen2); see Qwen2 converter #163.
  • router_aux_loss_coef for MoEs such as Mixtral.

Currently, the load-from-HF-model feature suggests seamless integration, but this bug prevents complete and accurate model conversion. Users are likely to assume the conversion will "just work" and may unknowingly train models with incorrect configurations.

🔄 Steps to Reproduce

  1. Load an HF model using Fast-LLM:
    Use an HF model that requires non-architecture-specific parameters (e.g., window_size for sliding window attention).

  2. Observe missing configurations:
    Check the output model configuration. Notice that parameters not included in the architecture config are missing or set to default values, potentially breaking the model (a quick check of the source config is sketched below).
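
A quick way to confirm the affected fields are present in the source model's configuration (this uses the standard transformers API; the model id is just an example):

```python
# Sanity check on the HF side: the source config carries the sliding-window
# fields, so anything missing after conversion was dropped by the importer,
# not absent from the checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")  # example model
print(config.use_sliding_window, config.sliding_window, config.max_window_layers)
```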

🎯 Expected Behavior

Fast-LLM should correctly import all relevant configuration options from the Hugging Face config.json, not just those in the architecture configuration. This ensures that models are fully converted and behave as expected, without requiring manual intervention or hidden knowledge about configuration quirks.

@tscholak tscholak added the bug Something isn't working label Feb 27, 2025
@tscholak tscholak mentioned this issue Mar 3, 2025
@bigximik
Contributor

bigximik commented Mar 4, 2025

I have found that, when importing metadata, the configuration is first imported as architecture-only:
https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/external.py#L206

However, it is later reassigned to the full configuration class if the child class sets it up that way (and the converter handlers do):
https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/external.py#L211

I am not sure why this approach is used. I have tried importing the config as the full class, and the checkpoint tests pass, but I am unsure whether this could trigger any side effects.

Perhaps we could check the type of cls._model_class and decide whether to import the configuration as architecture-only or not?
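
Something like this, perhaps (a sketch only; apart from `_model_class`, every name below is a placeholder and not the actual code in external.py):

```python
# Sketch of the idea above, with placeholder types standing in for the real
# Fast-LLM config classes. The handler would pick the config class to import
# into based on its model class, instead of always starting architecture-only.
class ArchitectureConfig: ...                 # placeholder
class FullConfig(ArchitectureConfig): ...     # placeholder

def config_class_for_import(model_class) -> type:
    config_class = model_class.config_class   # placeholder attribute on the model class
    if issubclass(config_class, FullConfig):
        return config_class                   # import the full config directly
    return ArchitectureConfig                 # fall back to today's architecture-only behaviour
```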

@jlamypoirier what do you think?

@jlamypoirier jlamypoirier added enhancement New feature or request and removed bug Something isn't working labels Mar 6, 2025
@jlamypoirier jlamypoirier changed the title Missing configuration when converting from HF model config json Add option to convert non-architecture parameters Mar 6, 2025
@jlamypoirier
Collaborator

This is not a bug; Fast-LLM is designed that way. The architecture/non-architecture split is good enough for most situations, but to get the behaviour described here we need to rethink this principle. Forcing a full import of the model config isn't really a good option, because it would prevent setting any other model parameter. (It would also require major changes to the conversion mechanism, which only supports architecture parameters.)

I can think of one option that would almost always give us what we want. Instead of having a base model config that may or may not be overridden in a confusing way, we could directly create a full base model config (as @bigximik suggests), then treat the provided base model config as a set of explicit updates to that imported config. This wasn't really possible before, but now we can use the override method introduced in #168. A rough sketch of the resulting merge semantics follows the list below.

This should work as expected:

  • Pretrained config, no base model config: All architecture parameters are imported, and so are relevant non-architecture parameters (ex. window_size). Other non-architecture parameters take the Fast-LLM default.
  • Pretrained config, base model config with non-architecture parameters: Parameters explicitly specified in the base model config are taken, others are as above.
  • Pretrained config, base model config with architecture parameters: We probably want to enforce matching values, and raise an error for any mismatch. (This would be an improvement because right now wrong values are silently ignored.)
  • No pretrained config: Same as before.
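
Illustrative only (none of these names are actual Fast-LLM APIs; the real implementation would go through the update/override mechanism from #168):

```python
# Illustrative precedence:
#   explicit base-model config  >  imported pretrained config  >  Fast-LLM defaults,
# with architecture parameters required to match the pretrained checkpoint.
ARCHITECTURE_KEYS = {"hidden_size", "num_layers", "num_attention_heads"}  # placeholder set

def resolve_base_model_config(defaults: dict, imported: dict | None, explicit: dict) -> dict:
    config = dict(defaults)                       # Fast-LLM defaults
    if imported is not None:                      # pretrained config present
        config.update(imported)                   # architecture + relevant non-architecture params
    for key, value in explicit.items():
        if imported is not None and key in ARCHITECTURE_KEYS and imported.get(key) != value:
            raise ValueError(f"{key!r} conflicts with the pretrained checkpoint")
        config[key] = value                       # explicit updates always win
    return config
```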

@tscholak
Collaborator Author

tscholak commented Mar 7, 2025

Hey @jlamypoirier, regardless of whether we call this a bug or a design flaw, the outcome is the same: the current behaviour is incorrect and leads to broken model conversions.

That said, @bigximik already outlined a clear and simple path forward, and I had mentioned Tuesday that any deviations from this plan need to be brought to the team before investing time in them.

If this takes more than a day or two to fix, it should be postponed. Right now, LoRA is the priority. We need to align on that.

@tscholak tscholak changed the title Add option to convert non-architecture parameters Missing configuration when converting from HF model config json Mar 7, 2025
@tscholak tscholak added bug Something isn't working and removed enhancement New feature or request labels Mar 7, 2025
@jlamypoirier
Collaborator

@tscholak For the reasons I stated above, I do not believe the solution proposed by @bigximik could work, even as a stopgap. But I can't just complain that things don't work; I have to propose an alternative to the team, which I did through proposal #170 and the short POC #171.

This issue has been raised repeatedly and is considered a "bug" by many, so I consider it a priority. (And I was assuming you thought the same, since I was assigned to it and it was put in the "ready" section.)

@tscholak
Collaborator Author

tscholak commented Mar 7, 2025

@jlamypoirier, the "ready" column does not mean top priority.

We discussed in the sync that LoRA is the immediate focus. This issue here was marked ready under the assumption that a quick fix was possible. It's not an urgent blocker. If Denis' approach is not feasible, then that should have been brought to the team first, not worked around in parallel.

Again, this issue is not a priority right now. For now, please shift back to LoRA.
