Missing configuration when converting from HF model config json #166
Comments
I have found that when importing metadata, the configuration is first imported as architecture-only. However, it is later reassigned as the full configuration class if the child class sets it that way (and the converter handlers do set it this way). I am not sure why this approach is used.

I have tried importing the config as a full class, and the checkpoint tests pass, but I am unsure whether this could trigger any side effects. Perhaps we could check the type of the imported config instead. @jlamypoirier, what do you think?
This is not a bug; Fast-LLM is designed that way. The architecture/non-architecture split is good enough for most situations, but to get the behaviour described here we need to rethink this principle. Forcing a full import of the model config isn't really a good option, because it would prevent setting any other model parameter. (It would also need major changes to the conversion mechanism, which only supports architecture parameters.)

I can think of one option that would almost always give us what we want. Instead of having a base model config that may or may not be overridden in a confusing way, we could directly create a full base model config (as @bigximik suggests), then treat the provided base model config as a set of explicit updates to that imported config. This wasn't really possible before, but now we can use the override method introduced in #168. This should work as expected:
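A minimal sketch of that merge order, using hypothetical helper names rather than Fast-LLM's actual API (the real implementation would go through the conversion handlers and the override method from #168):

```python
# Hypothetical sketch, not Fast-LLM's actual API: the imported HF config
# becomes the full base model config, and the user-provided base model
# config is applied on top as a set of explicit updates.

def build_base_model_config(imported: dict, user_updates: dict) -> dict:
    # Start from the complete imported configuration
    # (all fields, not just the architecture subset).
    config = dict(imported)
    # Apply only what the user explicitly set, overriding imported values.
    config.update(user_updates)
    return config


# window_size comes from the imported config.json; the user override
# wins only for the fields the user actually provided.
merged = build_base_model_config(
    {"hidden_size": 4096, "window_size": 4096, "router_aux_loss_coef": 0.001},
    {"window_size": 8192},
)
print(merged["window_size"], merged["router_aux_loss_coef"])  # 8192 0.001
```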
Hey @jlamypoirier, regardless of whether we call this a bug or a design flaw, the outcome is the same: the current behaviour is incorrect and leads to broken model conversions. That said, @bigximik already outlined a clear and simple path forward, and I mentioned on Tuesday that any deviations from this plan need to be brought to the team before investing time in them. If this takes more than a day or two to fix, it should be postponed. Right now, LoRA is the priority. We need to align on that.
@tscholak For the reasons I stated above, I do not believe the solution proposed by @bigximik could work even as a stopgap. But I can't just complain that things don't work; I have to propose an alternative to the team, which I did through proposal #170 and the short POC #171. This issue has been raised repeatedly and is considered a "bug" by many, so I consider it a priority. (And I was assuming you thought the same, since I was assigned and it was put in the "ready" section.)
@jlamypoirier, the "ready" column does not mean top priority. We discussed in the sync that LoRA is the immediate focus. This issue was marked ready under the assumption that a quick fix was possible; it is not an urgent blocker. If Denis' approach is not feasible, then that should have been brought to the team first, not worked around in parallel. Again, this issue is not a priority right now. For now, please shift back to LoRA.
🐞 Describe the Bug
When converting models, config options not included in the architecture config are not imported from the Hugging Face model's config.json. This creates an unexpected and undocumented requirement for manual configuration, which can lead to costly mistakes.
The following critical options are affected:

- window_size and max_window_layers for models trained with windowed attention (e.g. Qwen 2), see Qwen2 converter #163.
- router_aux_loss_coef for MoEs such as Mixtral.

Currently, the load-from-HF-model feature suggests seamless integration, but this bug prevents complete and accurate model conversion. Users are likely to assume the conversion will "just work" and may unknowingly train models with incorrect configurations.
🔄 Steps to Reproduce
1. Load a HF model using Fast-LLM: use a HF model that requires non-architecture-specific parameters (e.g., window_size for sliding window attention); see the sketch after this list.
2. Observe missing configurations: check the output model configuration. Notice that parameters not included in the architecture config are missing or set to default values, potentially breaking the model.
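For reference, a quick way to confirm that these fields are present in the HF config.json to begin with (a minimal sketch using the transformers library; the Qwen 2 checkpoint name is only an example):

```python
from transformers import AutoConfig

# Load the HF config.json for a model trained with windowed attention.
# Any Qwen 2 checkpoint would do; this repo name is just an illustration.
hf_config = AutoConfig.from_pretrained("Qwen/Qwen2-7B-Instruct")

# These fields live in config.json but are not part of Fast-LLM's
# architecture config, so they are not imported during conversion.
print(hf_config.sliding_window)     # sliding-window size
print(hf_config.max_window_layers)  # sliding-window layer cutoff
```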
🎯 Expected Behavior
Fast-LLM should correctly import all relevant configuration options from the Hugging Face config.json, not just those in the architecture configuration. This ensures that models are fully converted and behave as expected, without requiring manual intervention or hidden knowledge about configuration quirks.