🎯 Goal (What & Why)
Currently, a pretrained config overrides an arbitrary part of the user-specified config. This causes problems:
Values set in the config are silently ignored, which can be confusing. For example, the actual hidden size may end up being 4096 when the config explicitly says 2048.
I suggest flipping things around so the specified model config overrides the pretrained config. This should give us the behaviour we want in most cases:
Pretrained config, no base model config: All architecture parameters are imported, and so are relevant non-architecture parameters (e.g. window_size). Other non-architecture parameters take the Fast-LLM default.
Pretrained config, base model config with non-architecture parameters: Parameters explicitly specified in the base model config are taken, others are as above.
Pretrained config, base model config with architecture parameters: We probably want to enforce matching values, and raise an error for any mismatch. (This would be an improvement because right now wrong values are silently ignored.)
No pretrained config: Same as before.
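The four cases above can be sketched with plain dictionaries. This is only an illustration of the proposed precedence (defaults, then pretrained, then user-specified), not Fast-LLM's implementation: `resolve_config`, `ARCHITECTURE_KEYS`, `DEFAULTS`, and the parameter values are all hypothetical, and real configs are nested objects rather than flat dicts.

```python
# Hypothetical sketch; none of these names or values come from Fast-LLM.
ARCHITECTURE_KEYS = {"hidden_size", "num_layers"}  # assumed split, for illustration
DEFAULTS = {"hidden_size": 1024, "num_layers": 12, "window_size": None, "dropout": 0.0}


def resolve_config(user, pretrained=None):
    """Merge flat config dicts with precedence: defaults < pretrained < user."""
    if pretrained is None:
        # No pretrained config: same as before, user values over defaults.
        return {**DEFAULTS, **user}
    # Architecture parameters given explicitly must match the pretrained model.
    for key in sorted(ARCHITECTURE_KEYS & user.keys() & pretrained.keys()):
        if user[key] != pretrained[key]:
            raise ValueError(
                f"{key}: config says {user[key]!r}, "
                f"pretrained model has {pretrained[key]!r}"
            )
    # Explicit user values win; everything else comes from the pretrained
    # config, then from the defaults.
    return {**DEFAULTS, **pretrained, **user}
```

With a pretrained config of `{"hidden_size": 4096, "num_layers": 32, "window_size": 4096}`, calling `resolve_config({"dropout": 0.1}, pretrained)` keeps the imported architecture and `window_size` while taking the explicit `dropout`, and `resolve_config({"hidden_size": 2048}, pretrained)` raises instead of silently using 4096.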
🚀 Execution Plan
We can use Fast-LLM's override mechanism as in #168.
However, we'll also need to adapt the update mechanism to get the behaviour we want for nested configs.
It could also be difficult to achieve backward compatibility.
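To illustrate why nested configs need special handling, here is a minimal sketch of a recursive update using plain dictionaries. It assumes that the desired behaviour is for a user-specified sub-config to update the pretrained sub-config field by field rather than replace it wholesale. `deep_update` and the `transformer` key are hypothetical names, not part of Fast-LLM's override or update mechanism.

```python
def deep_update(base, override):
    """Return a copy of `base` with `override` merged in, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sub-configs: merge field by field.
            merged[key] = deep_update(merged[key], value)
        else:
            # Leaf value, or a type mismatch: the override wins outright.
            merged[key] = value
    return merged


pretrained = {"transformer": {"hidden_size": 4096, "window_size": 4096}}
user = {"transformer": {"window_size": 1024}}

shallow = {**pretrained, **user}        # sub-config replaced, hidden_size is lost
nested = deep_update(pretrained, user)  # sub-config updated, hidden_size is kept
```

A shallow merge drops every pretrained field of a sub-config as soon as the user sets one field in it, which is the opposite of the behaviour described above.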
📌 Acceptance Criteria (Must-Haves for Completion)
Config resolution should behave as described in the cases above.
🛠️ Project Management
Assign the project to the Fast-LLM project.
Set the Estimate field (in days) in the GitHub project.
Use the Size field to categorize the PR size (Small/Medium/Large).
Assign an owner when opening the issue.