You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Base model configs are currently split into architecture/non-architecture. This is a legacy behaviour that was introduced as a stopgap solution for:
Loading pretrained configs while allowing changes in parameters that support it.
Marking a subset of the parameters as essential to a model, which absolutely need to be kept fixed when loading a pretrained model (i.e. changing it would cause a crash or otherwise break the model).
(1) Is made obsolete by #211, and will become irrelevant after #231. (2) On the other hand needs a replacement. Ignoring the problem is not recommended because it would make the loading of pretrained checkpoints unsafe, causing cryptic crashes and unexpected breakdown. My suggestion (which I've been planning for a while) would be to replace the hard class split by a simple field descriptor, i.e. something like field hints. This would address the issue with relatively low added complexity (and could be made opt-in, ex. for experimental models).
🚀 Execution Plan
(This section may start as an incomplete draft but must be defined before implementation begins.)
Step 1: What is the smallest working version?
(Describe the simplest way to implement this feature with minimal effort.)
Step 2: What additional optimizations are possible (but optional)?
(List potential refinements that can be added in later PRs if needed.)
📌 Acceptance Criteria (Must-Haves for Completion)
The feature must be functional and tested.
The implementation must be documented in practical terms.
The PR must include a performance/impact summary.
No refactors unless directly necessary for feature completion.
🛠️ Project Management
Assign the project to the Fast-LLM project.
Set the Estimate field (in days) in the GitHub project.
Use the Size field to categorize the PR size (Small/Medium/Large).
Assign an owner when opening the issue.
The text was updated successfully, but these errors were encountered:
🎯 Goal (What & Why)
Base model configs are currently split into architecture/non-architecture. This is a legacy behaviour that was introduced as a stopgap solution for:
(1) Is made obsolete by #211, and will become irrelevant after #231. (2) On the other hand needs a replacement. Ignoring the problem is not recommended because it would make the loading of pretrained checkpoints unsafe, causing cryptic crashes and unexpected breakdown. My suggestion (which I've been planning for a while) would be to replace the hard class split by a simple field descriptor, i.e. something like field hints. This would address the issue with relatively low added complexity (and could be made opt-in, ex. for experimental models).
🚀 Execution Plan
Step 1: What is the smallest working version?
Step 2: What additional optimizations are possible (but optional)?
📌 Acceptance Criteria (Must-Haves for Completion)
🛠️ Project Management
Estimate
field (in days) in the GitHub project.Size
field to categorize the PR size (Small/Medium/Large).The text was updated successfully, but these errors were encountered: