## Description

### 🎯 Goal (What & Why)
Base model configs are currently split into architecture and non-architecture parameters. This is legacy behaviour that was introduced as a stopgap solution for:
1. Loading pretrained configs while allowing changes to the parameters that support it.
2. Marking a subset of the parameters as essential to a model, which absolutely need to be kept fixed when loading a pretrained model (i.e., changing them would cause a crash or otherwise break the model).

(1) is made obsolete by #211 and will become irrelevant after #231. (2), on the other hand, needs a replacement. Ignoring the problem is not recommended, because it would make loading pretrained checkpoints unsafe, causing cryptic crashes and unexpected breakage. My suggestion (which I've been planning for a while) is to replace the hard class split with a simple field descriptor, i.e. something like field hints. This would address the issue with relatively little added complexity, and could be made opt-in, e.g. for experimental models.
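To illustrate the idea, here is a minimal sketch of what per-field hints could look like. All names here (`FieldHint`, `hinted`, `load_pretrained`, the example config fields) are hypothetical and for illustration only; they are not Fast-LLM's actual API. The sketch uses standard-library dataclass field metadata to carry the hint, and a loader that rejects overrides of architecture-hinted fields up front instead of crashing later:

```python
import dataclasses
import enum


class FieldHint(enum.Enum):
    # Hypothetical hint levels; the real set of hints could be richer.
    architecture = "architecture"  # Must match the pretrained checkpoint exactly.
    optional = "optional"          # Safe to override when loading a checkpoint.


def hinted(default, hint: FieldHint):
    # Attach the hint to the field through dataclass metadata.
    return dataclasses.field(default=default, metadata={"hint": hint})


@dataclasses.dataclass(frozen=True)
class ModelConfig:
    # Illustrative parameters only.
    hidden_size: int = hinted(1024, FieldHint.architecture)
    num_layers: int = hinted(12, FieldHint.architecture)
    dropout: float = hinted(0.0, FieldHint.optional)


def load_pretrained(pretrained: ModelConfig, overrides: dict) -> ModelConfig:
    """Apply user overrides to a pretrained config, raising a clear error
    for architecture-hinted fields instead of breaking the model later."""
    updates = {}
    for f in dataclasses.fields(pretrained):
        if f.name not in overrides:
            continue
        if f.metadata["hint"] is FieldHint.architecture:
            raise ValueError(
                f"Field '{f.name}' is part of the model architecture and "
                "cannot be changed when loading a pretrained checkpoint."
            )
        updates[f.name] = overrides[f.name]
    return dataclasses.replace(pretrained, **updates)
```

With this shape, `load_pretrained(cfg, {"dropout": 0.1})` succeeds, while `load_pretrained(cfg, {"hidden_size": 2048})` fails early with an explicit message. Making the check opt-in (e.g. for experimental models) would just mean defaulting unhinted fields to the permissive behaviour.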
### 🚀 Execution Plan

(This section may start as an incomplete draft but must be defined before implementation begins.)

**Step 1: What is the smallest working version?**

(Describe the simplest way to implement this feature with minimal effort.)

**Step 2: What additional optimizations are possible (but optional)?**

(List potential refinements that can be added in later PRs if needed.)
### 📌 Acceptance Criteria (Must-Haves for Completion)
- The feature must be functional and tested.
- The implementation must be documented in practical terms.
- The PR must include a performance/impact summary.
- No refactors unless directly necessary for feature completion.
### 🛠️ Project Management

- Assign the project to the Fast-LLM project.
- Set the `Estimate` field (in days) in the GitHub project.
- Use the `Size` field to categorize the PR size (Small/Medium/Large).
- Assign an owner when opening the issue.