
Extend add_linear_biases to support a dictionary of sub-layers to which linear bias should be added. #158


Draft
bigximik wants to merge 2 commits into main

Conversation

@bigximik (Contributor) commented Feb 24, 2025

✨ Description

Adds the ability to specify, per layer and per attention and MLP sub-layer, whether a linear bias is added. The configuration is a dictionary whose keys are sub-layer names and whose values describe the layers in which the bias is present, like this:

add_linear_biases:
  "layers.self_attn.query": "*"
  "layers.mlp.layer_1": "1:10:3, 9"
  "layers.mlp.layer_2": "5:7"

🔍 Type of change

Select all that apply:

  • 🐛 Bug fix (non-breaking change that addresses a specific issue)
  • 🚀 New feature (non-breaking change that adds functionality)
  • ⚠️ Breaking change (a change that could affect existing functionality)
  • 📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
  • 🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
  • 📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
  • 📝 Documentation change (updates documentation, including new content or typo fixes)
  • 🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

List the key changes introduced in this PR:

  1. Adds support for specifying add_linear_biases as a dictionary.

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

  • 📜 I have read and followed the contributing guidelines.
  • 🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
  • 🎉 The functionality is complete, and I have tested the changes. (tested only new test cases)
  • 📝 I have updated the documentation if needed. (not applicable)
  • ⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
  • 🧩 I have commented my code, especially in hard-to-understand areas.

Testing

  • 🧪 I have added or updated tests to cover my changes.
  • ✔️ New and existing tests pass locally with my changes. (tested only new test cases)
  • 🚦 I have tested these changes on GPUs and verified training stability. (not applicable)
  • 🏋️ I have tested the changes on realistic training workloads, if applicable. (not applicable)

@bigximik (Contributor, Author)

I have two questions:

  • Do we want to specify sub-layers like this: "layers.self_attn.query" or use a simpler format like attention_query?
  • Transformer layer indexes in Fast-LLM start from 1, as layer 0 is the embedding layer. I’ve implemented zero-based indexes in the config, but should we switch to one-based indexes in the config as well, to align with Fast-LLM's internal nomenclature? (A short sketch of this mapping follows below.)
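
As a small illustration of the indexing question above (the helper below is hypothetical; the `+ 1` offset only reflects the convention described in this comment):

```python
# Illustration only: with zero-based config indices, config layer i would map
# to Fast-LLM's internal layer i + 1, since internal layer 0 is the embedding layer.
def config_index_to_internal(config_index: int) -> int:
    return config_index + 1
```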

@tscholak (Collaborator)

Thanks @bigximik. Do we need regex pattern matching for Qwen2 for layer-wise application of linear biases? I thought that Qwen2 used bias terms in all q,k,v layers, so this feature is technically not needed, or is it?

@bigximik (Contributor, Author)

Qwen2 does not need it. I made it generic to be more in line with #155.

@bigximik (Contributor, Author)

This will remain in draft; for now, Qwen2 support is implemented in #160 instead.

@jlamypoirier (Collaborator)

I think #168 will make these kinds of things unnecessary?
