
Add LFM2 (hybrid short-conv) target support #28

Open

nathanrchn wants to merge 2 commits into bstnxbt:main from nathanrchn:lfm

Conversation

@nathanrchn

- Lfm2TargetOps in engine/target_lfm2.py mirrors the target_qwen_gdn structure
- ShortConvRollbackCache for conv-state rollback during speculation (see the sketch after this list)
- patched ShortConv.__call__ records a pre-conv tape when armed
- forward path uses inner.fa_idx/conv_idx dual masks and embedding_norm
- DFlashDraftModelArgs.from_dict reads rope_parameters (the LFM2 config layout)
- Lfm2TargetOps registered in TARGET_BACKENDS
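
As a minimal sketch of the rollback idea behind ShortConvRollbackCache and the armed-tape recording: while a speculative block is being verified, each patched short-conv layer records its sliding pre-conv state per drafted token, so the cache can be rewound to the last accepted position when the tail of the draft is rejected. This is illustrative only, written with NumPy rather than MLX, and the class and method names below (`arm`, `record`, `rollback`) are placeholders, not the PR's actual API.

```python
# Illustrative sketch only (NumPy, hypothetical method names); not the PR's code.
import numpy as np

class ShortConvRollbackCacheSketch:
    """Sliding short-conv state plus a tape of per-token snapshots
    recorded while 'armed', so a rejected draft tail can be rewound."""

    def __init__(self, conv_width: int, hidden: int):
        # the conv layer only ever needs the last (conv_width - 1) inputs
        self.state = np.zeros((conv_width - 1, hidden))
        self.tape: list[np.ndarray] = []
        self.armed = False

    def arm(self) -> None:
        # snapshot the state as of the last committed token
        self.tape = [self.state.copy()]
        self.armed = True

    def record(self, x_pre_conv: np.ndarray) -> None:
        # called once per drafted token from the patched ShortConv.__call__
        self.state = np.concatenate([self.state, x_pre_conv[None, :]])[1:]
        if self.armed:
            self.tape.append(self.state.copy())

    def rollback(self, n_accepted: int) -> None:
        # restore the state as of the last accepted drafted token
        self.state = self.tape[n_accepted]
        self.tape = []
        self.armed = False
```

Usage in this sketch would be: `arm()` before scoring a drafted block, one `record(...)` per token inside the conv layer, then `rollback(n_accepted)` once the verifier decides how many drafted tokens to keep.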

Verified on LiquidAI/LFM2.5-1.2B-Instruct + nathanrchn/LFM2.5-1.2B-Instruct-DFlash on Apple M4 Max with the README protocol (AIME prompt, 3 repeats, 60s cooldown):

| Tokens | Baseline     | DFlash       | Speedup | Acceptance |
|--------|--------------|--------------|---------|------------|
| 1024   | 141.94 tok/s | 338.98 tok/s | 2.39x   | 86.82%     |
| 2048   | 141.03 tok/s | 209.59 tok/s | 1.49x   | 78.76%     |

The bundled draft was trained on 2k-token sequences; speedup degrades past that horizon as draft acceptance falls.
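
As rough intuition for why the acceptance column drives the speedup column, the classic speculative-decoding estimate applies. This is back-of-the-envelope only, not a measurement from this PR: it assumes an i.i.d. per-token acceptance probability, and the draft length `k` below is a made-up placeholder rather than the PR's actual setting.

```python
# Back-of-the-envelope only: classic speculative-decoding estimate,
# assuming i.i.d. per-token acceptance `alpha` and a hypothetical
# draft length `k` (not the PR's actual configuration).
def expected_tokens_per_verify(alpha: float, k: int) -> float:
    # expected number of tokens emitted per target forward pass
    return (1 - alpha ** (k + 1)) / (1 - alpha)

print(expected_tokens_per_verify(0.87, 4))  # ~3.9, if 86.82% were a per-token rate
print(expected_tokens_per_verify(0.79, 4))  # ~3.3, if 78.76% were a per-token rate
```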

nathanrchn and others added 2 commits May 7, 2026 11:55
- Lfm2TargetOps in engine/target_lfm2.py mirrors target_qwen_gdn structure
- ShortConvRollbackCache for conv-state rollback during speculation
- patched ShortConv.__call__ records pre-conv tape when armed
- forward path uses inner.fa_idx/conv_idx dual masks and embedding_norm
- DFlashDraftModelArgs.from_dict reads rope_parameters (LFM2 config layout)
- Lfm2TargetOps registered in TARGET_BACKENDS

Verified on LiquidAI/LFM2.5-1.2B-Instruct + nathanrchn/LFM2.5-1.2B-Instruct-DFlash
on Apple M4 Max with the README protocol (AIME prompt, 3 repeats, 60s cooldown):

  | Tokens | Baseline       | DFlash         | Speedup | Acceptance |
  |--------|----------------|----------------|---------|------------|
  | 1024   | 141.94 tok/s   | 338.98 tok/s   | 2.39x   | 86.82%     |
  | 2048   | 141.03 tok/s   | 209.59 tok/s   | 1.49x   | 78.76%     |

The bundled draft was trained on 2k sequences; speedup degrades past that
horizon as draft acceptance falls.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adds two rows within the draft model's 2k training horizon. A footnote calls out
the hardware difference (M4 Max vs. M5 Max in the rest of the table) and the
omitted 4k/8k rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
