
Add LFM2 (hybrid short-conv) target support #28

Open

nathanrchn wants to merge 2 commits into bstnxbt:main from nathanrchn:lfm

Conversation

@nathanrchn

- Lfm2TargetOps in engine/target_lfm2.py mirrors the target_qwen_gdn structure
- ShortConvRollbackCache for conv-state rollback during speculation (see the sketch after this list)
- patched ShortConv.__call__ records a pre-conv tape when armed
- forward path uses inner.fa_idx/conv_idx dual masks and embedding_norm
- DFlashDraftModelArgs.from_dict reads rope_parameters (the LFM2 config layout)
- Lfm2TargetOps registered in TARGET_BACKENDS
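
As a minimal sketch of the rollback idea behind ShortConvRollbackCache and the armed-tape recording: while a speculative block is being verified, each patched short-conv layer records its sliding pre-conv state per drafted token, so the cache can be rewound to the last accepted position when the tail of the draft is rejected. This is illustrative only, written with NumPy rather than MLX, and the class and method names below (`arm`, `record`, `rollback`) are placeholders, not the PR's actual API.

```python
# Illustrative sketch only (NumPy, hypothetical method names); not the PR's code.
import numpy as np

class ShortConvRollbackCacheSketch:
    """Sliding short-conv state plus a tape of per-token snapshots
    recorded while 'armed', so a rejected draft tail can be rewound."""

    def __init__(self, conv_width: int, hidden: int):
        # the conv layer only ever needs the last (conv_width - 1) inputs
        self.state = np.zeros((conv_width - 1, hidden))
        self.tape: list[np.ndarray] = []
        self.armed = False

    def arm(self) -> None:
        # snapshot the state as of the last committed token
        self.tape = [self.state.copy()]
        self.armed = True

    def record(self, x_pre_conv: np.ndarray) -> None:
        # called once per drafted token from the patched ShortConv.__call__
        self.state = np.concatenate([self.state, x_pre_conv[None, :]])[1:]
        if self.armed:
            self.tape.append(self.state.copy())

    def rollback(self, n_accepted: int) -> None:
        # restore the state as of the last accepted drafted token
        self.state = self.tape[n_accepted]
        self.tape = []
        self.armed = False
```

Usage in this sketch would be: `arm()` before scoring a drafted block, one `record(...)` per token inside the conv layer, then `rollback(n_accepted)` once the verifier decides how many drafted tokens to keep.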

Verified on LiquidAI/LFM2.5-1.2B-Instruct + nathanrchn/LFM2.5-1.2B-Instruct-DFlash on Apple M4 Max with the README protocol (AIME prompt, 3 repeats, 60s cooldown):

| Tokens | Baseline     | DFlash       | Speedup | Acceptance |
|--------|--------------|--------------|---------|------------|
| 1024   | 141.94 tok/s | 338.98 tok/s | 2.39x   | 86.82%     |
| 2048   | 141.03 tok/s | 209.59 tok/s | 1.49x   | 78.76%     |

The bundled draft was trained on 2k-token sequences; speedup degrades past that horizon as draft acceptance falls.
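
As rough intuition for why the acceptance column drives the speedup column, the classic speculative-decoding estimate applies. This is back-of-the-envelope only, not a measurement from this PR: it assumes an i.i.d. per-token acceptance probability, and the draft length `k` below is a made-up placeholder rather than the PR's actual setting.

```python
# Back-of-the-envelope only: classic speculative-decoding estimate,
# assuming i.i.d. per-token acceptance `alpha` and a hypothetical
# draft length `k` (not the PR's actual configuration).
def expected_tokens_per_verify(alpha: float, k: int) -> float:
    # expected number of tokens emitted per target forward pass
    return (1 - alpha ** (k + 1)) / (1 - alpha)

print(expected_tokens_per_verify(0.87, 4))  # ~3.9, if 86.82% were a per-token rate
print(expected_tokens_per_verify(0.79, 4))  # ~3.3, if 78.76% were a per-token rate
```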

nathanrchn and others added 2 commits May 7, 2026 11:55
- Lfm2TargetOps in engine/target_lfm2.py mirrors target_qwen_gdn structure
- ShortConvRollbackCache for conv-state rollback during speculation
- patched ShortConv.__call__ records pre-conv tape when armed
- forward path uses inner.fa_idx/conv_idx dual masks and embedding_norm
- DFlashDraftModelArgs.from_dict reads rope_parameters (LFM2 config layout)
- Lfm2TargetOps registered in TARGET_BACKENDS

Verified on LiquidAI/LFM2.5-1.2B-Instruct + nathanrchn/LFM2.5-1.2B-Instruct-DFlash
on Apple M4 Max with the README protocol (AIME prompt, 3 repeats, 60s cooldown):

  | Tokens | Baseline       | DFlash         | Speedup | Acceptance |
  |--------|----------------|----------------|---------|------------|
  | 1024   | 141.94 tok/s   | 338.98 tok/s   | 2.39x   | 86.82%     |
  | 2048   | 141.03 tok/s   | 209.59 tok/s   | 1.49x   | 78.76%     |

The bundled draft was trained on 2k sequences; speedup degrades past that
horizon as draft acceptance falls.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Adds two rows within the draft model's 2k training horizon. A footnote calls out
the hardware difference (M4 Max vs. M5 Max in the rest of the table) and the
omitted 4k/8k rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
