[MLX] Low speed on Qwen3.5-27B-4bit: 15.5 tok/s with 80.9% acceptance (M4 Pro 48GB) #30

@heykb

dflash --model mlx-community/Qwen3.5-27B-4bit --draft z-lab/Qwen3.5-27B-DFlash --prompt "Write a Python quick sort" --max-tokens 1024

Output: 815 tokens | 15.5 tok/s | 80.9% acceptance

Hardware: M4 Pro 48GB

Acceptance is high (80.9%), but throughput is only 15.5 tok/s. That combination suggests the per-step verification latency is the bottleneck.
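A rough way to sanity-check this is to back out the per-step latency implied by the measured throughput. The sketch below assumes throughput is simply tokens emitted per draft+verify step divided by the wall time of that step. The tokens-per-step values are hypothetical placeholders, since the run above does not report the draft block size or the mean accepted length, and `implied_step_latency_ms` is an illustrative helper, not part of dflash.

```python
def implied_step_latency_ms(tokens_per_second: float, tokens_per_step: float) -> float:
    """Wall time of one draft+verify step implied by a measured throughput.

    Assumes throughput = tokens_per_step / step_time, which ignores prefill
    and any other fixed overhead.
    """
    return 1000.0 * tokens_per_step / tokens_per_second


measured_tps = 15.5      # tok/s, from the run above
generated_tokens = 815   # tokens, from the run above

# Total decode wall time implied by the reported numbers.
total_seconds = generated_tokens / measured_tps
print(f"implied decode time: {total_seconds:.1f} s")

# Hypothetical tokens emitted per step; these were NOT measured in this run.
for tokens_per_step in (2.0, 4.0, 8.0):
    ms = implied_step_latency_ms(measured_tps, tokens_per_step)
    print(f"{tokens_per_step:.0f} tokens/step -> {ms:.0f} ms per step")
```

Comparing these figures against the time of a single plain decode step of the target model on the same machine would show whether verification is the slow part or whether fewer tokens are being emitted per step than the acceptance rate suggests.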
