[MLX] Low speed on Qwen3.5-27B-4bit: 15.5 tok/s with 80.9% acceptance (M4 Pro 48GB)

dflash --model mlx-community/Qwen3.5-27B-4bit --draft z-lab/Qwen3.5-27B-DFlash --prompt "Write a Python quick sort" --max-tokens 1024

Output: 815 tokens | 15.5 tok/s | 80.9% acceptance

Hardware: M4 Pro 48GB

Acceptance is high (80.9%), but throughput is only 15.5 tok/s. With this many drafted tokens being accepted, the per-step verification latency seems too high.
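To put a rough number on that, here is a back-of-envelope sketch of the step latency implied by the figures above. It rests on two assumptions that are not confirmed by the run: that "acceptance" means the fraction of drafted tokens the verifier keeps (plus one token from the verifier per step), and that the draft length is one of the hypothetical values tried below. dflash may define or configure either differently.

```python
# Back-of-envelope estimate of per-step latency from the reported numbers.
# ASSUMPTION: acceptance = fraction of drafted tokens kept per step, and each
# step also emits one token from the verifier. dflash may define it differently.

MEASURED_TOK_PER_S = 15.5  # reported throughput
ACCEPTANCE = 0.809         # reported acceptance
TOTAL_TOKENS = 815         # reported output length


def expected_tokens_per_step(acceptance: float, draft_len: int) -> float:
    """Tokens emitted per draft+verify step under the assumed acceptance model."""
    return acceptance * draft_len + 1.0


def implied_step_latency_ms(tok_per_s: float, tokens_per_step: float) -> float:
    """Wall-clock time per step needed to produce the measured throughput."""
    return 1000.0 * tokens_per_step / tok_per_s


print(f"total generation time: {TOTAL_TOKENS / MEASURED_TOK_PER_S:.1f} s")

# draft_len values are hypothetical; the actual draft block size is not in the report.
for draft_len in (4, 8, 16):
    per_step = expected_tokens_per_step(ACCEPTANCE, draft_len)
    latency = implied_step_latency_ms(MEASURED_TOK_PER_S, per_step)
    print(f"draft_len={draft_len:2d}: ~{per_step:.2f} tok/step, ~{latency:.0f} ms/step")
```

Comparing the implied ms/step against the per-token latency of the target model running without a draft (not measured here) would show whether verification is actually the bottleneck.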