
Fix DeepSeek-V4 Flash loading and chat #19

Open

DiscoStew6082 wants to merge 1 commit into Blaizzy:pc/add-deepseekv4flash-model from DiscoStew6082:deepseek-v4-flash-load-chat
Conversation

@DiscoStew6082

Summary

Fixes loading and chatting with mlx-community/deepseek-ai-DeepSeek-V4-Flash-8bit.

  • Remap DeepSeek-V4 Flash checkpoint keys for quantized top-level embeddings/head tensors, grouped attn.wo_a tensors, and already-stacked routed expert tensors.
  • Load TokenizersBackend tokenizers directly from tokenizer.json to avoid AutoTokenizer forcing DeepSeek-V4 config validation.
  • Use the existing DeepSeek V3.2 chat template as a DeepSeek-V4 fallback when the tokenizer has no chat template, translating enable_thinking to the template's thinking_mode.
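The remapping in the first bullet lives in mlx_lm/models/deepseek_v4.py; the sketch below only illustrates the general shape of such a remap. The prefixes and renames here are assumptions for illustration, not the PR's actual rules:

```python
def remap_checkpoint_keys(weights):
    """Illustrative checkpoint-key remap. The prefixes and renames
    below are hypothetical; the real rules are in
    mlx_lm/models/deepseek_v4.py."""
    remapped = {}
    for key, value in weights.items():
        # Quantized top-level embedding/head tensors: move them under
        # the prefix the MLX module tree expects (assumed prefix).
        if key.startswith(("embed_tokens.", "lm_head.")):
            key = "model." + key
        # Grouped attention output projections: rename attn.wo_a to the
        # module's attribute name (hypothetical target name).
        key = key.replace("attn.wo_a.", "self_attn.o_proj_a.")
        remapped[key] = value
    return remapped
```

Already-stacked routed expert tensors would pass through a remap like this unchanged, since stacking means no per-expert key splitting is needed at load time.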

Why

The local DeepSeek-V4 Flash MLX checkpoint failed strict loading with:

ValueError: Received 1423 parameters not in model

After fixing weight remapping, mlx_lm.chat still failed because this model's tokenizer config uses TokenizersBackend and does not include a chat_template.
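The raw tokenizer can be loaded with `tokenizers.Tokenizer.from_file("tokenizer.json")`, which skips AutoTokenizer's model-config validation entirely. The template fallback described above can then be sketched as below; the template text and the `thinking_mode` values are placeholders, not the actual DeepSeek V3.2 template:

```python
# Placeholder standing in for the DeepSeek V3.2 chat template body.
DEEPSEEK_V32_TEMPLATE = "{# DeepSeek V3.2 chat template goes here #}"

def resolve_chat_template(tokenizer_template, enable_thinking=False):
    """Fall back to the V3.2 template when the tokenizer ships no
    chat_template, translating enable_thinking into the template's
    thinking_mode. The mode strings here are illustrative assumptions."""
    if tokenizer_template is not None:
        # The tokenizer's own template wins; no extra kwargs needed.
        return tokenizer_template, {}
    kwargs = {"thinking_mode": "thinking" if enable_thinking else "chat"}
    return DEEPSEEK_V32_TEMPLATE, kwargs
```

The returned kwargs would be passed through to the template renderer alongside the messages.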

Verification

Tested locally with:

```shell
python -m mlx_lm generate \
  --model /Volumes/Envoy/models/mlx-community/deepseek-ai-DeepSeek-V4-Flash-8bit \
  --prompt "Hello" \
  --max-tokens 1 \
  --temp 0

printf 'Hello, can you talk like a pirate?\nq\n' | uv run --active mlx_lm.chat \
  --model /Volumes/Envoy/models/mlx-community/deepseek-ai-DeepSeek-V4-Flash-8bit \
  --max-tokens 1
```

Also ran:

```shell
python -m compileall mlx_lm/models/deepseek_v4.py mlx_lm/tokenizer_utils.py
git diff --check
```

@DiscoStew6082 force-pushed the deepseek-v4-flash-load-chat branch 2 times, most recently from 05ee1f8 to 3c67ab8 on April 27, 2026 at 22:35
@DiscoStew6082 force-pushed the deepseek-v4-flash-load-chat branch from 3c67ab8 to f57ac79 on April 27, 2026 at 22:47