convert: clean Gemma vision/audio chat template markers #16749

pockers21 · 2025-10-24T05:59:55Z

Summary

Normalize Gemma chat templates at conversion time: replace <start_of_image>/<end_of_image> (and audio
equivalents) with the MTMD placeholder .

Context

Discovered via CI: the Server matrix intermittently failed the vision chat test (tools/server/tests/
unit/test_vision_api.py::test_vision_chat_completion) with empty content due to image injection not
happening.
Root cause is template marker mismatch (model emits <start_of_image> while llama.cpp expects
for MTMD insertion), not a CI infra problem.

Changes

M convert_hf_to_gguf.py
- Gemma2/Gemma3 set_vocab(): read tokenizer.chat_template and clean:
  - <start_of_image> → , <end_of_image> → ""
  - <start_of_audio> → , <end_of_audio> → ""
- If changed, write back with gguf_writer.add_chat_template(cleaned)

Testing

Build:
- cd /root/llama.cpp
- cmake -B build -DCMAKE_BUILD_TYPE=Release -DLLAMA_CURL=ON
- cmake --build build -j --target llama-server
Install tests:
- python3 -m venv .venv_server_tests && source .venv_server_tests/bin/activate
- pip install -r tools/server/tests/requirements.txt
External server (example port 18081):
- export LLAMA_CACHE=/root/autodl-tmp/llama-cache
- ./build/bin/llama-server --host 127.0.0.1 --port 18081 --temp 0.8 --seed 42 --hf-repo ggml-org/
  tinygemma3-GGUF --hf-file tinygemma3-Q8_0.gguf --batch-size 32 --no-slots --alias tinygemma3 --ctx-
  size 1024 --parallel 2 --n-predict 4 --mmproj-url https://huggingface.co/ggml-org/tinygemma3-GGUF/
  resolve/main/mmproj-tinygemma3.gguf
- DEBUG_EXTERNAL=1 PORT=18081 LLAMA_CACHE=$LLAMA_CACHE pytest -q -x tools/server/tests/unit/
  test_vision_api.py::test_vision_chat_completion -k 'IMG_URL_0 or IMG_BASE64_URI_0'
Expected: passes for both parameters.

Impact

Only affects models whose chat_template uses the above vision/audio markers; no change for other
models.
Keeps server runtime clean and model-agnostic; does not alter public inference APIs.

ngxson · 2025-10-24T21:29:40Z

Discovered via CI: the Server matrix intermittently failed the vision chat test (tools/server/tests/
unit/test_vision_api.py::test_vision_chat_completion) with empty content due to image injection not
happening.

Root cause is template marker mismatch (model emits <start_of_image> while llama.cpp expects
for MTMD insertion), not a CI infra problem.

What? When does the test fail? I can't see it fail in our CI.

Then how to you explain the "intermittently failed" part in your comment above? If this is really the problem with chat template, it should always fail, not intermittently

Your PR looks like hallucinated AI-generated content. Please explicitly state if you use AI to generate parts of this PR.

liyang added 3 commits October 23, 2025 17:10

server(chat): inject image_url via MTMD marker

ea44482

ci: retrigger workflows

65d1ee8

gguf: clean Gemma vision/audio markers to <media>

c469e8a

github-actions bot added the python python script changes label Oct 24, 2025

pockers21 force-pushed the bugfix-server-vision-mtmd branch 7 times, most recently from 5e2fa90 to 86d2de5 Compare October 24, 2025 08:16

Merge branch 'master' into bugfix-server-vision-mtmd

5fb33e3

pockers21 force-pushed the bugfix-server-vision-mtmd branch from 86d2de5 to 5fb33e3 Compare October 24, 2025 10:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

convert: clean Gemma vision/audio chat template markers #16749

convert: clean Gemma vision/audio chat template markers #16749

pockers21 commented Oct 24, 2025 •

edited

Loading

Uh oh!

ngxson commented Oct 24, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

convert: clean Gemma vision/audio chat template markers #16749

Are you sure you want to change the base?

convert: clean Gemma vision/audio chat template markers #16749

Conversation

pockers21 commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ngxson commented Oct 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pockers21 commented Oct 24, 2025 •

edited

Loading

ngxson commented Oct 24, 2025 •

edited

Loading