[pull] master from ggml-org:master by pull[bot] · Pull Request #1203 · LongLeCE/llama.cpp

pull · 2026-05-20T20:42:03Z

See Commits and Changes for more details.

Created by pull[bot] (v2.0.0-alpha.4)



…refactor (#23345)

* mtmd : deepseek-ocr fixes, improvements and refactoring
  - image processing changes to achieve full parity with Pillow (reference impl)
  - SAM mask casting only when flash-attn is on
  - SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it)
  - llama-chat changes to fix server/WebUI issue (new media_markers_first())
  - adapted test-chat-template and added test cases for deepseek-ocr
  - changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model
  - ty.toml ignore unresolved-import for tools/mtmd/tests/**
* image-text reordering fix removed
* refactor bool add_padding + pad_rounding enum into a single pad_style enum
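The deepseek-ocr regression test now scores output with CER and chrF. To illustrate the first metric only, character error rate is commonly defined as the Levenshtein edit distance between the hypothesis and the reference, divided by the reference length. The sketch below assumes that standard definition and treats each byte as one character (no Unicode grapheme handling). `edit_distance` and `char_error_rate` are hypothetical names, not the test script's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance with unit costs for insertion, deletion and substitution,
// computed byte-by-byte using a single rolling row of the DP table.
std::size_t edit_distance(const std::string & hyp, const std::string & ref) {
    std::vector<std::size_t> row(ref.size() + 1);
    for (std::size_t j = 0; j <= ref.size(); ++j) {
        row[j] = j; // distance from empty hyp prefix to ref prefix of length j
    }
    for (std::size_t i = 1; i <= hyp.size(); ++i) {
        std::size_t prev_diag = row[0]; // D[i-1][0]
        row[0] = i;
        for (std::size_t j = 1; j <= ref.size(); ++j) {
            const std::size_t above = row[j]; // D[i-1][j], saved before overwrite
            const std::size_t cost = hyp[i - 1] == ref[j - 1] ? 0 : 1;
            row[j] = std::min({ above + 1,            // delete hyp[i-1]
                                row[j - 1] + 1,       // insert ref[j-1]
                                prev_diag + cost });  // substitute or match
            prev_diag = above;
        }
    }
    return row[ref.size()];
}

// CER = edits needed to turn hyp into ref, normalized by the reference length.
double char_error_rate(const std::string & hyp, const std::string & ref) {
    assert(!ref.empty() && "CER is undefined for an empty reference");
    return static_cast<double>(edit_distance(hyp, ref)) / static_cast<double>(ref.size());
}
```

Normalizing by the reference length means a CER of 0 is a perfect transcription, while values above 1 are possible when the hypothesis contains many spurious characters.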


* opencl: refactor initialization
* opencl: refactor GPU identification
* opencl: rename for consistency
* opencl: cache global mem size in dev_ctx
* opencl: adjust log level
* opencl: load argsort and flash_attn kernels in supports_op
  * argsort kernel must be built for supports_op for querying the max workgroups
  * flash_attn kernel has many variants, only load them when needed
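The last bullet defers building flash_attn kernel variants until one is actually requested. A generic build-on-first-use cache is sketched below under that reading. `Kernel`, `LazyKernelCache`, and the variant keys are hypothetical stand-ins: the real OpenCL backend compiles programs and kernels, whereas this sketch just calls a user-supplied builder.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a compiled kernel handle.
struct Kernel {
    std::string variant;
};

// Builds each variant at most once, on first request, and reuses it afterwards.
class LazyKernelCache {
public:
    using Builder = std::function<std::unique_ptr<Kernel>(const std::string &)>;

    explicit LazyKernelCache(Builder build) : build_(std::move(build)) {}

    Kernel & get(const std::string & variant) {
        auto it = cache_.find(variant);
        if (it == cache_.end()) {
            // First use of this variant: pay the build cost now instead of at init time.
            it = cache_.emplace(variant, build_(variant)).first;
            ++builds_;
        }
        assert(it->second && "builder returned a null kernel");
        return *it->second;
    }

    int builds() const { return builds_; }

private:
    Builder build_;
    std::map<std::string, std::unique_ptr<Kernel>> cache_;
    int builds_ = 0;
};
```

Variants that a given model never exercises are never built, which keeps startup cost proportional to the kernels actually used rather than to the full variant matrix.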

* Move to backend sampling for MTP draft path
  Run top_k(10) on the draft backend. D2H transfers happen only for the top 10 logits. Make backend sampling more robust and fall back to CPU on failure cases, such as with "-sm tensor" or when a backend doesn't support TOP_K.
* Allow sampler chains to be partially offloaded to backend
* Add --spec-draft-backend-sampling argument. Enabled by default.
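The draft path keeps only the top 10 logits on the device and falls back to the CPU when the backend cannot run TOP_K. The control flow can be sketched as below, assuming a hypothetical `BackendTopK` callback that returns `std::nullopt` when the backend declines. This is not llama.cpp's sampler API; the logits here are an ordinary host vector, and the CPU path uses `std::partial_sort` over index positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical backend hook: indices of the k largest logits, or std::nullopt
// if the backend cannot run a TOP_K op (e.g. unsupported on this device).
using BackendTopK = std::function<std::optional<std::vector<int>>(const std::vector<float> &, int)>;

// CPU path: order only the first k index positions by logit, descending.
std::vector<int> cpu_top_k(const std::vector<float> & logits, int k) {
    assert(k >= 0 && static_cast<std::size_t>(k) <= logits.size());
    std::vector<int> idx(logits.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return logits[a] > logits[b]; });
    idx.resize(k);
    return idx;
}

struct TopKResult {
    std::vector<int> indices;
    bool used_fallback;
};

// Try the backend first; use the CPU path if no backend hook exists or it declines.
TopKResult draft_top_k(const std::vector<float> & logits, int k, const BackendTopK & backend) {
    if (backend) {
        if (std::optional<std::vector<int>> res = backend(logits, k)) {
            // On a real device only these k entries would need a D2H copy.
            return { std::move(*res), false };
        }
    }
    return { cpu_top_k(logits, k), true };
}
```

Returning `used_fallback` alongside the indices makes it easy to log or count how often the fallback triggers, for example under a tensor split mode.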

allozaur and others added 6 commits May 20, 2026 16:55

feat: Add WAV MIME type variants and improve audio format detection (#23396)

6ce9671
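WAV audio is reported under several MIME names depending on the browser or server. The sketch below assumes the commonly seen `audio/wav`, `audio/x-wav`, `audio/wave`, and `audio/vnd.wave`; the PR's actual list is not shown here. It also adds a fallback that checks the RIFF/WAVE header bytes. The function names are hypothetical and are not taken from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Lower-case the MIME type and drop parameters such as "; codecs=1".
std::string normalize_mime(std::string mime) {
    mime = mime.substr(0, mime.find(';'));
    auto not_space = [](unsigned char c) { return !std::isspace(c); };
    mime.erase(mime.begin(), std::find_if(mime.begin(), mime.end(), not_space));
    mime.erase(std::find_if(mime.rbegin(), mime.rend(), not_space).base(), mime.end());
    std::transform(mime.begin(), mime.end(), mime.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return mime;
}

// Assumed set of MIME spellings used for WAV audio.
bool is_wav_mime(const std::string & mime) {
    static const std::vector<std::string> variants = {
        "audio/wav", "audio/x-wav", "audio/wave", "audio/vnd.wave"};
    const std::string m = normalize_mime(mime);
    return std::find(variants.begin(), variants.end(), m) != variants.end();
}

// RIFF container with form type WAVE: bytes 0-3 are "RIFF", bytes 8-11 are "WAVE".
bool has_wav_header(const std::vector<std::uint8_t> & bytes) {
    if (bytes.size() < 12) {
        return false;
    }
    return std::equal(bytes.begin(), bytes.begin() + 4, "RIFF") &&
           std::equal(bytes.begin() + 8, bytes.begin() + 12, "WAVE");
}

// Trust a recognized MIME type, otherwise sniff the header bytes.
bool is_wav(const std::string & mime, const std::vector<std::uint8_t> & bytes) {
    return is_wav_mime(mime) || has_wav_header(bytes);
}
```

Sniffing the header covers uploads labelled with a generic type such as `application/octet-stream`.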

vulkan: optimize operations in the IM2COL shader (#22685)

acd604f

* vulkan: optimize operations in the IM2COL shader * Add comments and improve the code formatting

common/speculative : fix nullptr crash in get_devices_str (#23386)

510b5c2

ggml_backend_dev_by_name always appends a nullptr sentinel to the devices vector. Skipping nullptr entries prevents assertion failure in ggml_backend_dev_name. Assisted-by: llama.cpp:local pi
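The fix amounts to a null guard in the loop that formats device names. The sketch below mirrors that shape with a hypothetical `Device` struct and `device_name` helper. These stand in for `ggml_backend_dev_t` and `ggml_backend_dev_name` so the example compiles without ggml. The comma separator is also an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend device handle.
struct Device {
    std::string name;
};

// Stand-in for ggml_backend_dev_name: like the real call, it asserts on null.
const char * device_name(const Device * dev) {
    assert(dev != nullptr);
    return dev->name.c_str();
}

// Join device names, skipping the trailing nullptr sentinel (and any other null entry).
std::string get_devices_str(const std::vector<const Device *> & devices) {
    std::string out;
    for (const Device * dev : devices) {
        if (dev == nullptr) {
            continue; // sentinel, not a real device
        }
        if (!out.empty()) {
            out += ",";
        }
        out += device_name(dev);
    }
    return out;
}
```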

pull Bot locked and limited conversation to collaborators May 20, 2026

pull Bot added the ⤵️ pull label May 20, 2026

pull Bot merged commit ad27757 into LongLeCE:master May 20, 2026

github-actions Bot added examples python ggml OpenCL Vulkan server/ui labels May 20, 2026


pull[bot] merged 6 commits into LongLeCE:master from ggml-org:master

pull Bot commented May 20, 2026


6 participants
