Skip to content

[pull] master from ggml-org:master#1203

Merged
pull[bot] merged 6 commits into
LongLeCE:masterfrom
ggml-org:master
May 20, 2026
Merged

[pull] master from ggml-org:master#1203
pull[bot] merged 6 commits into
LongLeCE:masterfrom
ggml-org:master

Conversation

@pull
Copy link
Copy Markdown

@pull pull Bot commented May 20, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

allozaur and others added 6 commits May 20, 2026 16:55
* vulkan: optimize operations in the IM2COL shader

* Add comments and improve the code formatting
…refactor (#23345)

* mtmd : deepseek-ocr fixes, improvements and refactoring

- image processing changes to achieve full parity with Pillow (reference impl)
- SAM mask casting only when flash-attn is on
- SAM refactor (build_sam() extracted so deepseek-ocr-2 can reuse it)
- llama-chat changes to fix server/WebUI issue (new media_markers_first())
- adapted test-chat-template and added test cases for deepseek-ocr
- changed regression test for deepseek-ocr to use CER+chrF scores for ground-truth comparison; removed embedding-model
- ty.toml ignore unresolved-import for tools/mtmd/tests/**

* image-text reordering fix removed

* refactor bool add_padding + pad_rounding enum into a single pad_style enum
ggml_backend_dev_by_name always appends a nullptr sentinel to the devices
vector. Skipping nullptr entries prevents assertion failure in
ggml_backend_dev_name.

Assisted-by: llama.cpp:local pi
* opencl: refactor initialization

* opencl: refactor GPU identification

* opencl: rename for consistency

* opencl: cache global mem size in dev_ctx

* opencl: adjust log level

* opencl: load argsort and flash_attn kernels in supports_op

* argsort kernel must be built for supports_op for querying the max
  workgroups
* flash_attn kernel has many variants, only load them when needed
* Move to backend sampling for MTP draft path

Run top_k(10) on the draft backend. D2H transfers happen only for the top 10 logits

Make backend sampling more robust and fallback to CPU on failure cases, such as with "-sm tensor" or when a backend doesn't support TOP_K.

* Allow sampler chains to be partially offloaded to backend

* Add --spec-draft-backend-sampling argument. Enabled by default.
@pull pull Bot locked and limited conversation to collaborators May 20, 2026
@pull pull Bot added the ⤵️ pull label May 20, 2026
@pull pull Bot merged commit ad27757 into LongLeCE:master May 20, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants