
[pull] master from ggml-org:master #1034

Merged
pull[bot] merged 15 commits into LongLeCE:master from ggml-org:master on Mar 31, 2026


pull[bot] commented Mar 31, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


pull[bot] locked and limited conversation to collaborators Mar 31, 2026
pull[bot] added the ⤵️ pull label Mar 31, 2026
…i-compat (#21090)

* server/webui: cleanup dual representation approach, simplify to openai-compat

* feat: Fix regression for Agentic Loop UI

* chore: update webui build output

* refactor: Post-review code improvements

* chore: update webui build output

* refactor: Cleanup

* chore: update webui build output

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
satishkc7 and others added 3 commits March 31, 2026 10:52
* fix: include API key in CORS proxy requests for MCP connections

When llama-server is started with --api-key-file and --webui-mcp-proxy,
the /cors-proxy endpoint requires authentication. The WebUI was not
including the Authorization header in proxy requests, causing MCP
connections to fail with 401.

Inject getAuthHeaders() into requestInit when useProxy is true so the
proxy request carries the Bearer token alongside the forwarded target
headers.

Fixes #21167

* fix: simplify headers assignment based on reviewer suggestion

Apply buildProxiedHeaders only when useProxy is true, pass headers
directly to the transport otherwise.
…gfault on failed model load (#21082)

* common: add bounds check in common_init_result::sampler to prevent segfault on failed model load

* Revert a308e58

* Add regression test

* Remove regression test for init-fail sampler check
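
The bounds check described in this commit can be illustrated with a minimal sketch. The real `common_init_result::sampler` signature is not shown in this conversation, so the `init_result` and `sampler` types and the `get_sampler` accessor below are hypothetical stand-ins. The sketch assumes a failed model load leaves the sampler list empty, which would make an unchecked index read out of bounds.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a sampler object; not the llama.cpp type.
struct sampler {
    int seq_id;
};

// Hypothetical stand-in for the init result that owns one sampler per sequence.
struct init_result {
    std::vector<std::unique_ptr<sampler>> samplers;

    // Return nullptr instead of indexing past the end of the vector,
    // which is the failure mode when no samplers were created.
    sampler * get_sampler(std::size_t seq_id) const {
        if (seq_id >= samplers.size()) {
            return nullptr;
        }
        return samplers[seq_id].get();
    }
};
```

Callers of such an accessor would then need to handle a null result instead of assuming a valid sampler.
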
The build info is now only for debug, so we avoid the duplicate
with `--version`.

The UTF-8 setup at the beginning is needed to avoid logging
garbage on Windows.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
- emdeddings → embeddings (gemma3.cpp, gemma3n-iswa.cpp,
gemma-embedding.cpp)
- imlpemented → implemented (llama-adapter.cpp)
- interere → interfere (llama-graph.cpp)
- overridde → overridden (chat.cpp)
- stastistics → statistics (ngram-map.h)
- layed → laid (llama-kv-cache.h)
- worster → worst (llama-context.cpp)
- sequantial → sequential (llama-batch.h)
github-actions[bot] added the model label Mar 31, 2026
aldehir and others added 3 commits March 31, 2026 13:52
* webui: no more gzip

* try changing a small line

* Revert "try changing a small line"

This reverts commit 0d7a353.

* fix lint

* fix test

* rebuild

* split into html/css/js

* lint

* chore: update webui build output

* chore: Update git hooks script

* server: update webui build output

* chore: Update pre-commit hook

* refactor: Cleanup

---------

Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>
* CANN: fix multi-thread set_tensor race conditions

When ollama calls ggml_backend_tensor_set from multiple threads (each
writing a different chunk of the same tensor), the CANN backend had
three concurrency issues:

1. Quantized tensors (Q4_0/Q8_0) require a full-tensor format transform
   before uploading to device. Per-chunk transforms produced corrupt data.

2. ND-to-NZ weight conversion requires complete tensor data on device.
   Per-chunk conversion operated on incomplete data.

3. The global g_nz_workspaces array had unprotected concurrent access.

Fix by introducing a TensorSetTracker that accumulates write progress
per tensor. For quantized tensors, raw data is staged in a host buffer
and the transform + upload is deferred until all chunks arrive. For NZ
weights, chunks are uploaded directly but conversion is deferred. The
tracker and its staging buffer are released immediately after
post-processing completes.

Add per-device mutex to g_nz_workspaces to prevent data races.
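
The deferred-processing idea behind the tracker can be sketched as follows. This is not the CANN backend's `TensorSetTracker`; the `chunk_tracker` class and its members are hypothetical. It assumes that chunks never overlap and that each chunk is written exactly once, so the sum of chunk sizes reaching the tensor size means the tensor is complete.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical tracker: stages chunks of one tensor in a host buffer and
// reports when every byte has arrived, so a full-tensor transform can run.
class chunk_tracker {
public:
    explicit chunk_tracker(std::size_t total_bytes)
        : staging_(total_bytes), written_(0) {}

    // Copy one chunk into the staging buffer. Returns true when the
    // accumulated size reaches the full tensor size; the caller would then
    // run the transform and upload once, on complete data.
    bool add_chunk(const void * data, std::size_t offset, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (offset > staging_.size() || size > staging_.size() - offset) {
            return false; // reject out-of-range writes
        }
        std::memcpy(staging_.data() + offset, data, size);
        written_ += size;
        return written_ == staging_.size();
    }

    const std::vector<std::uint8_t> & staged() const { return staging_; }

private:
    std::mutex                mutex_;
    std::vector<std::uint8_t> staging_;
    std::size_t               written_;
};
```

Holding the mutex during the copy keeps the sketch simple at the cost of serializing writers; a real implementation could copy outside the lock and protect only the progress counter.
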

* CANN: fix L2_NORM ignoring eps parameter

The L2_NORM implementation was not using the eps parameter from
op_params, causing incorrect results when eps is large (e.g. 10.0).
The CPU reference computes scale = 1/fmaxf(norm, eps), so add a
Clamp step to clamp the norm to at least eps before dividing.
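
The formula quoted above, scale = 1/fmaxf(norm, eps), can be written out as a small host-side sketch. The function `l2_norm_row` is a hypothetical helper for illustration only; it is neither the ggml CPU kernel nor the CANN operator sequence.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference: L2-normalize one row, clamping the norm to at
// least eps before dividing, as in scale = 1 / fmaxf(norm, eps).
std::vector<float> l2_norm_row(const std::vector<float> & x, float eps) {
    float sum_sq = 0.0f;
    for (float v : x) {
        sum_sq += v * v;
    }
    const float norm  = std::sqrt(sum_sq);
    const float scale = 1.0f / std::fmax(norm, eps); // the clamp step

    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale;
    }
    return y;
}
```

For the row {3, 4} the norm is 5. With a small eps the result is approximately {0.6, 0.8}, but with eps = 10 the norm is clamped to 10 and the result is approximately {0.3, 0.4}, which is the case an implementation that ignores eps gets wrong.
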

* ggml/cann: compare op_params for POOL_2D in ACL graph cache matching

When ACL graph mode is enabled, the graph LRU cache checks whether a
cached graph matches the current computation graph. Previously,
GGML_OP_POOL_2D was not included in the op_params comparison, so two
POOL_2D nodes with different pooling parameters (kernel size, stride,
padding) but identical tensor shapes and addresses could incorrectly
reuse a cached graph, leading to wrong results or aclnn errors.

Add GGML_OP_POOL_2D to the list of ops that require op_params matching
in ggml_graph_node_properties::has_matching_properties().

* cann: fix ACL graph cache matching by adding tensor type and unconditional op_params comparison

The ACL graph LRU cache was incorrectly reusing cached graphs for
operations with different tensor types or op_params, causing test
failures for CPY (f16 vs bf16), POOL_2D, L2_NORM, NORM_MUL_ADD,
RMS_NORM_MUL_ADD, and ADD_RMS_NORM.

Changes:
- Add node_type and src_type[] fields to ggml_graph_node_properties
  so the cache can distinguish tensors with different types but
  identical ne/nb (e.g. f16 and bf16 both have 2-byte elements)
- Compare op_params unconditionally for all ops instead of only for
  SCALE/UNARY/GLU/ROPE/POOL_2D
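
The effect of the two changes above on cache matching can be sketched with a simplified stand-in. The `node_props` struct and `tensor_type` enum below are hypothetical and much smaller than the real `ggml_graph_node_properties`; only the field names `node_type` and `src_type` and the method name `has_matching_properties` are taken from the commit message.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for ggml_type.
enum class tensor_type { f16, bf16, f32 };

// Hypothetical, simplified node properties used as a cache key.
struct node_props {
    int                         op;        // operation id
    tensor_type                 node_type; // type of the output tensor
    std::array<tensor_type, 2>  src_type;  // types of the source tensors
    std::array<std::int64_t, 4> ne;        // shape
    std::array<std::int32_t, 4> op_params; // e.g. kernel size, stride, padding

    // Types and op_params are compared for every op, so f16 vs bf16 or two
    // pooling configurations with identical shapes no longer match.
    bool has_matching_properties(const node_props & other) const {
        return op == other.op &&
               node_type == other.node_type &&
               src_type == other.src_type &&
               ne == other.ne &&
               op_params == other.op_params;
    }
};
```

In this sketch, two nodes that differ only in `node_type` (f16 against bf16) or only in one `op_params` entry compare as non-matching, so a cached graph would not be reused for them.
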
```
$ build/bin/llama-server -hf unsloth/Qwen3.5-0.8B-GGUF
common_download_file_single_online: HEAD failed, status: 404
no remote preset found, skipping
Downloading mmproj-BF16.gguf ——————————————————————————————————————— 100%
Downloading Qwen3.5-0.8B-Q4_K_M.gguf ——————————————————————————————— 100%
...
```

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
pull[bot] merged commit 6307ec0 into LongLeCE:master Mar 31, 2026
1 check passed