Open
Labels
Apple Metal, bug-unconfirmed
Description
Name and Version
This issue began with build b6791.
Builds b6790 and earlier are unaffected.
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
./llama-server --host 0.0.0.0 --port 5001 --mlock --ctx-size 65535 --gpu-layers 200 --model /Users/socg/models/GLM-4.5-Air-Q8_0/GLM-4.5-Air-Q8_0-00001-of-00003.gguf --jinja -lv 1 -fa on
Problem description & steps to reproduce
When loading a split model on a Mac, the load_tensors output reports a Metal_Mapped model buffer only for the last split GGUF file. So if there are 3 files, only file number 3 is shown. For example, the unsloth GLM 4.5 Air quant loads three files of roughly 47 GB, 47 GB, and 13 GB, but the Metal_Mapped model buffer output lists only the 13 GB file (see the log output below). In both builds, the ggml_metal_log_allocated_size lines still show all three buffers being allocated.
Build: b6790
ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 47477.52 MiB
load_tensors: Metal_Mapped model buffer size = 47482.35 MiB
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors: CPU_Mapped model buffer size = 629.00 MiB
Build: b6791
ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors: CPU_Mapped model buffer size = 629.00 MiB
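The difference between the two builds can be checked mechanically. The following is a minimal Python sketch, not part of llama.cpp, that counts the allocated Metal buffers and the reported Metal_Mapped buffers in a log. The regular expressions assume the exact line shapes quoted in this issue, and the names summarize and missing_reports are hypothetical helpers made up for illustration.

```python
import re

# Line shapes copied from the logs in this issue (assumed stable across builds).
ALLOC_RE = re.compile(
    r"ggml_metal_log_allocated_size: allocated buffer, size\s*=\s*([\d.]+) MiB"
)
MAPPED_RE = re.compile(
    r"load_tensors:\s+Metal_Mapped model buffer size\s*=\s*([\d.]+) MiB"
)

# Log excerpts from this issue, trimmed to the lines the patterns match.
ALLOC_LINES = """\
ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
"""
B6790_LOG = ALLOC_LINES + """\
load_tensors: Metal_Mapped model buffer size = 47477.52 MiB
load_tensors: Metal_Mapped model buffer size = 47482.35 MiB
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors: CPU_Mapped model buffer size = 629.00 MiB
"""
B6791_LOG = ALLOC_LINES + """\
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors: CPU_Mapped model buffer size = 629.00 MiB
"""


def summarize(log_text):
    """Count and total the allocated vs. reported Metal buffers in a log."""
    allocated = [float(x) for x in ALLOC_RE.findall(log_text)]
    mapped = [float(x) for x in MAPPED_RE.findall(log_text)]
    return {
        "allocated_buffers": len(allocated),
        "reported_buffers": len(mapped),
        "allocated_mib": round(sum(allocated), 2),
        "reported_mib": round(sum(mapped), 2),
    }


def missing_reports(log_text):
    """Number of allocated Metal buffers with no Metal_Mapped report line."""
    s = summarize(log_text)
    return s["allocated_buffers"] - s["reported_buffers"]


if __name__ == "__main__":
    for name, log in (("b6790", B6790_LOG), ("b6791", B6791_LOG)):
        print(name, summarize(log), "missing:", missing_reports(log))
```

On the excerpts above, b6790 has no missing report lines and b6791 has two, which matches the two 47 GB splits that no longer appear in the load_tensors output. The check only compares what is logged; it does not say whether the tensors themselves are loaded correctly.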
First Bad Commit
Commit 66b0dbc
PR 16581: #16581
Relevant log output
ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors: CPU_Mapped model buffer size = 629.00 MiB