Misc. bug: Metal_Mapped buffer size now incorrectly reporting total on split models (possible memory issues beyond reporting) #16762

@SomeOddCodeGuy

Description

Name and Version

This issue began with build b6791.

Any build b6790 or earlier is unaffected.

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

./llama-server --host 0.0.0.0 --port 5001 --mlock --ctx-size 65535 --gpu-layers 200 --model /Users/socg/models/GLM-4.5-Air-Q8_0/GLM-4.5-Air-Q8_0-00001-of-00003.gguf --jinja -lv 1 -fa on

Problem description & steps to reproduce

When loading a split model on the Mac, the Metal_Mapped model buffer size line reports only the very last split GGUF file. So if there are 3 files, only file number 3 is shown. In the case of the unsloth GLM-4.5-Air quant, the model loads from 3 files (47 GB, 47 GB, and 13 GB), but the Metal_Mapped model buffer output shows only the 13 GB file (see log output below).

Build: b6790

ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 47477.52 MiB
load_tensors: Metal_Mapped model buffer size = 47482.35 MiB
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors:   CPU_Mapped model buffer size =   629.00 MiB

Build: b6791

ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors:   CPU_Mapped model buffer size =   629.00 MiB

First Bad Commit

Commit 66b0dbc, introduced in PR #16581.

