Misc. bug: Metal_Mapped buffer size now incorrectly reporting total on split models (possible memory issues beyond reporting)

### Name and Version

This issue began with build **b6791**.

Builds b6790 and earlier are unaffected.

### Operating systems

Mac

### Which llama.cpp modules do you know to be affected?

llama-server

### Command line

```shell
./llama-server --host 0.0.0.0 --port 5001 --mlock --ctx-size 65535 --gpu-layers 200 --model /Users/socg/models/GLM-4.5-Air-Q8_0/GLM-4.5-Air-Q8_0-00001-of-00003.gguf --jinja -lv 1 -fa on
```

### Problem description & steps to reproduce

When a split model is loaded on the Mac, `load_tensors` prints only one `Metal_Mapped model buffer size` line, and that line covers only the last split GGUF file. With 3 files, only file number 3 is shown. The unsloth GLM-4.5-Air Q8_0 quant consists of 3 files of roughly 47 GB, 47 GB and 13 GB, but the `Metal_Mapped` output reports only the 13 GB file. The `ggml_metal_log_allocated_size` lines still list all three buffers in both builds *(see log output below)*.

### Build: b6790
```
ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 47477.52 MiB
load_tensors: Metal_Mapped model buffer size = 47482.35 MiB
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors:   CPU_Mapped model buffer size =   629.00 MiB
```


### Build: b6791
```
ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors:   CPU_Mapped model buffer size =   629.00 MiB
```
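To check whether a given build shows the problem, the `Metal_Mapped` sizes can be summed and compared with the Metal allocations logged just above them. The following is a minimal sketch, assuming a POSIX shell with `awk` and a saved copy of the server output. `check_metal_log` is a hypothetical helper, not part of llama.cpp.

```shell
# Hypothetical helper (not part of llama.cpp): sums the Metal allocation
# lines and the "Metal_Mapped model buffer size" lines in a llama-server log.
# Reads the files given as arguments, or stdin if none are given.
check_metal_log() {
  awk '
    # e.g. ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, ...
    /ggml_metal_log_allocated_size: allocated buffer, size =/ {
      for (i = 1; i <= NF; i++) if ($i == "=") { alloc += $(i + 1); break }
    }
    # e.g. load_tensors: Metal_Mapped model buffer size = 47477.52 MiB
    /Metal_Mapped model buffer size =/ {
      for (i = 1; i <= NF; i++) if ($i == "=") { mapped += $(i + 1); n++; break }
    }
    END {
      printf "allocated: %.2f MiB\n", alloc
      printf "Metal_Mapped reported: %.2f MiB in %d line(s)\n", mapped, n
    }
  ' "$@"
}
```

Usage: `check_metal_log server.log`. For the b6790 log above, the three `Metal_Mapped` lines add up to 108340.16 MiB, close to the 108340.20 MiB allocated. For b6791, only 13380.29 MiB in 1 line is reported against the same allocations.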

### First Bad Commit

Commit `66b0dbc`

PR 16581: https://github.com/ggml-org/llama.cpp/pull/16581

https://github.com/ggml-org/llama.cpp/commit/66b0dbcb2d462e7b70ba5a69ee8c3899ac2efb1c

### Relevant log output

```shell
ggml_metal_log_allocated_size: allocated buffer, size = 47477.53 MiB, (47477.91 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 47482.36 MiB, (94960.27 / 147456.00)
ggml_metal_log_allocated_size: allocated buffer, size = 13380.31 MiB, (108340.58 / 147456.00)
load_tensors: offloading 47 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 48/48 layers to GPU
load_tensors: Metal_Mapped model buffer size = 13380.29 MiB
load_tensors:   CPU_Mapped model buffer size =   629.00 MiB
```
