Summary
An attacker‐supplied GGUF model vocabulary can trigger a buffer overflow in llama.cpp’s vocabulary‐loading code. Specifically, the helper _try_copy in llama.cpp/src/vocab.cpp: llama_vocab::impl::token_to_piece() casts a very large size_t token length into an int32_t, causing the length check (if (length < (int32_t)size)) to be bypassed. As a result, memcpy is still called with that oversized size, letting a malicious model overwrite memory beyond the intended buffer. This can lead to arbitrary memory corruption and potential code execution.
Details
The vulnerability lies in the function:
llama.cpp/src/vocab.cpp
  llama_vocab::impl::token_to_piece(llama_token token,
                                    char * buf,
                                    int32_t length,
                                    int32_t lstrip,
                                    bool special) const 
Specifically, the inline helper _try_copy performs a signed comparison against a potentially oversized size_t without handling cases where size_t exceeds INT32_MAX. When that happens, the cast to int32_t wraps into a negative value, causing the length check to be bypassed and leading to an unchecked memcpy.
// File: llama.cpp/src/vocab.cpp (around line 2570)
auto _try_copy = [=](const char * token, size_t size) -> int32_t {
    // 1) Skip up to `lstrip` leading spaces in the token string.
    for (int32_t i = 0; i < lstrip && size && *token == ' '; ++i) {
        token++;
        size--;
    }
    // 2) Bound check (VULNERABLE):
    //    - `length` is the maximum number of bytes the caller promised `buf` can hold (signed int32_t).
    //    - `size` is the unsigned token length (size_t). If size > INT32_MAX, casting to int32_t overflows
    //      and produces a negative value.
    if (length < (int32_t) size) {
        // Intention: return a negative error code when the token is too large to fit.
        // But when size > INT32_MAX:
        //    (int32_t)size becomes a negative integer (e.g. size_t=2,147,483,648 → (int32_t)=−2,147,483,648).
        //    Then (length < negative) is always false, so this branch is skipped.
        return -(int32_t) size;
    }
    // 3) Unchecked memcpy (VULNERABLE):
    //    At this point, even if `size` is far larger than `length`, the code will reach this memcpy,
    //    because the prior check falsely evaluated to false when (int32_t)size wrapped negative.
    //    This copies `size` bytes into `buf`, overrunning the buffer whenever size > length.
    memcpy(buf, token, size);
    // 4) Return the number of bytes copied (signed).
    //    Note: this cast also overflows if size > INT32_MAX, but the overflow has already happened.
    return (int32_t) size;
}; 
Why This Check Fails for Extremely Large Tokens:
- Unsigned size vs. signed length:
  - size is size_t (e.g., 64-bit on most platforms).
  - length is int32_t (maximum positive value = 2,147,483,647).
- Cast overflow:
  - If token_text.size() > INT32_MAX, then (int32_t) size wraps into a negative value (two’s-complement). For example:
      size_t size = 2,147,483,648  // one more than INT32_MAX
      (int32_t)size → −2,147,483,648
  - The comparison if (length < (int32_t) size) becomes effectively if (small_positive < large_negative), which is always false.
- Unchecked memcpy:
  - Because the bound check is bypassed, the code executes memcpy(buf, token, size).
  - Even though buf only has room for length bytes, memcpy uses the full (very large) size, causing a buffer overflow to the tune of billions of bytes.
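
The bypass can be reproduced in isolation. The sketch below is not llama.cpp code: narrowed_size, vulnerable_check_rejects and demo_bypass are hypothetical helpers that mirror only the cast and comparison in _try_copy. It assumes a toolchain where an out-of-range conversion to int32_t wraps modulo 2^32, which is guaranteed since C++20 and is what mainstream compilers did before that.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: the narrowing cast that _try_copy applies to `size`.
int32_t narrowed_size(std::size_t size) {
    return (int32_t) size;
}

// Hypothetical helper: mirrors the vulnerable bound check and returns true
// when a token of `size` bytes would be rejected for a `length`-byte buffer.
bool vulnerable_check_rejects(int32_t length, std::size_t size) {
    return length < (int32_t) size;
}

// Usage sketch: returns normally when the behavior described above is observed.
void demo_bypass() {
    const std::size_t huge = static_cast<std::size_t>(INT32_MAX) + 1;
    assert(vulnerable_check_rejects(256, 1000));   // an ordinary oversize is caught
    assert(narrowed_size(huge) < 0);               // the cast wraps negative
    assert(!vulnerable_check_rejects(256, huge));  // so the check is bypassed
}
```

With length = 256, a 1000-byte token is rejected as intended, while a size one past INT32_MAX is not, so the real helper would go on to call memcpy with that size.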
 
Callers and Code Paths
Any “token → string” conversion can overflow if token_text.size() > INT32_MAX. Notable call sites include:
- Model loading (each GGUF token string passes through token_to_piece())
- Detokenization (llama_vocab::impl::detokenize(...))
- Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
- Sampling & infill (llama_sampler_infill_apply, etc.)
- Public API (llama_token_to_piece(...))
As soon as llama.cpp loads the oversized token, it will crash with a buffer overflow in _try_copy().

Impact
   
Vulnerability Type
- Buffer overflow in _try_copy().
Attack Vector
- A malicious GGUF model supplies a vocabulary token for which token_text.size() exceeds INT32_MAX.
- The oversized size_t bypasses the length check and triggers an unchecked memcpy.
Affected Component
- llama_vocab::impl::token_to_piece(), which is invoked by:
  - Grammar routines (llama_grammar_apply_impl, llama_grammar_accept_impl)
  - Sampling & infill (llama_sampler_infill_apply, etc.)
  - Public API (llama_token_to_piece())
Severity
Consequences
Who Is Impacted
Mitigation & Recommendations
- Change _try_copy so that length and size are compared in an unsigned context.
- Ensure that size values above INT32_MAX cannot bypass the bound check.
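
One way to express this recommendation is sketched below. It is an illustration under stated assumptions, not the upstream patch: try_copy_checked and demo_checked_copy are hypothetical names, the values the original lambda captures (buf, length, lstrip) are passed as parameters so the example is self-contained, and returning INT32_MIN for sizes that cannot be represented as int32_t is an assumed convention.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical hardened variant of _try_copy.
int32_t try_copy_checked(char * buf, int32_t length,
                         const char * token, std::size_t size, int32_t lstrip) {
    // Skip up to `lstrip` leading spaces, as the original helper does.
    for (int32_t i = 0; i < lstrip && size && *token == ' '; ++i) {
        token++;
        size--;
    }
    // Reject sizes that do not fit in int32_t before any narrowing cast.
    // Assumed convention: INT32_MIN signals "token too large to report".
    if (size > static_cast<std::size_t>(std::numeric_limits<int32_t>::max())) {
        return std::numeric_limits<int32_t>::min();
    }
    // Compare in an unsigned context; a negative `length` never has room.
    if (length < 0 || static_cast<std::size_t>(length) < size) {
        return -static_cast<int32_t>(size);
    }
    std::memcpy(buf, token, size);
    return static_cast<int32_t>(size);
}

// Usage sketch: the oversized case returns before the token is ever read.
void demo_checked_copy() {
    char buf[8];
    const std::size_t huge = static_cast<std::size_t>(INT32_MAX) + 1;
    assert(try_copy_checked(buf, 8, "hello", 5, 0) == 5);         // fits
    assert(try_copy_checked(buf, 4, "hello", 5, 0) == -5);        // too large
    assert(try_copy_checked(buf, 8, "x", huge, 0) == INT32_MIN);  // no wrap
}
```

Checking the range first means both later casts are only reached with values that int32_t can represent, so the return value stays meaningful as well as the bound check.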