Update dependency vllm to v0.11.0 [SECURITY] #45
Open · renovate wants to merge 1 commit into main from renovate/pypi-vllm-vulnerability
Conversation
renovate force-pushed the renovate/pypi-vllm-vulnerability branch from 4781d93 to a70fcd1, then from a70fcd1 to 27c0fb1, then from 27c0fb1 to bcb2a4f, and finally from bcb2a4f to fb53a3d.
Signed-off-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
This PR contains the following updates:

- vllm: `==v0.8.4` -> `==0.11.0`
- vllm: `==v0.6.6` -> `==0.11.0`
- vllm: `==v0.6.4` -> `==0.11.0`
- vllm: `==0.8.4` -> `==0.11.0`
- vllm: `==0.6.6` -> `==0.11.0`
- vllm: `==0.6.4` -> `==0.11.0`
phi4mm: Quadratic Time Complexity in Input Token Processing leads to denial of service
CVE-2025-46560 / GHSA-vc6m-hm49-g9qg
More information
Details
Summary
A critical performance vulnerability has been identified in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces placeholder tokens (e.g., <|audio_|>, <|image_|>) with repeated tokens based on precomputed lengths. Due to inefficient list concatenation operations, the algorithm exhibits quadratic time complexity (O(n²)), allowing malicious actors to trigger resource exhaustion via specially crafted inputs.
Details
Affected Component: input_processor_for_phi4mm function.
https://github.com/vllm-project/vllm/blob/8cac35ba435906fb7eb07e44fe1a8c26e8744f4e/vllm/model_executor/models/phi4mm.py#L1182-L1197
The code rebuilds the `input_ids` list using `input_ids = input_ids[:i] + tokens + input_ids[i+1:]`. Each concatenation copies the entire list, leading to O(n) operations per replacement. For k placeholders expanding to m tokens, total time becomes O(k·m·n), approximating O(n²) in worst-case scenarios.
PoC
Test data demonstrates quadratic time growth:
Doubling the input size increases runtime by ~4x (consistent with O(n²)).
Impact
Denial-of-Service (DoS): An attacker could submit inputs with many placeholders (e.g., 10,000 <|audio_1|> tokens), causing CPU/memory exhaustion.
Example: 10,000 placeholders → ~100 million operations.
Remediation Recommendations
Precompute all placeholder positions and expansion lengths upfront.
Replace dynamic list concatenation with a single preallocated array; a sketch of both patterns follows.
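As an illustration of the difference (a minimal sketch, not the actual vLLM patch; the placeholder token id and expansion length are made up, and the "linear" version appends into one buffer rather than truly preallocating, for brevity):

```python
# Minimal sketch contrasting the two approaches; token values are hypothetical.
PLACEHOLDER = -1  # stand-in for a placeholder token id such as <|audio_1|>

def expand_quadratic(input_ids: list[int], m: int) -> list[int]:
    # Splice pattern from the advisory: each concatenation copies the
    # whole list, so k placeholders cost O(k*m*n) ~ O(n^2) overall.
    i = 0
    while i < len(input_ids):
        if input_ids[i] == PLACEHOLDER:
            tokens = [0] * m  # repeated tokens of precomputed length m
            input_ids = input_ids[:i] + tokens + input_ids[i + 1:]
            i += m
        else:
            i += 1
    return input_ids

def expand_linear(input_ids: list[int], m: int) -> list[int]:
    # Recommended shape: one output buffer built in a single pass, O(n + k*m).
    out: list[int] = []
    for tok in input_ids:
        out.extend([0] * m if tok == PLACEHOLDER else [tok])
    return out
```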
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
vLLM Vulnerable to Remote Code Execution via Mooncake Integration
CVE-2025-32444 / GHSA-hj4w-hm2g-p6w5 / PYSEC-2025-42
More information
Details
Impacted Deployments
Note that vLLM instances that do NOT make use of the mooncake integration are NOT vulnerable.
Description
vLLM's integration with mooncake is vulnerable to remote code execution due to using `pickle`-based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach them and carry out an attack. This is similar to GHSA-x3m8-f7g5-qhm7; the problem is in
https://github.com/vllm-project/vllm/blob/32b14baf8a1f7195ca09484de3008063569b43c5/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py#L179
where `recv_pyobj()` contains an implicit `pickle.loads()`, which leads to potential RCE.
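To see why this class of bug is remotely exploitable, here is a generic sketch (not the mooncake wire protocol): unpickling runs whatever callable an object's `__reduce__` returns, so a service that calls `pickle.loads()` (directly or via `recv_pyobj()`) on attacker-supplied bytes executes attacker-chosen code.

```python
import os
import pickle

class Exploit:
    # During unpickling, pickle calls os.system("id") to "reconstruct" this object.
    def __reduce__(self):
        return (os.system, ("id",))

payload = pickle.dumps(Exploit())
# A service doing the equivalent of recv_pyobj() on this payload runs `id`:
pickle.loads(payload)
```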
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:C/C:H/I:H/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
vLLM Allows Remote Code Execution via PyNcclPipe Communication Service
CVE-2025-47277 / GHSA-hjq4-87xh-g4fv
More information
Details
Impacted Environments
This issue ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected.
Summary
vLLM supports the use of the `PyNcclPipe` class to establish a peer-to-peer communication domain for data transmission between distributed nodes. The GPU-side KV-cache transmission is implemented through the `PyNcclCommunicator` class, while CPU-side control message passing is handled via the `send_obj` and `recv_obj` methods.

A remote code execution vulnerability exists in the `PyNcclPipe` service: attackers can exploit it by sending malicious serialized data to gain server control privileges. The intention was that this interface should only be exposed to a private network, using the IP address specified by the `--kv-ip` CLI parameter. The vLLM documentation covers how this must be limited to a secured network: https://docs.vllm.ai/en/latest/deployment/security.html

Unfortunately, the default behavior of PyTorch is that the `TCPStore` interface listens on ALL interfaces, regardless of what IP address is provided; the given IP address is only used as a client-side address. vLLM was fixed to use a workaround that forces the `TCPStore` instance to bind its socket to the specified private interface. This issue was reported privately to PyTorch, and they determined that this behavior was intentional.
Details
The `PyNcclPipe` implementation contains a critical security flaw: it directly processes client-provided data using `pickle.loads`, creating an unsafe deserialization vulnerability that can lead to Remote Code Execution. In the advisory's proof of concept, a `PyNcclPipe` service is configured to listen on port 18888 when launched; a malicious client then sends a crafted serialized payload to that service, the call stack ends in `pickle.loads` on the attacker-controlled data, and the attacker gets a shell.
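A minimal sketch of the mitigation idea (illustrative only, not vLLM's actual workaround; the address and port are placeholders): bind the listening socket to an explicit private interface instead of accepting an all-interfaces default.

```python
import socket

PRIVATE_ADDR, PORT = "10.0.0.5", 18888  # placeholders; vLLM takes the address from --kv-ip

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Binding to "0.0.0.0" would listen on ALL interfaces -- the risky default
# this advisory describes for PyTorch's TCPStore.
srv.bind((PRIVATE_ADDR, PORT))  # reachable only via the private interface
srv.listen()
```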
Reporters
This issue was reported independently by three different parties.
Fix
vLLM was fixed to bind the `TCPStore` socket to the private interface as configured.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
CVE-2025-32444 / GHSA-hj4w-hm2g-p6w5 / PYSEC-2025-42
More information
Details
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ sockets. The vulnerable sockets were set to listen on all network interfaces, increasing the likelihood that an attacker is able to reach the vulnerable ZeroMQ sockets to carry out an attack. vLLM instances that do not make use of the mooncake integration are not vulnerable. This issue has been patched in version 0.8.5.
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
Data exposure via ZeroMQ on multi-node vLLM deployment
CVE-2025-30202 / GHSA-9f8f-2vmf-885j
More information
Details
Impact
In a multi-node vLLM deployment, vLLM uses ZeroMQ for some multi-node communication purposes. The primary vLLM host opens an `XPUB` ZeroMQ socket and binds it to ALL interfaces. While the socket is always opened for a multi-node deployment, it is only used when doing tensor parallelism across multiple hosts.

Any client with network access to this host can connect to this `XPUB` socket unless its port is blocked by a firewall. Once connected, these arbitrary clients will receive all of the same data broadcast to all of the secondary vLLM hosts. This data is internal vLLM state information that is not useful to an attacker.

By connecting to this socket many times and not reading the data published to them, an attacker can also cause a denial of service by slowing down or potentially blocking the publisher.
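To illustrate the exposure (a generic pyzmq sketch, not vLLM's code; the port is a placeholder, since the real one is chosen at random): any client that can reach the bound port can attach a `SUB` socket and receive every broadcast.

```python
import zmq

ctx = zmq.Context()

# What the primary host effectively does: an XPUB socket on ALL interfaces.
pub = ctx.socket(zmq.XPUB)
pub.bind("tcp://0.0.0.0:5555")  # 5555 is a placeholder port

# What any client with network access can do:
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5555")       # an attacker would use the host's address
sub.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to everything published
```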
Detailed Analysis
The `XPUB` socket in question is created here:
https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L236-L237
Data is published over this socket via `MessageQueue.enqueue()`, which is called by `MessageQueue.broadcast_object()`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L452-L453
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/device_communicators/shm_broadcast.py#L475-L478
The `MessageQueue.broadcast_object()` method is called by the `GroupCoordinator.broadcast_object()` method in `parallel_state.py`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L364-L366
The broadcast over ZeroMQ is only done if the `GroupCoordinator` was created with `use_message_queue_broadcaster` set to `True`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L216-L219
The only case where `GroupCoordinator` is created with `use_message_queue_broadcaster` is the coordinator for the tensor parallelism group:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L931-L936
To determine what data is broadcast to the tensor parallelism group, we must continue tracing. `GroupCoordinator.broadcast_object()` is called by `GroupCoordinator.broadcast_tensor_dict()`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/parallel_state.py#L489
which is called by `broadcast_tensor_dict()` in `communication_op.py`:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/distributed/communication_op.py#L29-L34
If we look at `_get_driver_input_and_broadcast()` in the V0 `worker_base.py`, we'll see how this tensor dict is formed:
https://github.com/vllm-project/vllm/blob/790b79750b596043036b9fcbee885827fdd2ef3d/vllm/worker/worker_base.py#L332-L352
but the data actually sent over ZeroMQ is the `metadata_list` portion that is split from this `tensor_dict`. The tensor parts are sent via `torch.distributed`, and only metadata about those tensors is sent via ZeroMQ:
https://github.com/vllm-project/vllm/blob/54a66e5fee4a1ea62f1e4c79a078b20668e408c6/vllm/distributed/parallel_state.py#L61-L83
Patches
Workarounds
Prior to the fix, your options include blocking the port used for the `XPUB` socket with a firewall (note that the port used is random).
References
Severity
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Remote Code Execution Vulnerability in vLLM Multi-Node Cluster Configuration
CVE-2025-30165 / GHSA-9pcc-gvx5-r5wm
More information
Details
Affected Environments
Note that this issue only affects the V0 engine, which has been off by default since v0.8.0. Further, the issue only applies to a deployment using tensor parallelism across multiple hosts, which we do not expect to be a common deployment pattern.
Since V0 has been off by default since v0.8.0 and the fix is fairly invasive, we have decided not to fix this issue. Instead, we recommend that users ensure their environment is on a secure network in case this pattern is in use.
The V1 engine is not affected by this issue.
Impact
In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary vLLM host.
https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L295-L301
When data is received on this `SUB` socket, it is deserialized with `pickle`. This is unsafe, as it can be abused to execute code on a remote machine.
https://github.com/vllm-project/vllm/blob/c21b99b91241409c2fdf9f3f8c542e8748b317be/vllm/distributed/device_communicators/shm_broadcast.py#L468-L470
Since the vulnerability exists in a client that connects to the primary vLLM host, this vulnerability serves as an escalation point. If the primary vLLM host is compromised, this vulnerability could be used to compromise the rest of the hosts in the vLLM deployment.
Attackers could also use other means to exploit the vulnerability without requiring access to the primary vLLM host. One example would be the use of ARP cache poisoning to redirect traffic to a malicious endpoint used to deliver a payload with arbitrary code to execute on the target machine.
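The standard remediation for this class of issue, sketched generically below (not the project's actual fix): deserialize control-plane messages with a data-only format such as JSON, which can never encode executable behavior, instead of `pickle`.

```python
import json
import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect("tcp://127.0.0.1:5555")       # placeholder endpoint
sub.setsockopt_string(zmq.SUBSCRIBE, "")

raw = sub.recv()       # raw bytes, instead of recv_pyobj() (which unpickles)
msg = json.loads(raw)  # JSON yields only data (dicts/lists/strings/numbers)
```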
Severity
CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
CVE-2025-46570 / GHSA-4qjh-9fv9-r85r / PYSEC-2025-53
More information
Details
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.
Severity
Unknown
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
vLLM DOS: Remotely kill vllm over http with invalid JSON schema
CVE-2025-48942 / GHSA-6qc9-v4r8-22xg / PYSEC-2025-54
More information
Details
Summary
Hitting the /v1/completions API with an invalid json_schema as a Guided Param will kill the vllm server.
Details
The following API call
(venv) [derekh@ip-172-31-15-108 ]$ curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'
will provoke an uncaught exception from xgrammar in
./lib64/python3.11/site-packages/xgrammar/compiler.py
Issue with more information: https://github.com/vllm-project/vllm/issues/17248
PoC
Make a call to vllm with an invalid json_schema, e.g.
{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}
curl -s http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "meta-llama/Llama-3.2-3B-Instruct","prompt": "Name two great reasons to visit Sligo ", "max_tokens": 10, "temperature": 0.5, "guided_json":"{\"properties\":{\"reason\":{\"type\": \"stsring\"}}}"}'
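One defensive pattern (a sketch of the general idea, not vLLM's actual fix) is to validate the user-supplied schema against the JSON Schema meta-schema before it reaches the grammar compiler, so a typo like "stsring" becomes a client error instead of a worker crash:

```python
import json
from jsonschema import Draft202012Validator
from jsonschema.exceptions import SchemaError

def validate_guided_json(raw: str) -> dict:
    schema = json.loads(raw)
    try:
        # Raises SchemaError for invalid schemas, e.g. "type": "stsring".
        Draft202012Validator.check_schema(schema)
    except SchemaError as exc:
        raise ValueError(f"invalid guided_json schema: {exc.message}") from exc
    return schema

try:
    validate_guided_json('{"properties":{"reason":{"type": "stsring"}}}')
except ValueError as err:
    print(err)  # rejected up front; no uncaught exception in the worker
```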
Impact
vllm crashes
example traceback
Fix
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
vLLM has a Regular Expression Denial of Service (ReDoS, Exponential Complexity) Vulnerability in pythonic_tool_parser.py
CVE-2025-48887 / GHSA-w6q7-j642-7c25 / PYSEC-2025-50
More information
Details
Summary
A Regular Expression Denial of Service (ReDoS) vulnerability exists in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` of the vLLM project. The root cause is the use of a highly complex and nested regular expression for tool call detection, which can be exploited by an attacker to cause severe performance degradation or make the service unavailable.
Details
The following regular expression is used to match tool/function call patterns:
This pattern contains multiple nested quantifiers (`*`, `+`), optional groups, and inner repetitions, which make it vulnerable to catastrophic backtracking.
Attack Example:
A maliciously crafted input can cause the regular expression engine to consume CPU time exponentially in the input length, effectively freezing or crashing the server (DoS).
Proof of Concept:
A Python script demonstrates that matching such a crafted string with the above regex results in exponential time complexity. Even moderate input lengths can bring the system to a halt.
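The vulnerable pattern itself is not reproduced above, but the failure mode is easy to demonstrate with any nested-quantifier regex (a generic example, not vLLM's actual pattern):

```python
import re
import time

pattern = re.compile(r"^(a+)+$")  # nested quantifiers: the classic ReDoS shape
for n in (18, 20, 22, 24):
    s = "a" * n + "b"             # almost-matching input forces backtracking
    start = time.perf_counter()
    pattern.match(s)
    # Runtime roughly doubles for each extra 'a' -- exponential blowup.
    print(n, f"{time.perf_counter() - start:.3f}s")
```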
Impact
Fix
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
CVE-2025-48942 / GHSA-6qc9-v4r8-22xg / PYSEC-2025-54
More information
Details
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, hitting the /v1/completions API with an invalid json_schema as a Guided Param kills the vllm server. This vulnerability is similar to GHSA-9hcf-v7m4-6m2j/CVE-2025-48943, but for a JSON schema instead of a regex. Version 0.9.0 fixes the issue.
Severity
Unknown
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
CVE-2025-48943 / GHSA-9hcf-v7m4-6m2j / PYSEC-2025-55
More information
Details
vLLM is an inference and serving engine for large language models (LLMs). Versions 0.8.0 up to but excluding 0.9.0 have a Denial of Service (ReDoS) vulnerability that causes the vLLM server to crash if an invalid regex is provided while using structured output. This vulnerability is similar to GHSA-6qc9-v4r8-22xg/CVE-2025-48942, but for a regex instead of a JSON schema. Version 0.9.0 fixes the issue.
Severity
Unknown
References
This data is provided by OSV and the PyPI Advisory Database (CC-BY 4.0).
vLLM vulnerable to Regular Expression Denial of Service
GHSA-j828-28rj-hfhp
More information
Details
Summary
A recent review identified several regular expressions in the vllm codebase that are susceptible to Regular Expression Denial of Service (ReDoS) attacks. These patterns, if fed with crafted or malicious input, may cause severe performance degradation due to catastrophic backtracking.
1. vllm/lora/utils.py Line 173
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/lora/utils.py#L173
Risk Description:
r"\((.*?)\)\$?$"
matches content inside parentheses. If input such as((((a|)+)+)+)
is passed in, it can cause catastrophic backtracking, leading to a ReDoS vulnerability..*?
(non-greedy match) inside group parentheses can be highly sensitive to input length and nesting complexity.Remediation Suggestions:
2. vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py Line 52
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/tool_parsers/phi4mini_tool_parser.py#L52
Risk Description:
`r'functools\[(.*?)\]'` uses `.*?` to match content inside brackets, together with `re.DOTALL`. If the input contains a large number of nested or crafted brackets, it can cause backtracking and ReDoS.
Remediation Suggestions:
Limit the length of `model_output`; use `re.finditer()` and enforce a length constraint on each match.
3. vllm/entrypoints/openai/serving_chat.py Line 351
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/vllm/entrypoints/openai/serving_chat.py#L351
Risk Description:
r'.*"parameters":\s*(.*)'
can trigger backtracking ifcurrent_text
is very long and contains repeated structures..*
matching any content is high risk.Remediation Suggestions:
current_text
length..*
to capture large blocks of text; prefer structured parsing when possible.4. benchmarks/benchmark_serving_structured_output.py Line 650
https://github.com/vllm-project/vllm/blob/2858830c39da0ae153bc1328dbba7680f5fbebe1/benchmarks/benchmark_serving_structured_output.py#L650
Risk Description:
`r'\{.*\}'` is used to extract JSON inside curly braces. If the `actual` string is very long with unbalanced braces, it can cause backtracking, leading to a ReDoS vulnerability.
Remediation Suggestions:
Limit the length of `actual`; match balanced `{` and `}` or use a robust JSON extraction tool. A bounded-matching sketch follows.
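As a general illustration of these suggestions (a sketch, not the project's actual remediation; the bounds and pattern are made up): cap the input length before matching, avoid open-ended `.*?`, and enforce a per-match length constraint.

```python
import re

MAX_INPUT = 10_000  # hypothetical cap on text handed to the regex
MAX_MATCH = 1_000   # hypothetical cap on any single captured span

# A bounded character class instead of an open-ended non-greedy .*?
CALL_RE = re.compile(rf"functools\[([^\]]{{0,{MAX_MATCH}}})\]")

def extract_calls(model_output: str) -> list[str]:
    text = model_output[:MAX_INPUT]  # 1. limit input length
    # 2. iterate matches, each already bounded by the pattern itself
    return [m.group(1) for m in CALL_RE.finditer(text)]
```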
Fix
Severity
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
vLLM has a Weakness in MultiModalHasher Image Hashing Implementation
CVE-2025-46722 / GHSA-c65p-x677-fgj6 / PYSEC-2025-43
More information
Details
Summary
In the file `vllm/multimodal/hasher.py`, the `MultiModalHasher` class has a security and data integrity issue in its image hashing method. Currently, it serializes `PIL.Image.Image` objects using only `obj.tobytes()`, which returns only the raw pixel data, without including metadata such as the image's shape (width, height, mode). As a result, two images of different sizes (e.g., 30x100 and 100x30) with the same pixel byte sequence could generate the same hash value. This may lead to hash collisions, incorrect cache hits, and even data leakage or security risks.
Details
In `vllm/multimodal/hasher.py`, in `MultiModalHasher.serialize_item`:
https://github.com/vllm-project/vllm/blob/9420a1fc30af1a632bbc2c66eb8668f3af41f026/vllm/multimodal/hasher.py#L34-L35
For `Image.Image` instances, only `obj.tobytes()` is used for hashing, and `obj.tobytes()` does not include the image's width, height, or mode metadata.
Recommendation
In the `serialize_item` method, serialization of `Image.Image` objects should include not only pixel data, but also all critical metadata, such as dimensions (`size`), color mode (`mode`), format, and especially the `info` dictionary. The `info` dictionary is particularly important for palette-based images (e.g., mode `'P'`), where the palette itself is stored in `info`. Ignoring `info` can result in hash collisions between visually distinct images with the same pixel bytes but different palettes or metadata. This can lead to incorrect cache hits or even data leakage.
Summary:
Serializing only the raw pixel data is insecure. Always include all image metadata (`size`, `mode`, `format`, `info`) in the hash calculation to prevent collisions, especially in cases like palette-based images.
Impact for other modalities
For other modalities: the video modality is transformed into a multi-dimensional numpy array encoding the video's length, width, duration, and so on, so the same problem exists there, since the numpy array is likewise serialized without its structure.
For audio, because the mono option is enabled by default in `librosa.load`, the loaded audio is automatically downmixed to a single channel and returned as a one-dimensional numpy array; its structure is therefore fixed, and audio is not affected by this issue.
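A minimal sketch of the recommended direction (an illustration, not vLLM's actual patch): mix the image's metadata, and the palette for mode `'P'` images, into the hash alongside the pixel bytes.

```python
import hashlib
from PIL import Image

def hash_image(img: Image.Image) -> str:
    h = hashlib.sha256()
    # Shape/mode/format metadata: 30x100 and 100x30 images with identical
    # pixel bytes now hash differently.
    h.update(repr((img.size, img.mode, img.format)).encode())
    # Palette-based images store the palette outside the raw pixel bytes.
    if img.mode == "P":
        h.update(bytes(img.getpalette() or []))
    h.update(img.tobytes())
    return h.hexdigest()
```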
Fixes
Severity
CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:L/I:N/A:L
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching
CVE-2025-46570 / GHSA-4qjh-9fv9-r85r / PYSEC-2025-53
More information
Details
This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.
Description
When a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.
For instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim's input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.
Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim's prompt at the chunk level.
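In principle the measurement looks like the following sketch (illustrative only; the endpoint, model name, and prompts are placeholders): time the gap to the first streamed token and compare candidate prefixes.

```python
import time
import requests

def ttft(prompt: str) -> float:
    """Time to first streamed chunk of a completion (placeholder endpoint)."""
    start = time.perf_counter()
    with requests.post(
        "http://localhost:8000/v1/completions",
        json={"model": "placeholder-model", "prompt": prompt,
              "max_tokens": 1, "stream": True},
        stream=True,
    ) as resp:
        next(resp.iter_content(chunk_size=None))  # first chunk ~ first token
    return time.perf_counter() - start

# A consistently lower TTFT for one candidate prefix suggests a prefix-cache hit.
```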
Environment
Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.
Leakage
We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.

Results
In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:
Fixes
Severity
CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N
References
This data is provided by OSV and the GitHub Advisory Database (CC-BY 4.0).
vLLM Tool Schema allows DoS via Malformed pattern and type Fields
CVE-2025-48944 / GHSA-vrq3-r879-7m65
More information
Details
Summary
The vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted.
Details
The "type" field is expected to be one of: "string", "number", "object", "boolean", "array", or "null". Supplying any other value will cause the worker to crash with the following error:
RuntimeError: [11:03:34] /project/cpp/json_schema_converter.cc:637: Unsupported type "something_or_nothing"
The "pattern" field undergoes Jinja2 rendering (I think) prior to being passed unsafely into the native regex compiler without validation or escaping. This allows malformed expressions to reach the underlying C++ regex engine, resulting in fatal errors.
For example, the following inputs will crash the worker:
Unclosed {, [, or (
Closed: {} and []
Here are some of the runtime errors on the crash, depending on what gets injected:
RuntimeError: [12:05:04] /project/cpp/regex_converter.cc:73: Regex parsing error at position 4: The parenthesis is not closed.
RuntimeError: [10:52:27] /project/cpp/regex_converter.cc:73: Regex parsing error at position 2: Invalid repetition count.
RuntimeError: [12:07:18] /project/cpp/regex_converter.cc:73: Regex parsing error at position 6: Two consecutive repetition modifiers are not allowed.
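One defensive pattern (a sketch of the idea, not vLLM's fix; the helper name is hypothetical) is to reject unexpected `type` values and pre-compile `pattern` fields before anything reaches the native grammar converter:

```python
import re

ALLOWED_TYPES = {"string", "number", "object", "boolean", "array", "null"}

def check_tool_schema(schema: dict) -> None:
    """Recursively reject bad 'type'/'pattern' values in a tool parameter schema."""
    t = schema.get("type")
    if t is not None and t not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {t!r}")
    pattern = schema.get("pattern")
    if pattern is not None:
        try:
            # Python's re catches malformed regexes early, though the downstream
            # C++ engine may still reject some patterns Python accepts.
            re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"invalid pattern: {exc}") from exc
    for sub in (schema.get("properties") or {}).values():
        check_tool_schema(sub)
```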
PoC
Here is the POST request using the type field to crash the worker. Note the type field is set to "something" rather than the expected types it is looking for:
POST /v1/chat/completions HTTP/1.1
Host:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:138.0) Gecko/20100101 Firefox/138.0
Accept: application/json
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer:
Content-Type: application/json
Content-Length: 579
Origin:
Sec-Fetch-Dest: empty
Sec-Fetch-Mode: cors
Sec-Fetch-Site: same-origin
Priority: u=0
Te: trailers
Connection: keep-alive
{
  "model": "mistral-nemo-instruct",
  "messages": [{ "role": "user", "content": "crash via type" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "crash01",
        "parameters": {
          "type": "object",
          "properties": {
            "a": {
              "type": "something"
            }
          }
        }
      }
    }
  ],
  "tool_choice": {
    "type": "function",
    "function": {
      "name": "crash01",
      "arguments": { "a": "test" }
    }
  },
  "stream": false,
  "max_tokens": 1
}
Here is the POST request using the pattern field to crash the worker. Note the pattern field is set to an RCE payload; it could have just been set to {{}}. I was not able to get RCE in my testing, but it does crash the worker.
POST /v1/chat/completions HTTP/1.1
Host:
User-Ag