7 changes: 7 additions & 0 deletions provider/src/inference.rs
@@ -331,6 +331,13 @@ for _name in (
 /// OpenAI-compatible features (chat templates, tool calling, structured
 /// output) without starting an HTTP server.
 fn load_vllm_mlx(&self, py: Python<'_>) -> Result<()> {
+    // Enforce both security layers in the same GIL scope that runs vllm_mlx.
+    // lock_python_path was already called in detect_engine(), but re-running it
+    // here is safe (idempotent) and ensures the blocker is installed in the
+    // same interpreter state that will execute the model load.
+    Self::lock_python_path(py)?;
+    Self::block_dangerous_modules(py)?;
P1: Delay dangerous-module blocking until model is loaded

Calling block_dangerous_modules() before _load_model() can break cold-start model loads when the model is not already cached: this function's documentation explicitly states that load() may download weights if needed, and the new blocker denies the socket and subprocess imports that Python download stacks rely on. In that scenario, load_vllm_mlx will fail before the engine initializes, so first-run providers cannot come up unless models are pre-downloaded.
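
A minimal sketch of the reordering this suggests, using only the names visible in the diff (that lock_python_path is safe to keep up front follows from the diff's own idempotency comment; everything between the load and the blocker staying unrestricted is the assumption to verify):

```rust
fn load_vllm_mlx(&self, py: Python<'_>) -> Result<()> {
    // Path locking is idempotent and, per the diff's comment, already ran
    // in detect_engine(), so it can stay up front.
    Self::lock_python_path(py)?;

    let model = serde_json::to_string(&self.model_id).context("invalid model path")?;
    let cache_key = serde_json::to_string(&self.cache_key).context("invalid cache key")?;
    // ... build and run the Python code that calls load(), which may hit
    // the network to download weights on a cold start ...

    // Only once the weights are local, cut off socket/subprocess for the
    // inference phase.
    Self::block_dangerous_modules(py)?;
    Ok(())
}
```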

     let model = serde_json::to_string(&self.model_id).context("invalid model path")?;
     let cache_key = serde_json::to_string(&self.cache_key).context("invalid cache key")?;
     let code = format!(
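
For context on the failure mode the review describes: an import blocker installed into the interpreter makes any later `import socket` raise, which is exactly what a cold-start weight download would trip over. A hypothetical illustration of such a blocker — the finder class, the denied-module set, and the pyo3 0.20-style Python::run call are all assumptions, not the repo's actual block_dangerous_modules:

```rust
use pyo3::prelude::*;

/// Hypothetical stand-in for block_dangerous_modules: install a
/// sys.meta_path finder that refuses socket/subprocess imports.
fn block_dangerous_modules(py: Python<'_>) -> PyResult<()> {
    py.run(
        r#"
import sys

class _DenyImport:
    _DENIED = {"socket", "subprocess"}

    def find_spec(self, name, path=None, target=None):
        if name.split(".")[0] in self._DENIED:
            raise ImportError(f"import of {name!r} is blocked")
        return None  # defer to the normal finders

# Install once per interpreter so repeated calls stay idempotent.
# Note: modules already present in sys.modules are unaffected.
if not any(type(f).__name__ == "_DenyImport" for f in sys.meta_path):
    sys.meta_path.insert(0, _DenyImport())
"#,
        None,
        None,
    )
}
```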