
Autopatch fails when ENABLE_KVCACHED is set inside a Python script (requires explicit from kvcached import autopatch) #320

@ztang2370

Description

Summary

When kvcached is used from a custom Python script, users must add from kvcached import autopatch before import vllm / import sglang for the patches to take effect. Setting os.environ["ENABLE_KVCACHED"]="1" inside the script does not work, even though the documentation suggests this env var is the toggle. This is confusing and forces a kvcached-specific source-level import in user code.

Related to issue #316.

Reproduction

import os
os.environ["ENABLE_KVCACHED"] = "1"
os.environ["KVCACHED_AUTOPATCH"] = "1"

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.usage.usage_lib import UsageContext
from vllm.v1.engine.async_llm import AsyncLLM

engine_args = AsyncEngineArgs(
    model="tencent/HunyuanOCR",
    trust_remote_code=True,
    gpu_memory_utilization=0.7,
    max_model_len=4096,
    enable_prefix_caching=False,
    max_num_batched_tokens=8192,
    mm_processor_cache_gb=0,
)
vllm_config = engine_args.create_engine_config(usage_context=UsageContext.OPENAI_API_SERVER)

async_llm = AsyncLLM.from_vllm_config(
    vllm_config=vllm_config,
    usage_context=UsageContext.OPENAI_API_SERVER,
    stat_loggers=None,
    enable_log_requests=engine_args.enable_log_requests,
    aggregate_engine_logging=engine_args.aggregate_engine_logging,
    disable_log_stats=engine_args.disable_log_stats,
)

Expected: vllm is patched by kvcached.
Actual: vllm runs unpatched. Patching only happens if either:

  1. ENABLE_KVCACHED=1 is exported in the shell before launching Python, or
  2. from kvcached import autopatch is added to the script before import vllm.

Root cause

The autopatch entry point is kvcached_autopatch.pth:

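# kvcached_autopatch.pth (wrapped here for readability: site.py executes each
# .pth line on its own, so the installed file keeps this on a single line)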
import os, importlib, importlib.util; (
    os.environ.setdefault("KVCACHED_AUTOPATCH", "1"),
    getattr(importlib.import_module("kvcached.autopatch"), "autopatch_all", lambda: None)()
) if os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
  and importlib.util.find_spec("kvcached.autopatch") is not None else None

At interpreter startup, before any user code runs, the site module processes the .pth files in site-packages and executes every line that begins with import. So:

Shell-exported ENABLE_KVCACHED=1 → .pth sees it → calls autopatch_all() → registers @when_imported("vllm") / @when_imported("sglang") hooks → patches apply when the user imports vllm/sglang. ✅

os.environ["ENABLE_KVCACHED"]="1" set inside the script → executes after the .pth already short-circuited → autopatch_all() was never called → no when_imported hooks were registered → import vllm triggers nothing. ❌

Setting KVCACHED_AUTOPATCH inside the script does not help either. It is read later by _env_enabled() in kvcached/integration/vllm/autopatch.py, but only from inside the hooks, and those hooks were never registered, so on its own it has no effect.
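The timing problem can be reproduced with the standard library alone. This minimal sketch relies on site.addsitedir(), which applies the same .pth processing that startup applies to site-packages. ENABLE_DEMO and DEMO_GATE_SEEN are hypothetical stand-ins for ENABLE_KVCACHED and the gate's decision; nothing here touches kvcached itself.

```python
import os
import site
import tempfile
from pathlib import Path

# Hypothetical names for illustration only: ENABLE_DEMO plays the role of
# ENABLE_KVCACHED, and DEMO_GATE_SEEN records what the .pth line saw.
PTH_LINE = (
    'import os; os.environ["DEMO_GATE_SEEN"] = '
    'os.getenv("ENABLE_DEMO", "unset")\n'
)

os.environ.pop("ENABLE_DEMO", None)

with tempfile.TemporaryDirectory() as sitedir:
    Path(sitedir, "demo_gate.pth").write_text(PTH_LINE)
    # addsitedir() executes each "import ..." line of every .pth file in the
    # directory, just as startup does for site-packages.
    site.addsitedir(sitedir)

# The script sets the toggle only after the .pth line has already run...
os.environ["ENABLE_DEMO"] = "1"

# ...so the decision made at .pth time still reflects the old environment.
print(os.environ["DEMO_GATE_SEEN"])  # prints "unset"
```

The same ordering applies to the real .pth gate: by the time the script assigns os.environ["ENABLE_KVCACHED"], the gate has already evaluated to "off".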

Proposed fix

Decouple hook registration (must happen at interpreter startup) from the enable check (should happen at vllm/sglang-import time, so env vars set inside the script are honored).
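A stdlib-only sketch of that split, assuming nothing about how kvcached's own when_imported is built (the issue does not show it): a meta-path finder installed up front, standing in for .pth time, runs registered callbacks right after a target module executes, and the callback reads the environment only then. The registry, _PostImportFinder, when_imported, and the fake_vllm module are all hypothetical.

```python
import importlib
import importlib.abc
import os
import sys
import tempfile
from pathlib import Path

# Hypothetical post-import hook registry; not kvcached's actual implementation.
_HOOKS: dict[str, list] = {}


class _PostImportFinder(importlib.abc.MetaPathFinder):
    """Wraps the real loader of hooked modules so callbacks run after exec."""

    def find_spec(self, fullname, path=None, target=None):
        if fullname not in _HOOKS:
            return None
        # Delegate to the remaining finders to locate the real module.
        for finder in sys.meta_path:
            if finder is self or not hasattr(finder, "find_spec"):
                continue
            spec = finder.find_spec(fullname, path, target)
            if spec is not None and spec.loader is not None:
                break
        else:
            return None
        real_exec = spec.loader.exec_module

        def exec_module(module):
            real_exec(module)
            for hook in _HOOKS[fullname]:
                hook(module)

        spec.loader.exec_module = exec_module
        return spec


def when_imported(name):
    """Register a callback; cheap, because the target is not imported here."""
    def decorator(fn):
        _HOOKS.setdefault(name, []).append(fn)
        return fn
    return decorator


# "Startup": install the finder and register the hook, with no env check yet.
sys.meta_path.insert(0, _PostImportFinder())


@when_imported("fake_vllm")
def _patch(module):
    # The enable check runs at import time, so env vars set in-script count.
    if os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1"):
        module.patched = True


# "User script": set the toggle after registration, then import.
os.environ["ENABLE_KVCACHED"] = "1"
tmp = tempfile.mkdtemp()
Path(tmp, "fake_vllm.py").write_text("patched = False\n")
sys.path.insert(0, tmp)
importlib.invalidate_caches()
import fake_vllm  # noqa: E402

print(fake_vllm.patched)  # prints True
```

The design point is that registration carries no policy: it only arranges for the callback to run, and the callback owns the enable decision.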

kvcached_autopatch.pth: always register hooks; drop the ENABLE_KVCACHED gate.

import importlib, importlib.util; importlib.import_module("kvcached.autopatch").autopatch_all() if importlib.util.find_spec("kvcached.autopatch") is not None else None

kvcached/integration/vllm/autopatch.py and kvcached/integration/sglang/autopatch.py: make _env_enabled() accept either env var, so ENABLE_KVCACHED works as documented.

import os


def _env_enabled() -> bool:
    return (
        os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
        or os.getenv("KVCACHED_AUTOPATCH", "false").lower() in ("true", "1")
    )

After this change, setting ENABLE_KVCACHED=1 (or KVCACHED_AUTOPATCH=1) inside the user's script, at any point before import vllm or import sglang, will work. No source-level from kvcached import autopatch is required.

Cost: on every system where kvcached is installed, each Python process would import kvcached.autopatch and register two when_imported hooks at startup. This is cheap (neither vllm nor sglang gets imported) but not free. An optional KVCACHED_DISABLE_AUTOPATCH=1 escape hatch in the .pth would preserve a fully-off mode.
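One way that escape hatch could look, assuming the KVCACHED_DISABLE_AUTOPATCH name floated above (a proposal, not an existing kvcached variable). Because site.py executes only .pth lines that begin with import, one line at a time, the check must live on the same single line. To run without kvcached installed, the sketch exec()s that line against a hypothetical stub of kvcached.autopatch that merely records calls.

```python
import importlib.machinery
import os
import sys
import types

# Candidate .pth content. It is split across string literals here only for
# readability; the joined string is one line that starts with "import ".
PTH_LINE = (
    "import os, importlib, importlib.util; "
    "importlib.import_module('kvcached.autopatch').autopatch_all() "
    "if os.getenv('KVCACHED_DISABLE_AUTOPATCH', 'false').lower() not in ('true', '1') "
    "and importlib.util.find_spec('kvcached.autopatch') is not None else None"
)
assert "\n" not in PTH_LINE and PTH_LINE.startswith("import ")

# Hypothetical stub standing in for kvcached.autopatch; it only records calls.
calls = []
stub = types.ModuleType("kvcached.autopatch")
stub.__spec__ = importlib.machinery.ModuleSpec("kvcached.autopatch", None)
stub.autopatch_all = lambda: calls.append("registered")
sys.modules["kvcached.autopatch"] = stub

os.environ.pop("KVCACHED_DISABLE_AUTOPATCH", None)
exec(PTH_LINE)  # default: autopatch_all() runs and hooks get registered

os.environ["KVCACHED_DISABLE_AUTOPATCH"] = "1"
exec(PTH_LINE)  # escape hatch: autopatch_all() is skipped entirely

print(calls)  # prints ['registered']
```

Placing the disable check before find_spec means a fully-off process does not even look up kvcached.autopatch.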

Workarounds (current behavior)

  1. Export ENABLE_KVCACHED=1 in the shell before launching Python, or
  2. Add from kvcached import autopatch before any import vllm / import sglang.
