Summary
When kvcached is used from a custom Python script, users must add from kvcached import autopatch before import vllm / import sglang for the patches to take effect. Setting os.environ["ENABLE_KVCACHED"]="1" inside the script does not work, even though the documentation suggests this env var is the toggle. This is confusing and forces a kvcached-specific source-level import in user code.
Related to issue #316.
Reproduction
import os
os.environ["ENABLE_KVCACHED"] = "1"
os.environ["KVCACHED_AUTOPATCH"] = "1"
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.usage.usage_lib import UsageContext
from vllm.v1.engine.async_llm import AsyncLLM
engine_args = AsyncEngineArgs(
model="tencent/HunyuanOCR",
trust_remote_code=True,
gpu_memory_utilization=0.7,
max_model_len=4096,
enable_prefix_caching=False,
max_num_batched_tokens=8192,
mm_processor_cache_gb=0,
)
vllm_config = engine_args.create_engine_config(usage_context=UsageContext.OPENAI_API_SERVER)
async_llm = AsyncLLM.from_vllm_config(
vllm_config=vllm_config,
usage_context=UsageContext.OPENAI_API_SERVER,
stat_loggers=None,
enable_log_requests=engine_args.enable_log_requests,
aggregate_engine_logging=engine_args.aggregate_engine_logging,
disable_log_stats=engine_args.disable_log_stats,
)
Expected: vllm is patched by kvcached.
Actual: vllm runs unpatched. Patching only happens if either:
- ENABLE_KVCACHED=1 is exported in the shell before launching Python, or
- from kvcached import autopatch is added to the script before import vllm.
Root cause
The autopatch entry point is kvcached_autopatch.pth:
# kvcached_autopatch.pth
import os, importlib, importlib.util; (
os.environ.setdefault("KVCACHED_AUTOPATCH", "1"),
getattr(importlib.import_module("kvcached.autopatch"), "autopatch_all", lambda: None)()
) if os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
and importlib.util.find_spec("kvcached.autopatch") is not None else None
Python processes .pth files at interpreter startup, before any user code runs. So:
Shell-exported ENABLE_KVCACHED=1 → .pth sees it → calls autopatch_all() → registers @when_imported("vllm") / @when_imported("sglang") hooks → patches apply when the user imports vllm/sglang. ✅
os.environ["ENABLE_KVCACHED"]="1" set inside the script → executes after the .pth already short-circuited → autopatch_all() was never called → no when_imported hooks were registered → import vllm triggers nothing. ❌
KVCACHED_AUTOPATCH set inside the script is read later by _env_enabled() in kvcached/integration/vllm/autopatch.py, but it is consulted only by hooks that were never registered — so it has no effect on its own.
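The ordering problem described above can be reproduced without kvcached or vllm installed. The following is a minimal sketch, not the kvcached implementation: `simulate_current_behavior` is a hypothetical helper that models the three phases (the .pth at interpreter startup, the user script, the vllm import) on a plain dict standing in for `os.environ`.

```python
TRUTHY = ("true", "1")


def simulate_current_behavior(env_at_startup, env_set_in_script):
    """Return True if vllm would end up patched under the current .pth gate.

    Hypothetical model for illustration only; the real logic lives in
    kvcached_autopatch.pth and the _env_enabled() helpers.
    """
    environ = dict(env_at_startup)

    # Phase 1: interpreter startup. The .pth registers hooks only if
    # ENABLE_KVCACHED is already set at this point.
    hooks_registered = False
    if environ.get("ENABLE_KVCACHED", "false").lower() in TRUTHY:
        environ.setdefault("KVCACHED_AUTOPATCH", "1")
        hooks_registered = True

    # Phase 2: the user script runs and may set env vars. This is too late
    # to influence the gate in phase 1.
    environ.update(env_set_in_script)

    # Phase 3: import vllm. Only hooks that were registered can consult
    # KVCACHED_AUTOPATCH and apply the patches.
    autopatch_on = environ.get("KVCACHED_AUTOPATCH", "false").lower() in TRUTHY
    return hooks_registered and autopatch_on
```

With the variable exported in the shell, `simulate_current_behavior({"ENABLE_KVCACHED": "1"}, {})` yields a patched run. Setting both variables only inside the script does not, because phase 1 has already finished by then.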
Proposed fix
Decouple hook registration (must happen at interpreter startup) from the enable check (should happen at vllm/sglang-import time, so env vars set inside the script are honored).
kvcached_autopatch.pth: always register hooks; drop the ENABLE_KVCACHED gate.
import importlib, importlib.util; importlib.import_module("kvcached.autopatch").autopatch_all() if importlib.util.find_spec("kvcached.autopatch") is not None else None
kvcached/integration/vllm/autopatch.py:_env_enabled and kvcached/integration/sglang/autopatch.py:_env_enabled accept either env var, so ENABLE_KVCACHED works as documented.
def _env_enabled() -> bool:
return (
os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
or os.getenv("KVCACHED_AUTOPATCH", "false").lower() in ("true", "1")
)
After this change, setting ENABLE_KVCACHED=1 (or KVCACHED_AUTOPATCH=1) inside the user's script — at any point before import vllm — will work. No source-level from kvcached import autopatch required.
Cost: registering two when_imported hooks at every Python startup on systems where kvcached is installed. Cheap (no vllm/sglang import is triggered) but non-zero. An optional KVCACHED_DISABLE_AUTOPATCH=1 escape hatch in the .pth would preserve a fully-off mode.
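For comparison, here is a sketch of how the proposed flow would behave, including the optional KVCACHED_DISABLE_AUTOPATCH escape hatch. It is a simplified model under stated assumptions: `simulate_proposed_behavior` is a hypothetical helper, a plain dict stands in for `os.environ`, and the escape hatch is assumed to be evaluated in the .pth at startup.

```python
TRUTHY = ("true", "1")


def env_enabled(environ):
    """Mirror of the proposed _env_enabled(): either variable enables patching."""
    return (
        environ.get("ENABLE_KVCACHED", "false").lower() in TRUTHY
        or environ.get("KVCACHED_AUTOPATCH", "false").lower() in TRUTHY
    )


def simulate_proposed_behavior(env_at_startup, env_set_in_script):
    """Return True if vllm would end up patched under the proposed design."""
    environ = dict(env_at_startup)

    # Phase 1: interpreter startup. Hooks are always registered unless the
    # (assumed) opt-out variable is set before Python starts.
    hooks_registered = (
        environ.get("KVCACHED_DISABLE_AUTOPATCH", "false").lower() not in TRUTHY
    )

    # Phase 2: the user script runs and may set env vars.
    environ.update(env_set_in_script)

    # Phase 3: import vllm. The enable check happens here, so values set in
    # phase 2 are honored.
    return hooks_registered and env_enabled(environ)
```

In this model, `simulate_proposed_behavior({}, {"ENABLE_KVCACHED": "1"})` reports a patched run, and a process started with KVCACHED_DISABLE_AUTOPATCH=1 stays unpatched regardless of what the script sets later.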
Workarounds (current behavior)
- Export ENABLE_KVCACHED=1 in the shell before launching Python, or
- Add from kvcached import autopatch before any import vllm / import sglang.
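When the shell environment cannot be changed (for example, when the script is started by another Python program), the first workaround can be approximated by launching the script in a child interpreter with the variable already set. This is a sketch using only the standard library; `run_with_kvcached_env` is a hypothetical helper name and not part of kvcached.

```python
import os
import subprocess
import sys


def run_with_kvcached_env(argv):
    """Run `python <argv...>` in a child process with ENABLE_KVCACHED=1.

    The variable is present in the child's environment before its
    interpreter starts, so the .pth gate sees it at startup.
    """
    child_env = {**os.environ, "ENABLE_KVCACHED": "1"}
    return subprocess.run(
        [sys.executable, *argv],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
```

For example, `run_with_kvcached_env(["my_script.py"])` would run a (hypothetical) `my_script.py` with the toggle visible from interpreter startup. The parent process itself remains unpatched.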