Autopatch fails when ENABLE_KVCACHED is set inside a Python script (requires explicit from kvcached import autopatch)

## Summary

When kvcached is used from a custom Python script, users must add `from kvcached import autopatch` before `import vllm` / `import sglang` for the patches to take effect. Setting `os.environ["ENABLE_KVCACHED"]="1"` inside the script does not work, even though the documentation suggests this env var is the toggle. This is confusing and forces a kvcached-specific source-level import in user code.

Related to issue https://github.com/ovg-project/kvcached/issues/316.

## Reproduction

```python
import os
os.environ["ENABLE_KVCACHED"] = "1"
os.environ["KVCACHED_AUTOPATCH"] = "1"

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.usage.usage_lib import UsageContext
from vllm.v1.engine.async_llm import AsyncLLM

engine_args = AsyncEngineArgs(
    model="tencent/HunyuanOCR",
    trust_remote_code=True,
    gpu_memory_utilization=0.7,
    max_model_len=4096,
    enable_prefix_caching=False,
    max_num_batched_tokens=8192,
    mm_processor_cache_gb=0,
)
vllm_config = engine_args.create_engine_config(usage_context=UsageContext.OPENAI_API_SERVER)

async_llm = AsyncLLM.from_vllm_config(
    vllm_config=vllm_config,
    usage_context=UsageContext.OPENAI_API_SERVER,
    stat_loggers=None,
    enable_log_requests=engine_args.enable_log_requests,
    aggregate_engine_logging=engine_args.aggregate_engine_logging,
    disable_log_stats=engine_args.disable_log_stats,
)
```

Expected: vllm is patched by kvcached.
Actual: vllm runs unpatched. Patching only happens if either:
1. `ENABLE_KVCACHED=1` is exported in the shell before launching Python, or 
2. `from kvcached import autopatch` is added to the script before `import vllm`.

## Root cause

The autopatch entry point is `kvcached_autopatch.pth`:

```
# kvcached_autopatch.pth
import os, importlib, importlib.util; (
    os.environ.setdefault("KVCACHED_AUTOPATCH", "1"),
    getattr(importlib.import_module("kvcached.autopatch"), "autopatch_all", lambda: None)()
) if os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
  and importlib.util.find_spec("kvcached.autopatch") is not None else None
```

Python's `site` module processes `.pth` files at interpreter startup, before any user code runs. So:

Shell-exported `ENABLE_KVCACHED=1` → .pth sees it → calls `autopatch_all()` → registers `@when_imported("vllm")` / `@when_imported("sglang")` hooks → patches apply when the user imports vllm/sglang. ✅

`os.environ["ENABLE_KVCACHED"]="1"` set inside the script → executes after the .pth has already short-circuited → `autopatch_all()` was never called → no `when_imported` hooks were registered → `import vllm` triggers nothing. ❌

`KVCACHED_AUTOPATCH` set inside the script is read later by `_env_enabled()` in `kvcached/integration/vllm/autopatch.py`, but it is consulted only by hooks that were never registered, so it has no effect on its own.
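The timing problem can be reproduced without kvcached or vllm. The sketch below is only an illustration: it writes a gated one-line `.pth` file into a temporary directory and asks the standard-library `site.addsitedir` to process it, which stands in for interpreter startup. The names `DEMO_ENABLE` and `DEMO_HOOKS_REGISTERED` are hypothetical and exist only for this demo.

```python
import os
import site
import tempfile
from pathlib import Path

# Hypothetical names used only for this demo; they are not kvcached settings.
GATE = "DEMO_ENABLE"
MARKER = "DEMO_HOOKS_REGISTERED"

# A single-line .pth entry: `site` executes lines that start with "import".
PTH_LINE = (
    f'import os; os.environ.setdefault("{MARKER}", "1") '
    f'if os.getenv("{GATE}", "false").lower() in ("true", "1") else None\n'
)


def process_gated_pth() -> bool:
    """Let `site` process the gated .pth once; report whether the gate passed."""
    os.environ.pop(MARKER, None)
    with tempfile.TemporaryDirectory() as site_dir:
        Path(site_dir, "demo_gate.pth").write_text(PTH_LINE)
        site.addsitedir(site_dir)  # also appends site_dir to sys.path
    return os.environ.get(MARKER) == "1"


os.environ.pop(GATE, None)
ran_without_gate = process_gated_pth()  # like startup with nothing exported
os.environ[GATE] = "1"                  # like setting the variable in the script
still_unset = MARKER not in os.environ  # the .pth line is not re-evaluated
ran_with_gate = process_gated_pth()     # like exporting the variable in the shell
```

Setting the gate variable after the `.pth` line has run changes nothing, because the gate is evaluated exactly once, when the file is processed.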

## Proposed fix

Decouple hook registration (must happen at interpreter startup) from the enable check (should happen at vllm/sglang-import time, so env vars set inside the script are honored).

`kvcached_autopatch.pth`: always register hooks; drop the `ENABLE_KVCACHED` gate.

`import importlib, importlib.util; importlib.import_module("kvcached.autopatch").autopatch_all() if importlib.util.find_spec("kvcached.autopatch") is not None else None`

`_env_enabled` in `kvcached/integration/vllm/autopatch.py` and `kvcached/integration/sglang/autopatch.py`: accept either env var, so `ENABLE_KVCACHED` works as documented.

```python
import os


def _env_enabled() -> bool:
    # Either variable turns patching on; read at vllm/sglang-import time.
    return (
        os.getenv("ENABLE_KVCACHED", "false").lower() in ("true", "1")
        or os.getenv("KVCACHED_AUTOPATCH", "false").lower() in ("true", "1")
    )
```

After this change, setting `ENABLE_KVCACHED=1` (or `KVCACHED_AUTOPATCH=1`) inside the user's script, at any point before `import vllm`, will work. No source-level `from kvcached import autopatch` is required.

Cost: registering two `when_imported` hooks at every Python startup on systems where kvcached is installed. Cheap (no vllm/sglang import is triggered) but non-zero. An optional `KVCACHED_DISABLE_AUTOPATCH=1` escape hatch in the .pth would preserve a fully-off mode.

## Workarounds (current behavior)

1. Export `ENABLE_KVCACHED=1` in the shell before launching Python, or
2. Add `from kvcached import autopatch` before any `import vllm` / `import sglang`.
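When the shell cannot be controlled directly, the first workaround can be approximated from a small launcher that starts the real script in a child interpreter with the variable already in its environment. This is a minimal sketch using only the standard library; `run_with_kvcached_enabled` is a hypothetical helper name, and the `-c` snippet merely stands in for the actual script.

```python
import os
import subprocess
import sys
from typing import List


def run_with_kvcached_enabled(args: List[str]) -> subprocess.CompletedProcess:
    """Start a child Python with ENABLE_KVCACHED=1 present from startup."""
    env = dict(os.environ, ENABLE_KVCACHED="1")
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in for the real script: report what the child sees at startup.
result = run_with_kvcached_enabled(
    ["-c", "import os; print(os.environ['ENABLE_KVCACHED'])"]
)
seen_by_child = result.stdout.strip()
```

Because the variable is part of the child's environment before the interpreter starts, the `.pth` gate in the child sees it, just as with a shell export.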
