baidu · zhihui96 · Jan 26, 2026
diff --git a/docs/source/user_guide/configuration/env_vars.md b/docs/source/user_guide/configuration/env_vars.md
@@ -14,4 +14,5 @@ vllm-kunlun uses the following environment variables to configure the system:
 | `export XMLIR_FORCE_USE_XPU_GRAPH`       | `1`               | ***\*Forces the enablement of XPU Graph mode.\****. This can capture and optimize the model execution graph, significantly boosting inference performance. |
 | `export VLLM_HOST_IP`                    | `$(hostname -i)`  | ***\*Sets the host IP address for the vLLM service\****. This uses a shell command to dynamically get the current host's internal IP. It's used for inter-node communication in a distributed environment. |
 | `export XMLIR_ENABLE_MOCK_TORCH_COMPILE` | `false`           | ***\*Disable Mock Torch Compile Function\****. Set to `false` to ensure the actual compilation and optimization flow is used, rather than mock mode. |
-| `FUSED_QK_ROPE_OP`                           | `0`               | ***\*Control whether to use the Fused QK-Norm and RoPE implementation\****. Default is `0` (use original/standard RoPE). Setting to `1` may be used to enable QWEN3. |
+| `FUSED_QK_ROPE_OP`                           | `0`               | ***\*Control whether to use the Fused QK-Norm and RoPE implementation\****. Default is `0` (use original/standard RoPE). Setting to `1` may be used to enable QWEN3. |
+| `VLLM_KUNLUN_ENABLE_INT8_BMM`            | `1`               | ***\*Control whether to enable INT8 batched matrix multiplication (BMM)\****. Default is `0`. Setting to `1` can reduce memory usage when using INT8 quantization. |