
add envs VLLM_KUNLUN_ENABLE_INT8_BMM to docs #150

Open

zhihui96 wants to merge 1 commit into baidu:main from zhihui96:update-doc

Conversation

@zhihui96
Contributor

PR Description

add envs VLLM_KUNLUN_ENABLE_INT8_BMM to docs


Checklist (Required)

Before submitting this PR, please ensure that all the following items are completed:

  • All code changes pass the pre-commit checks.
  • Commits are signed off using git commit -s.
  • The PR title is properly classified (see below).

PR Type

Please prefix the PR title with one or more of the following labels to help reviewers quickly understand the nature of the change:

  • [Feature] – New features or enhancements (e.g. Attention, Communicator, Kernel, Worker, etc.)
  • [Bugfix] – Bug fixes
  • [CI/Build] – CI, build system, or infrastructure improvements
  • [Doc] – Documentation updates or fixes
  • [Misc] – Other changes that do not fit the above categories (use sparingly)

Note: If the PR spans multiple categories, include all relevant prefixes.


Detailed Checklist

Thank you for contributing to vLLM Kunlun! To help us maintain high code quality and streamline the review process, please ensure your PR meets the following requirements.

1. Code Quality

  • All linting and formatting checks pass (pre-commit).
  • The code is well-structured and sufficiently documented.
  • The change is designed with maintainability and readability in mind.

2. Testing

  • Relevant unit tests are added or updated.
  • Integration tests are included when applicable.
  • Existing tests continue to pass.

3. DCO Compliance

This project follows the Developer Certificate of Origin (DCO).

  • All commits include a Signed-off-by: line.
  • Use git commit -s to automatically add the sign-off.

4. Review Expectations

During the review process, maintainers may:

  • Request code refactoring or additional tests.
  • Ask for clarifications on design decisions.
  • Suggest performance, stability, or maintainability improvements.

We appreciate your patience and collaboration throughout the review process!

| Environment Variable | Default | Description |
| --- | --- | --- |
| `XMLIR_ENABLE_MOCK_TORCH_COMPILE` | `false` | **Disable the mock torch.compile function**. Set to `false` to ensure the actual compilation and optimization flow is used, rather than mock mode. |
| `FUSED_QK_ROPE_OP` | `0` | **Control whether to use the fused QK-Norm and RoPE implementation**. Default is `0` (use the original/standard RoPE). Setting to `1` enables the fused path, used for Qwen3. |
| `VLLM_KUNLUN_ENABLE_INT8_BMM` | `0` | **Control whether to enable int8 BMM**. Default is `0`. Setting to `1` can save some memory when using int8 quantization. |
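
For reference, a minimal sketch of how a boolean flag like `VLLM_KUNLUN_ENABLE_INT8_BMM` is typically parsed; the helper name `env_flag` is hypothetical, not part of the vllm-kunlun codebase:

```python
import os

def env_flag(name: str, default: str = "0") -> bool:
    # Treat "1"/"true"/"yes" (case-insensitive) as enabled.
    return os.environ.get(name, default).strip().lower() in ("1", "true", "yes")

# Disabled unless explicitly set, matching the documented default of 0.
ENABLE_INT8_BMM = env_flag("VLLM_KUNLUN_ENABLE_INT8_BMM")
```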
Collaborator

If this is useful for INT8, why not set it to `1` by default?

Contributor Author

It's not available for the unquantized case. Just set it to true when using int8.

Collaborator

I don't think we should use this environment variable to control whether to run quantization; there are standard methods for detecting quantization. And do I understand correctly that if I run a quantized model but forget to enable this environment variable, it will cause an error?

Contributor Author

If VLLM_KUNLUN_ENABLE_INT8_BMM=False, torch.bmm with float16/bfloat16 is used for the W_UK/W_UV calculation; otherwise, xtorch_ops.mla_bmm_I8 with int8 is used. For DS V3.1 this doesn't improve performance, it just saves memory. I think this feature requires more testing, so I prefer adding a new environment variable to control the behavior.
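
For clarity, a minimal sketch of the dispatch described above, assuming `xtorch_ops.mla_bmm_I8` accepts the same (input, weight) pair as `torch.bmm`; that signature and the function name `bmm_w_uk` are assumptions for illustration:

```python
import os
import torch

ENABLE_INT8_BMM = os.environ.get("VLLM_KUNLUN_ENABLE_INT8_BMM", "0") == "1"

def bmm_w_uk(x: torch.Tensor, w_uk: torch.Tensor) -> torch.Tensor:
    if not ENABLE_INT8_BMM:
        # Default path: plain batched matmul in float16/bfloat16.
        return torch.bmm(x, w_uk)
    # Int8 path: weights stay quantized, which saves memory; the exact
    # signature of xtorch_ops.mla_bmm_I8 is an assumption here.
    import xtorch_ops  # Kunlun-specific ops package
    return xtorch_ops.mla_bmm_I8(x, w_uk)
```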

@liwei109
Collaborator

`if not isinstance(layer.quant_method, UnquantizedLinearMethod)`
You can determine whether to run int8 BMM this way instead of adding an environment variable; with an environment variable, only someone who already knows about it can run the model correctly.
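
A sketch of that check, assuming the layer exposes a `quant_method` attribute and that `UnquantizedLinearMethod` is importable from vLLM's linear-layer module as in upstream vLLM; the helper name `should_use_int8_bmm` is hypothetical:

```python
from vllm.model_executor.layers.linear import UnquantizedLinearMethod

def should_use_int8_bmm(layer) -> bool:
    # A quantized layer carries a quant_method other than the
    # unquantized default, so no environment variable is needed.
    return not isinstance(layer.quant_method, UnquantizedLinearMethod)
```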

