
Conversation

@prashantgupta24 (Collaborator) commented Aug 4, 2025

Description

The general idea is that the FP8 model can be used in all static batching (SB) and continuous batching (CB) scenarios.

This requires unreleased changes from fms and fms-mo in order to work.

While the tests may pass on CPU, they will continue to fail on Spyre.

Related Issues

Also addresses: #356

github-actions bot commented Aug 4, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@prashantgupta24 (Collaborator, Author):

I did a major refactoring of this file because we were using the same test logic with different parameters. Please double-check and let me know if I missed something!

assert len(completion.choices) == 2
assert len(completion.choices[0].text) > 0

# rest are SB tests
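
For reference, a minimal sketch of the shared-logic pattern described above: one test body reused across parameter combinations instead of near-duplicate tests. The client fixture, prompt, and parameter values here are illustrative assumptions, not the repo's actual fixtures:

import pytest

@pytest.mark.parametrize(
    "model", ["ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8"])
@pytest.mark.parametrize("backend", ["eager"])
def test_completion(client, model, backend):
    # Hypothetical OpenAI-compatible `client` fixture; the same
    # assertions run for every (model, backend) combination.
    completion = client.completions.create(
        model=model, prompt="Hello", n=2, max_tokens=8)
    assert len(completion.choices) == 2
    assert len(completion.choices[0].text) > 0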
@prashantgupta24 (Collaborator, Author):

I think?

I think FMS handles all this?

if [ "${{ matrix.test_suite.name }}" == "quantized" ]; then
export VLLM_SPYRE_TEST_MODEL_LIST="ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8"
fi
@prashantgupta24 (Collaborator, Author):

I guess a better way to do this would be to select the quantized model when using -m quantized 🤷
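
For what it's worth, a minimal sketch of that idea: a conftest.py hook that picks the FP8 model whenever the marker expression mentions quantized. pytest_configure and getoption("markexpr") are standard pytest API, but wiring them to this env var is an illustrative assumption, not the repo's actual setup:

import os

def pytest_configure(config):
    # If the run was invoked with a marker expression mentioning
    # "quantized" (e.g. -m quantized), point the suite at the FP8 model.
    markexpr = config.getoption("markexpr", default="") or ""
    if "quantized" in markexpr:
        os.environ["VLLM_SPYRE_TEST_MODEL_LIST"] = (
            "ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8")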

Can be made specific to quantization!

Signed-off-by: Prashant Gupta <[email protected]>
Thanks to Antoni!

Signed-off-by: Prashant Gupta <[email protected]>
It's broken

Signed-off-by: Prashant Gupta <[email protected]>
@prashantgupta24 (Collaborator, Author) commented Aug 5, 2025

Tests seem to run fine on my local M1, apart from that one test:

pytest -v -m "cpu and decoder and not multi and not cb" tests/e2e

collected 61 items / 43 deselected / 18 selected                                                                                                                                                                                    

tests/e2e/test_spyre_async_llm.py::test_abort[RequestOutputKind.DELTA-warmup_shapes0-0-eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                            [  5%]
tests/e2e/test_spyre_async_llm.py::test_abort[RequestOutputKind.FINAL_ONLY-warmup_shapes0-0-eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                       [ 11%]
tests/e2e/test_spyre_basic.py::test_output[max_num_seqs(4)-eager-TP(1)-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-0] PASSED                                                                                                [ 16%]
tests/e2e/test_spyre_basic.py::test_batch_handling[0-eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                              [ 22%]
tests/e2e/test_spyre_basic.py::test_full_batch_scheduling[eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                         [ 27%]
tests/e2e/test_spyre_max_new_tokens.py::test_output[eager-warmup_shape0-True-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-0] PASSED                                                                                          [ 33%]
tests/e2e/test_spyre_max_new_tokens.py::test_output[eager-warmup_shape0-False-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-0] PASSED                                                                                         [ 38%]
tests/e2e/test_spyre_online.py::test_openai_serving[max_model_len(256)-max_num_seqs(2)-0-warmup_shape0-eager-TP(1)-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                      [ 44%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs[TP(1)-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-eager] FAILED                                                                                               [ 50%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs_must_be_enabled PASSED                                                                                                                                          [ 55%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs_not_supported_with_cb[ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                     [ 61%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs_on_single_requests_only[ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                   [ 66%]
tests/e2e/test_spyre_seed.py::test_seed[eager-warmup_shape0-42-0.1-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                      [ 72%]
tests/e2e/test_spyre_seed.py::test_seed[eager-warmup_shape0-42-1.0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                      [ 77%]
tests/e2e/test_spyre_static_batching_limits.py::test_max_prompt_len_and_new_tokens[eager-warmup_shapes0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                 [ 83%]
tests/e2e/test_spyre_static_batching_limits.py::test_max_prompt_len_and_new_tokens[eager-warmup_shapes1-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                 [ 88%]
tests/e2e/test_spyre_warmup_shapes.py::test_output[eager-warmup_shapes0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                 [ 94%]
tests/e2e/test_spyre_warmup_shapes.py::test_invalid_prompt_len[eager-warmup_shapes0-prompts0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED

1 failed, 17 passed, 43 deselected in 756.79s (0:12:36)
                                                                            

joerunde marked this pull request as ready for review August 6, 2025 22:08
@maxdebayser (Collaborator) left a comment:

LGTM

@joerunde (Collaborator) commented Aug 8, 2025

bot:test

@joerunde (Collaborator) commented Aug 8, 2025

bot:test
MARKERS="spyre and quantized"

joerunde merged commit 1037b62 into main Aug 8, 2025
23 checks passed
joerunde deleted the fp8-model-testing branch August 8, 2025 21:23