
Conversation

@prashantgupta24 (Collaborator) commented Aug 4, 2025

Description

The general idea is that the FP8 model can be used in all static batching (SB) and continuous batching (CB) scenarios.

This requires unreleased changes from fms and fms-mo in order to work.

While the tests may pass on CPU, they will continue to fail on Spyre.

Related Issues

Also addresses: #356

github-actions bot commented Aug 4, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure your code passes all the linting checks, otherwise your PR can't be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@prashantgupta24 (Collaborator, Author):

I did a major refactoring of this file because we were using the same test logic with different parameters. Please double-check and let me know if I missed something!

assert len(completion.choices) == 2
assert len(completion.choices[0].text) > 0

# rest are SB tests
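
For reference, a minimal sketch of the shared-logic pattern described above: one test body reused across parameter combinations instead of near-duplicate tests. The client fixture, prompt, and parameter values here are illustrative assumptions, not the repo's actual fixtures:

import pytest

@pytest.mark.parametrize(
    "model", ["ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8"])
@pytest.mark.parametrize("backend", ["eager"])
def test_completion(client, model, backend):
    # Hypothetical OpenAI-compatible `client` fixture; the same
    # assertions run for every (model, backend) combination.
    completion = client.completions.create(
        model=model, prompt="Hello", n=2, max_tokens=8)
    assert len(completion.choices) == 2
    assert len(completion.choices[0].text) > 0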
@prashantgupta24 (Collaborator, Author):

I think?

I think FMS handles all this?

if [ "${{ matrix.test_suite.name }}" == "quantized" ]; then
export VLLM_SPYRE_TEST_MODEL_LIST="ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8"
fi
@prashantgupta24 (Collaborator, Author):

I guess a better way to do this would be to select the quantized model when using -m quantized 🤷
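
For what it's worth, a minimal sketch of that idea: a conftest.py hook that picks the FP8 model whenever the marker expression mentions quantized. pytest_configure and getoption("markexpr") are standard pytest API, but wiring them to this env var is an illustrative assumption, not the repo's actual setup:

import os

def pytest_configure(config):
    # If the run was invoked with a marker expression mentioning
    # "quantized" (e.g. -m quantized), point the suite at the FP8 model.
    markexpr = config.getoption("markexpr", default="") or ""
    if "quantized" in markexpr:
        os.environ["VLLM_SPYRE_TEST_MODEL_LIST"] = (
            "ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8")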

Can be made specific to quantization!

Signed-off-by: Prashant Gupta <[email protected]>
Thanks to Antoni!

Signed-off-by: Prashant Gupta <[email protected]>
It's broken

Signed-off-by: Prashant Gupta <[email protected]>
@prashantgupta24 (Collaborator, Author) commented Aug 5, 2025

Tests seem to run fine on my local M1, apart from that one test:

pytest -v -m "cpu and decoder and not multi and not cb" tests/e2e

collected 61 items / 43 deselected / 18 selected                                                                                                                                                                                    

tests/e2e/test_spyre_async_llm.py::test_abort[RequestOutputKind.DELTA-warmup_shapes0-0-eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                            [  5%]
tests/e2e/test_spyre_async_llm.py::test_abort[RequestOutputKind.FINAL_ONLY-warmup_shapes0-0-eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                       [ 11%]
tests/e2e/test_spyre_basic.py::test_output[max_num_seqs(4)-eager-TP(1)-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-0] PASSED                                                                                                [ 16%]
tests/e2e/test_spyre_basic.py::test_batch_handling[0-eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                              [ 22%]
tests/e2e/test_spyre_basic.py::test_full_batch_scheduling[eager-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                         [ 27%]
tests/e2e/test_spyre_max_new_tokens.py::test_output[eager-warmup_shape0-True-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-0] PASSED                                                                                          [ 33%]
tests/e2e/test_spyre_max_new_tokens.py::test_output[eager-warmup_shape0-False-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-0] PASSED                                                                                         [ 38%]
tests/e2e/test_spyre_online.py::test_openai_serving[max_model_len(256)-max_num_seqs(2)-0-warmup_shape0-eager-TP(1)-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                      [ 44%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs[TP(1)-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8-eager] FAILED                                                                                               [ 50%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs_must_be_enabled PASSED                                                                                                                                          [ 55%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs_not_supported_with_cb[ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                     [ 61%]
tests/e2e/test_spyre_prompt_logprobs.py::test_prompt_logprobs_on_single_requests_only[ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                   [ 66%]
tests/e2e/test_spyre_seed.py::test_seed[eager-warmup_shape0-42-0.1-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                      [ 72%]
tests/e2e/test_spyre_seed.py::test_seed[eager-warmup_shape0-42-1.0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                      [ 77%]
tests/e2e/test_spyre_static_batching_limits.py::test_max_prompt_len_and_new_tokens[eager-warmup_shapes0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                 [ 83%]
tests/e2e/test_spyre_static_batching_limits.py::test_max_prompt_len_and_new_tokens[eager-warmup_shapes1-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                 [ 88%]
tests/e2e/test_spyre_warmup_shapes.py::test_output[eager-warmup_shapes0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED                                                                                                 [ 94%]
tests/e2e/test_spyre_warmup_shapes.py::test_invalid_prompt_len[eager-warmup_shapes0-prompts0-ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8] PASSED

1 failed, 17 passed, 43 deselected in 756.79s (0:12:36)
                                                                            

joerunde marked this pull request as ready for review August 6, 2025 22:08
@maxdebayser (Collaborator) left a comment:

LGTM

@joerunde (Collaborator) commented Aug 8, 2025

bot:test

@joerunde (Collaborator) commented Aug 8, 2025

bot:test
MARKERS="spyre and quantized"

joerunde merged commit 1037b62 into main Aug 8, 2025
23 checks passed
joerunde deleted the fp8-model-testing branch August 8, 2025 21:23