♻️ use fp8 model for testing SB + CB #359
Conversation
Signed-off-by: Prashant Gupta <[email protected]>
I did a major refactoring on this file because we were using the same logic across the tests, just with different parameters. Please double-check and let me know if I missed something!
```python
assert len(completion.choices) == 2
assert len(completion.choices[0].text) > 0

# rest are SB tests
```
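A minimal sketch of the refactor described above: one shared test body driven by `pytest.mark.parametrize` instead of near-duplicate test functions per configuration. `fake_complete`, the stub classes, and the model list here are illustrative stand-ins, not the real vllm-spyre test helpers.

```python
import pytest


class _Choice:
    """Stand-in for one completion choice."""

    def __init__(self, text):
        self.text = text


class _Completion:
    """Stand-in for an OpenAI-style completion response."""

    def __init__(self, n):
        self.choices = [_Choice(f"output {i}") for i in range(n)]


def fake_complete(model, n):
    # Placeholder for the real server call made by the tests.
    return _Completion(n)


# Hypothetical parameter list; the FP8 entry matches the model used in this PR.
MODELS = [
    "ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8",
]


@pytest.mark.parametrize("model", MODELS)
def test_completions(model):
    completion = fake_complete(model, n=2)
    # Same assertions as in the diff above.
    assert len(completion.choices) == 2
    assert len(completion.choices[0].text) > 0
```

With this shape, adding another model (or another decode mode) is one more parametrize entry rather than a copied test function.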
I think?
I think FMS handles all this?
Signed-off-by: Prashant Gupta <[email protected]>
```shell
if [ "${{ matrix.test_suite.name }}" == "quantized" ]; then
  export VLLM_SPYRE_TEST_MODEL_LIST="ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8"
fi
```
I guess a better way to do this would be to select the quantized model when running with `-m quantized` 🤷
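That suggestion could look something like this hypothetical `conftest.py` sketch: derive the model list from the `-m` marker expression instead of exporting the env var in the CI workflow. The env var name and model ID match the workflow snippet above; everything else is an assumption about how the test suite is wired.

```python
import os

# Model used for the quantized suite in this PR's workflow snippet.
QUANTIZED_MODEL = "ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8"


def pytest_configure(config):
    # `config.option.markexpr` holds the expression passed via `-m`,
    # e.g. `pytest -m quantized` -> "quantized".
    markexpr = getattr(config.option, "markexpr", "") or ""
    if "quantized" in markexpr:
        # Only set a default; an explicit env var still wins.
        os.environ.setdefault("VLLM_SPYRE_TEST_MODEL_LIST", QUANTIZED_MODEL)
```

This keeps the model selection next to the markers it depends on, so the CI matrix no longer needs to special-case the `quantized` suite name.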
Can be made specific to quantization! Signed-off-by: Prashant Gupta <[email protected]>
Thanks to Antoni! Signed-off-by: Prashant Gupta <[email protected]>
It's broken Signed-off-by: Prashant Gupta <[email protected]>
Seems to run fine on my local M1, apart from that 1 test:
Signed-off-by: Joe Runde <[email protected]>
maxdebayser left a comment:
LGTM
bot:test
Description
The general idea is that the FP8 model can be used in all SB/CB scenarios.
This requires unreleased changes from fms and fms-mo in order to work:
While the tests may pass on CPU, they will continue to fail on Spyre.
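For running the FP8 suite locally, a hedged sketch of how a test helper might resolve the model list, assuming `VLLM_SPYRE_TEST_MODEL_LIST` is a comma-separated list (as the workflow snippet suggests) with a non-FP8 default when unset; the actual vllm-spyre helper and default model may differ.

```python
import os

# Illustrative fallback; the real suite's default model may differ.
DEFAULT_MODELS = ["ibm-ai-platform/micro-g3.3-8b-instruct-1b"]


def get_test_models():
    """Return the models under test from the env var, or a default list."""
    raw = os.environ.get("VLLM_SPYRE_TEST_MODEL_LIST", "")
    models = [m.strip() for m in raw.split(",") if m.strip()]
    return models or DEFAULT_MODELS
```

Under that assumption, `VLLM_SPYRE_TEST_MODEL_LIST=ibm-ai-platform/micro-g3.3-8b-instruct-1b-FP8 pytest -m quantized` would point the whole SB/CB matrix at the FP8 model.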
Related Issues
Also addresses: #356