GPT OSS model on GCP C4 #3107
Conversation
* running gpt-oss on Intel Xeon
* add TTFT image
* add _blog.yml
* minor fix
* fix blog
* fix content
* update thumbnail
* update expert parallelism diagram
* fix model name and model link
* fix result image links
* fix script
* update results

Signed-off-by: jiqing-feng <[email protected]>
@kding1 @IlyasMoutawwakil , pls help review, thx very much
## Results
### Normalized Throughput per vCPU
Across batch sizes up to 64, the Intel Xeon 6 processor-powered `C4` instance consistently outperforms `C3`, delivering 1.4x to 1.7x higher throughput per vCPU. The normalized throughput is computed as:
$$\text{normalized\_throughput\_per\_vCPU} = \frac{\text{throughput}_{C4} / \text{vCPUs}_{C4}}{\text{throughput}_{C3} / \text{vCPUs}_{C3}}$$
<kbd>
<img src="assets/gpt-oss-on-intel-xeon/throughput-gpt-oss-per-vcpu.png">
</kbd>
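For clarity, here is a minimal sketch of how the normalized per-vCPU throughput could be computed from raw benchmark numbers. The throughput values and vCPU counts below are placeholders, not measured results from this blog.

```python
# Minimal sketch: compute normalized throughput per vCPU (C4 vs. C3).
# All numeric values below are placeholders, not measured results.

def normalized_throughput_per_vcpu(throughput_c4: float, vcpus_c4: int,
                                   throughput_c3: float, vcpus_c3: int) -> float:
    """Ratio of per-vCPU throughput on C4 over per-vCPU throughput on C3."""
    return (throughput_c4 / vcpus_c4) / (throughput_c3 / vcpus_c3)

# Example with illustrative numbers (tokens/s and vCPU counts are assumptions):
ratio = normalized_throughput_per_vcpu(throughput_c4=300.0, vcpus_c4=144,
                                       throughput_c3=150.0, vcpus_c3=176)
print(f"C4 delivers {ratio:.2f}x the per-vCPU throughput of C3")
```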
would it make sense to also include raw throughput numbers? I understand that normalized throughput has increased, but I don't see from where to where, i.e. what absolute throughput one should expect from a deployment on this CPU instance.
LGTM! I think it would be worth adding un-normalized metrics as well, such as TTFT, TPS, and TPOT.
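For reference, a hedged sketch of how TTFT, TPOT, and TPS are commonly derived from per-request timestamps; the field names below are hypothetical and not taken from the benchmark script in this PR.

```python
# Illustrative only: common definitions of TTFT, TPOT, and TPS derived from
# per-request timing. Field names are hypothetical, not from this PR's script.
from dataclasses import dataclass

@dataclass
class RequestTiming:
    start: float         # wall-clock time the request was issued (s)
    first_token: float   # wall-clock time the first output token arrived (s)
    end: float           # wall-clock time the last output token arrived (s)
    output_tokens: int   # number of generated tokens

def ttft(t: RequestTiming) -> float:
    """Time To First Token (s)."""
    return t.first_token - t.start

def tpot(t: RequestTiming) -> float:
    """Time Per Output Token (s/token), excluding the first token."""
    return (t.end - t.first_token) / max(t.output_tokens - 1, 1)

def tps(t: RequestTiming) -> float:
    """Output tokens per second over the whole generation."""
    return t.output_tokens / (t.end - t.start)
```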
This blog shows that Intel Granite Rapids (C4) provides both performance gains and better cost efficiency than Sapphire Rapids (C3) for large MoE inference.