jiqing-feng

This blog shows that Intel Granite Rapids (C4) delivers both higher performance and better cost efficiency than Sapphire Rapids (C3) for large MoE inference.

* running gpt-oss on Intel Xeon

Signed-off-by: jiqing-feng <[email protected]>

* add TTFT image

Signed-off-by: jiqing-feng <[email protected]>

* add _blog.yml

Signed-off-by: jiqing-feng <[email protected]>

* minor fix

Signed-off-by: jiqing-feng <[email protected]>

* fix blog

Signed-off-by: jiqing-feng <[email protected]>

* fix content

Signed-off-by: jiqing-feng <[email protected]>

* update thumbnail

Signed-off-by: jiqing-feng <[email protected]>

* update expert parallelism diagram

Signed-off-by: jiqing-feng <[email protected]>

* fix model name and model link

Signed-off-by: jiqing-feng <[email protected]>

* fix result image links

Signed-off-by: jiqing-feng <[email protected]>

* fix script

Signed-off-by: jiqing-feng <[email protected]>

* update results

Signed-off-by: jiqing-feng <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng marked this pull request as draft on September 30, 2025, 14:46
@jiqing-feng marked this pull request as ready for review on September 30, 2025, 14:46
@yao-matrix
Contributor

@kding1 @IlyasMoutawwakil, pls help review, thx very much

Comment on lines +174 to +182
## Results
### Normalized Throughput per vCPU
Across batch sizes up to 64, Intel Xeon 6 processor‑powered `C4` consistently outperforms `C3`, delivering 1.4× to 1.7× the throughput per vCPU. The formula is:

$$\text{normalized\_throughput\_per\_vCPU} = \frac{\text{throughput}_{C4} / \text{vCPUs}_{C4}}{\text{throughput}_{C3} / \text{vCPUs}_{C3}}$$

<kbd>
<img src="assets/gpt-oss-on-intel-xeon/throughput-gpt-oss-per-vcpu.png">
</kbd>
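
For readers who want to reproduce the normalization, here is a minimal Python sketch of the ratio defined in the quoted section. The function name and the input values are hypothetical and chosen only for illustration; they are not measurements from the blog post.

```python
def normalized_throughput_per_vcpu(throughput_c4, vcpus_c4, throughput_c3, vcpus_c3):
    """Return C4 per-vCPU throughput divided by C3 per-vCPU throughput."""
    if vcpus_c4 <= 0 or vcpus_c3 <= 0:
        raise ValueError("vCPU counts must be positive")
    if throughput_c3 <= 0:
        raise ValueError("C3 throughput must be positive")
    per_vcpu_c4 = throughput_c4 / vcpus_c4
    per_vcpu_c3 = throughput_c3 / vcpus_c3
    return per_vcpu_c4 / per_vcpu_c3


# Hypothetical inputs (tokens/s and vCPU counts), NOT results from the blog.
ratio = normalized_throughput_per_vcpu(
    throughput_c4=300.0, vcpus_c4=100,
    throughput_c3=200.0, vcpus_c3=100,
)
print(f"normalized throughput per vCPU: {ratio:.2f}x")
```

A value above 1 means `C4` does more work per vCPU than `C3`. The ratio alone does not reveal the absolute throughput on either instance.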
Member

Would it make sense to also include absolute throughput numbers? I understand that normalized throughput has increased, but I can't see from what to what, i.e., what throughput one should expect when deploying on this CPU instance.

Member

@IlyasMoutawwakil left a comment

LGTM! I think it would be worth adding un-normalized metrics as well, such as TTFT, TPS, and TPOT.
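
As a rough illustration of the un-normalized metrics mentioned here, the sketch below derives TTFT (time to first token), TPOT (time per output token), and TPS (tokens per second) from per-request timestamps. The `RequestTiming` type, its field names, and the sample values are hypothetical. The formulas follow commonly used definitions and may differ from what the blog's benchmark script actually reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTiming:
    """Hypothetical per-request timestamps, in seconds."""
    start: float        # request submitted
    first_token: float  # first output token produced
    end: float          # last output token produced
    output_tokens: int  # number of generated tokens


def ttft(t: RequestTiming) -> float:
    """Time to first token: delay before the first output token appears."""
    return t.first_token - t.start


def tpot(t: RequestTiming) -> float:
    """Time per output token, averaged over tokens after the first one."""
    if t.output_tokens < 2:
        raise ValueError("TPOT needs at least two output tokens")
    return (t.end - t.first_token) / (t.output_tokens - 1)


def tps(timings: List[RequestTiming]) -> float:
    """Aggregate output tokens per second over the whole batch window."""
    if not timings:
        raise ValueError("no timings given")
    total_tokens = sum(t.output_tokens for t in timings)
    window = max(t.end for t in timings) - min(t.start for t in timings)
    if window <= 0:
        raise ValueError("batch window must be positive")
    return total_tokens / window


# Made-up sample data, not measurements from the PR.
sample = [
    RequestTiming(start=0.0, first_token=0.5, end=2.5, output_tokens=101),
    RequestTiming(start=0.0, first_token=0.5, end=2.5, output_tokens=101),
]
print(ttft(sample[0]), tpot(sample[0]), tps(sample))
```

Reporting these alongside the normalized ratio would show the absolute latency and throughput a deployment can expect.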

3 participants