
Commit de1c7fd

Fix openvino vlm blog post publication date (#3126)
* Fix openvino vlm blog post tilde cross out benchmark
* small fix
* fix date
1 parent f405483 commit de1c7fd

File tree: 2 files changed, +3 -3 lines changed


_blog.yml

Lines changed: 1 addition & 1 deletion

@@ -6792,7 +6792,7 @@
   title: "Get your VLM running in 3 simple steps on Intel CPUs"
   author: ezelanza
   thumbnail: /blog/assets/optimum_intel/intel_thumbnail.png
-  date: Oct 13, 2025
+  date: Oct 15, 2025
   tags:
     - intel
     - optimum

openvino-vlm.md

Lines changed: 2 additions & 2 deletions

@@ -128,7 +128,7 @@ If you have a recent Intel laptop, Intel AI PC, or Intel discrete GPU, you can l
 model = OVModelForVisualCausalLM.from_pretrained(model_id, device="gpu")
 ```
 
-We also created a [space](https://huggingface.co/spaces/echarlaix/vision-langage-openvino) so you can play with the [original model](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino) and its quantized variants obtained by respectively applying [weight-only quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-woq-data-free) and [mixed quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-mixed). This demo runs on 4th Generation Intel Xeon (Sapphire Rapids) processors.
+We also [created a space](https://huggingface.co/spaces/echarlaix/vision-langage-openvino) so you can play with the [original model](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino) and its quantized variants obtained by respectively applying [weight-only quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-woq-data-free) and [mixed quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-mixed). This demo runs on 4th Generation Intel Xeon (Sapphire Rapids) processors.
 
 
 <p align="center">
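
For readers landing on this diff without the surrounding blog post, here is a minimal sketch of the pattern the changed section refers to: loading one of the pre-converted OpenVINO SmolVLM2 checkpoints linked in the hunk above with `OVModelForVisualCausalLM` and running a single image prompt. This is not the blog's exact code; it assumes `optimum[openvino]` and a recent `transformers` are installed, and the image URL and prompt are placeholders. As in the diff context, `device="gpu"` can be used instead of `device="cpu"` on an Intel GPU.

```python
# Minimal sketch (assumptions noted above), not the blog's exact snippet.
from optimum.intel import OVModelForVisualCausalLM
from transformers import AutoProcessor

# Pre-converted OpenVINO variant linked in the diff above
model_id = "echarlaix/SmolVLM2-256M-Video-Instruct-openvino"

# device="cpu" runs everywhere; device="gpu" targets an Intel GPU, as shown in the diff context
model = OVModelForVisualCausalLM.from_pretrained(model_id, device="cpu")

# Load the processor from the original model repository
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM2-256M-Video-Instruct")

# Chat-template input with one image and one question (example URL and prompt)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
)

# Generate a short answer and decode it
generated = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```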
@@ -156,7 +156,7 @@ Here are the results on Intel CPU:
 | openvino-8bit-woq| 0.247 | 0.016 | 0.482 | 63.928 |
 
 
-This benchmark demonstrates how small, optimized multimodal models, like [SmolVLM2-256M](https://huggingface.co/HuggingFaceTB/SmolVLM2-256M-Video-Instruct), perform on Intel CPUs across different configurations. According to the tests, the PyTorch version shows high latency, with a time to first token (TTFT) of over 5s with a decoding throughput of ~0.7 tokens/s. Simply converting the model with Optimum and running it on OpenVINO drastically reduces the time to first token (TTFT) to 0.42s (~x12 speedup) and raises throughput to ~47 tokens/s (~x65). Applying 8-bit weight-only quantization further reduces TTFT (x1.7) and increases throughput (x1.4), while also reducing model size and improving efficiency.
+This benchmark demonstrates how small, optimized multimodal models, like [SmolVLM2-256M](https://huggingface.co/HuggingFaceTB/SmolVLM2-256M-Video-Instruct), perform on Intel CPUs across different configurations. According to the tests, the PyTorch version shows high latency, with a time to first token (TTFT) of over 5s with a decoding throughput of 0.7 tokens/s. Simply converting the model with Optimum and running it on OpenVINO drastically reduces the time to first token (TTFT) to 0.42s (~**x12** speedup) and raises throughput to 47 tokens/s (~**x65**). Applying 8-bit weight-only quantization further reduces TTFT (x1.7) and increases throughput (x1.4), while also reducing model size and improving efficiency.
 
 > [!NOTE]
 > **Platform configuration**
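
As a quick sanity check on the speedup figures quoted in the changed paragraph, the ratios can be recomputed from the rounded numbers it cites (TTFT over 5 s vs. 0.42 s, and roughly 0.7 vs. 47 tokens/s). The blog's ~x12 and ~x65 come from the unrounded measurements in the full table, so the rough recomputation below lands close but not exactly on the quoted values.

```python
# Rough recomputation of the speedups quoted above, using the rounded
# figures cited in the paragraph (exact values come from the full table).
pytorch_ttft_s = 5.0        # "over 5s" time to first token with PyTorch
openvino_ttft_s = 0.42      # TTFT after converting with Optimum / OpenVINO
pytorch_throughput = 0.7    # decoding tokens/s with PyTorch
openvino_throughput = 47.0  # decoding tokens/s with OpenVINO

print(f"TTFT speedup:       ~x{pytorch_ttft_s / openvino_ttft_s:.0f}")          # ~x12
print(f"Throughput speedup: ~x{openvino_throughput / pytorch_throughput:.0f}")  # ~x67 here, quoted as ~x65
```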
