
Commit de1c7fd

Fix openvino vlm blog post publication date (#3126)
* Fix openvino vlm blog post tilde cross out benchmark
* small fix
* fix date
1 parent f405483 commit de1c7fd

File tree: 2 files changed, +3 -3 lines changed


_blog.yml

Lines changed: 1 addition & 1 deletion

@@ -6792,7 +6792,7 @@
   title: "Get your VLM running in 3 simple steps on Intel CPUs"
   author: ezelanza
   thumbnail: /blog/assets/optimum_intel/intel_thumbnail.png
-  date: Oct 13, 2025
+  date: Oct 15, 2025
   tags:
     - intel
     - optimum

openvino-vlm.md

Lines changed: 2 additions & 2 deletions

@@ -128,7 +128,7 @@ If you have a recent Intel laptop, Intel AI PC, or Intel discrete GPU, you can l
 model = OVModelForVisualCausalLM.from_pretrained(model_id, device="gpu")
 ```
 
-We also created a [space](https://huggingface.co/spaces/echarlaix/vision-langage-openvino) so you can play with the [original model](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino) and its quantized variants obtained by respectively applying [weight-only quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-woq-data-free) and [mixed quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-mixed). This demo runs on 4th Generation Intel Xeon (Sapphire Rapids) processors.
+We also [created a space](https://huggingface.co/spaces/echarlaix/vision-langage-openvino) so you can play with the [original model](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino) and its quantized variants obtained by respectively applying [weight-only quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-woq-data-free) and [mixed quantization](https://huggingface.co/echarlaix/SmolVLM2-256M-Video-Instruct-openvino-8bit-mixed). This demo runs on 4th Generation Intel Xeon (Sapphire Rapids) processors.
 
 
 <p align="center">
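
For readers landing on this diff without the surrounding blog post, here is a minimal sketch of the pattern the changed section refers to: loading one of the pre-converted OpenVINO SmolVLM2 checkpoints linked in the hunk above with `OVModelForVisualCausalLM` and running a single image prompt. This is not the blog's exact code; it assumes `optimum[openvino]` and a recent `transformers` are installed, and the image URL and prompt are placeholders. As in the diff context, `device="gpu"` can be used instead of `device="cpu"` on an Intel GPU.

```python
# Minimal sketch (assumptions noted above), not the blog's exact snippet.
from optimum.intel import OVModelForVisualCausalLM
from transformers import AutoProcessor

# Pre-converted OpenVINO variant linked in the diff above
model_id = "echarlaix/SmolVLM2-256M-Video-Instruct-openvino"

# device="cpu" runs everywhere; device="gpu" targets an Intel GPU, as shown in the diff context
model = OVModelForVisualCausalLM.from_pretrained(model_id, device="cpu")

# Load the processor from the original model repository
processor = AutoProcessor.from_pretrained("HuggingFaceTB/SmolVLM2-256M-Video-Instruct")

# Chat-template input with one image and one question (example URL and prompt)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
)

# Generate a short answer and decode it
generated = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```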
@@ -156,7 +156,7 @@ Here are the results on Intel CPU:
 | openvino-8bit-woq| 0.247 | 0.016 | 0.482 | 63.928 |
 
 
-This benchmark demonstrates how small, optimized multimodal models, like [SmolVLM2-256M](https://huggingface.co/HuggingFaceTB/SmolVLM2-256M-Video-Instruct), perform on Intel CPUs across different configurations. According to the tests, the PyTorch version shows high latency, with a time to first token (TTFT) of over 5s with a decoding throughput of ~0.7 tokens/s. Simply converting the model with Optimum and running it on OpenVINO drastically reduces the time to first token (TTFT) to 0.42s (~x12 speedup) and raises throughput to ~47 tokens/s (~x65). Applying 8-bit weight-only quantization further reduces TTFT (x1.7) and increases throughput (x1.4), while also reducing model size and improving efficiency.
+This benchmark demonstrates how small, optimized multimodal models, like [SmolVLM2-256M](https://huggingface.co/HuggingFaceTB/SmolVLM2-256M-Video-Instruct), perform on Intel CPUs across different configurations. According to the tests, the PyTorch version shows high latency, with a time to first token (TTFT) of over 5s with a decoding throughput of 0.7 tokens/s. Simply converting the model with Optimum and running it on OpenVINO drastically reduces the time to first token (TTFT) to 0.42s (~**x12** speedup) and raises throughput to 47 tokens/s (~**x65**). Applying 8-bit weight-only quantization further reduces TTFT (x1.7) and increases throughput (x1.4), while also reducing model size and improving efficiency.
 
 > [!NOTE]
 > **Platform configuration**
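
As a quick sanity check on the speedup figures quoted in the changed paragraph, the ratios can be recomputed from the rounded numbers it cites (TTFT over 5 s vs. 0.42 s, and roughly 0.7 vs. 47 tokens/s). The blog's ~x12 and ~x65 come from the unrounded measurements in the full table, so the rough recomputation below lands close but not exactly on the quoted values.

```python
# Rough recomputation of the speedups quoted above, using the rounded
# figures cited in the paragraph (exact values come from the full table).
pytorch_ttft_s = 5.0        # "over 5s" time to first token with PyTorch
openvino_ttft_s = 0.42      # TTFT after converting with Optimum / OpenVINO
pytorch_throughput = 0.7    # decoding tokens/s with PyTorch
openvino_throughput = 47.0  # decoding tokens/s with OpenVINO

print(f"TTFT speedup:       ~x{pytorch_ttft_s / openvino_ttft_s:.0f}")          # ~x12
print(f"Throughput speedup: ~x{openvino_throughput / pytorch_throughput:.0f}")  # ~x67 here, quoted as ~x65
```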
