-
Oh well, answering my own question: OpenVINO was definitely worth pursuing! I used a Python 3.10 venv as outlined in the documentation to run the conversion. I'll try to modify the convert script to handle my French distilled model and I should be all set. Is it still worth building ggml with BLAS and Intel MKL, or is everything happening on the GPU?
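For reference, the steps I followed were roughly these, adapted from the OpenVINO section of the whisper.cpp README (script names and CMake flags are as of the version I used, so double-check the current docs):

```bash
# OpenVINO conversion, roughly as described in the whisper.cpp README.
cd whisper.cpp/models

# Python 3.10 venv -- newer Python/OpenVINO combinations didn't work for me.
python3.10 -m venv openvino_conv_env
source openvino_conv_env/bin/activate
pip install -r requirements-openvino.txt

# Convert the Whisper encoder to OpenVINO IR (large-v3 shown only as an example;
# the French distilled model will need the script adapted to load its weights).
python convert-whisper-to-openvino.py --model large-v3

# Rebuild whisper.cpp with the OpenVINO backend enabled.
cd ..
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build -j --config Release
```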
-
Hi all, I'm looking to build a fast STT setup for use in Home Assistant. I'm coming from faster-whisper with a small model running directly on the N100 machine that runs Home Assistant. Each command was taking 6-7 seconds, with really hit-or-miss results.
I recently learned about Vulkan support in whisper.cpp and decided to migrate the STT component to my home server, which runs a Xeon D-1521 and a discrete GPU. I can now run a large-v3 model in about 8-9 seconds with infinitely better accuracy, which I'll trade a couple of seconds for any day. Vulkan is really a game changer: it's about 10x faster than the CPU backend. It would be awesome if I could bring that down under the 5-second mark, but I'm struggling, as everything I've tried so far has had no effect at all.
Here's everything I tried:
- bofenghuang/whisper-large-v3-french-distil-dec4 (which I understand is about the same as using the newer turbo model)

The one thing I haven't tried yet is OpenVINO, which I believe can also run on the Arc GPU. I haven't been able to yet, though, as I'm currently stuck with Python and OpenVINO versions that are seemingly too recent for whisper.cpp.
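As a quick sanity check on whether OpenVINO can target the Arc GPU at all, something along these lines should work (the import path differs between OpenVINO releases, so treat it as a sketch):

```bash
# List the devices OpenVINO can use; the Arc card should show up as "GPU".
# Older OpenVINO releases need "from openvino.runtime import Core" instead.
python3 -c "from openvino import Core; print(Core().available_devices)"
```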
Should I pursue OpenVINO given my current hardware, or have I hit a hard limit? Anything else that is worth trying (besides downgrading to a smaller model and sacrificing accuracy)?
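For reference, my current Vulkan setup boils down to roughly the following (CMake flag and binary names are per recent whisper.cpp versions, so verify against the README; the input file is just a placeholder):

```bash
# Build whisper.cpp with the Vulkan backend (flag name per recent versions).
cmake -B build -DGGML_VULKAN=1
cmake --build build -j --config Release

# Transcribe a command clip with large-v3 on the GPU; older builds name the
# binary ./main instead of whisper-cli. command.wav is just a placeholder.
./build/bin/whisper-cli -m models/ggml-large-v3.bin -l fr -f command.wav
```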