Replies: 3 comments
-
$ time ../whisper -l en -fa -bs 2 -m ../models/ggml-small.en-q5_1.bin -osrt -of test.srt audiofile.wav

$ ffprobe audiofile.wav

$ time ../whisper -l en -fa -bs 2 -m ../models/ggml-large-v2.bin -osrt -of test.srt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:10.000] [BLANK_AUDIO]

$ time ../whisper -l en -m ../models/ggml-large-v2.bin -osrt -of test.srt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:10.000] [BLANK_AUDIO]

$ time ../whisper -m ../models/ggml-tiny.en-q5_1.bin -of test.txt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:09.080] [BLANK_AUDIO]

$ time ../whisper.may15 -m ../models/ggml-tiny.en-q5_1.bin -of test.txt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:02.580] (upbeat music)
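For a quick side-by-side check of the two builds, one option is to write plain-text output from each and diff the results. This is only an illustration, not part of the original test; the old/new file names are placeholders:

$ ../whisper -otxt -of new -m ../models/ggml-tiny.en-q5_1.bin audiofile.wav 2>/dev/null
$ ../whisper.may15 -otxt -of old -m ../models/ggml-tiny.en-q5_1.bin audiofile.wav 2>/dev/null
# -otxt writes a plain-text transcript to the -of prefix, so this produces new.txt and old.txt
$ diff old.txt new.txt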
-
With -ng I get working, correct transcription. Maybe something is broken with my CUDA setup? Edit: Nope! 2024.05.15 compiles and runs fine.
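For reference, a sketch of what the working CPU-only run looks like: essentially the earlier command with -ng (--no-gpu) added, with paths as placeholders:

# -ng disables GPU offload, so inference runs on the CPU and avoids the CUDA path
$ time ../whisper -ng -l en -m ../models/ggml-small.en-q5_1.bin -osrt -of test.srt audiofile.wav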
-
git bisect run bash -c 'git clean -fdx ; git submodule update --init --recursive ; WHISPER_CUDA=1 make -j $(nproc) ; time ./main -m /pr/Neural/Voice_Recognition_Whispr_GGML/temp/good-whisper.cpp/models/ggml-small.en-q5_1.bin -l en -bs 2 -of test.txt /pr/Neural/Voice_Recognition_Whispr_GGML/audiodump.wav'

1b51fdf is the first bad commit
CMakeLists.txt
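A note on the mechanics: git bisect run classifies each commit by the test script's exit code, so this only bisects automatically if the command exits non-zero on bad commits. A hypothetical test script that does this by checking for the bogus [BLANK_AUDIO] marker (the grep step is an addition for illustration, not part of the original command; paths are placeholders):

#!/usr/bin/env bash
# Rebuild with CUDA, transcribe a known-good clip, and exit 1 if the transcript
# is the bogus [BLANK_AUDIO] result so git bisect run marks the commit as bad.
set -e
git clean -fdx
git submodule update --init --recursive
WHISPER_CUDA=1 make -j "$(nproc)"
OUT=$(./main -m models/ggml-small.en-q5_1.bin -l en -bs 2 audiodump.wav 2>/dev/null)
if echo "$OUT" | grep -q "BLANK_AUDIO"; then
    exit 1   # regression present -> bad commit
fi
exit 0       # transcription looks sane -> good commit

If an intermediate commit fails to build, exiting with code 125 instead tells git bisect to skip it rather than mark it bad.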
-
Default beam size has increased from 2 to 5, which has a major performance impact.
With -bs 2, i.e. the old beam size, I get a good boost with flash attention (example invocations are sketched after the timings below):
2024.05.15 whisper.cpp, without -fa
real 1m6.259s
user 1m33.432s
sys 0m1.561s
2024.06.08, with -fa -bs 2
real 0m50.597s
user 0m57.888s
sys 0m1.684s
Pretty great! THANKS!
Further testing...
2024.06.08, with -fa -bs 5
real 2m21.298s
user 2m50.949s
sys 0m4.076s
2024.06.08, -bs 2 without -fa
real 2m5.511s
user 2m41.488s
sys 0m4.112s
No way? Rerun...
real 2m4.830s
user 2m40.821s
sys 0m3.799s
Well, that's with -bs 2, so why is it so much slower than the 2024.05.15 build?
(Linux, CUDA 11, RTX 3090 temp-limited to 75°C, /models/ggml-small.en-tdrz.bin)
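For context, the flags above map to invocations roughly like these (illustrative only; the audio file and output options are placeholders, not the exact command lines):

# fast case: flash attention on, old beam size
$ time ./main -fa -bs 2 -m models/ggml-small.en-tdrz.bin -l en -osrt -of test.srt audiofile.wav
# slow cases: default beam size 5 with -fa, and -bs 2 without -fa
$ time ./main -fa -m models/ggml-small.en-tdrz.bin -l en -osrt -of test.srt audiofile.wav
$ time ./main -bs 2 -m models/ggml-small.en-tdrz.bin -l en -osrt -of test.srt audiofile.wav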
Looking at the output of the 2024.06.08 build, I get only one word, repeated:
1
[00:00:00.000 --> 00:00:02.060] you
2
[00:00:02.060 --> 00:00:04.120] you
3
[00:00:04.120 --> 00:00:06.180] you
4
[00:00:06.180 --> 00:00:08.240] you
5
[00:00:08.240 --> 00:00:10.320] you
6
[00:00:10.320 --> 00:00:12.380] you
7
Back to may15, I guess.