Replies: 3 comments
-
$ time ../whisper -l en -fa -bs 2 -m ../models/ggml-small.en-q5_1.bin -osrt -of test.srt audiofile.wav

$ ffprobe audiofile.wav

$ time ../whisper -l en -fa -bs 2 -m ../models/ggml-large-v2.bin -osrt -of test.srt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:10.000] [BLANK_AUDIO]

$ time ../whisper -l en -m ../models/ggml-large-v2.bin -osrt -of test.srt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:10.000] [BLANK_AUDIO]

$ time ../whisper -m ../models/ggml-tiny.en-q5_1.bin -of test.txt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:09.080] [BLANK_AUDIO]

$ time ../whisper.may15 -m ../models/ggml-tiny.en-q5_1.bin -of test.txt audiofile.wav 2>/dev/null
[00:00:00.000 --> 00:00:02.580] (upbeat music)
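For a quick side-by-side check of the two builds, one option is to write plain-text output from each and diff the results. This is only an illustration, not part of the original test; the old/new file names are placeholders:

$ ../whisper -otxt -of new -m ../models/ggml-tiny.en-q5_1.bin audiofile.wav 2>/dev/null
$ ../whisper.may15 -otxt -of old -m ../models/ggml-tiny.en-q5_1.bin audiofile.wav 2>/dev/null
# -otxt writes a plain-text transcript to the -of prefix, so this produces new.txt and old.txt
$ diff old.txt new.txt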
-
With -ng I get working, correct transcription. Maybe something is broken with my CUDA setup? Edit: Nope! 2024.05.15 compiles and runs fine.
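For reference, a sketch of what the working CPU-only run looks like: essentially the earlier command with -ng (--no-gpu) added, with paths as placeholders:

# -ng disables GPU offload, so inference runs on the CPU and avoids the CUDA path
$ time ../whisper -ng -l en -m ../models/ggml-small.en-q5_1.bin -osrt -of test.srt audiofile.wav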
-
git bisect run bash -c 'git clean -fdx ; git submodule update --init --recursive ; WHISPER_CUDA=1 make -j $(nproc) ; time ./main -m /pr/Neural/Voice_Recognition_Whispr_GGML/temp/good-whisper.cpp/models/ggml-small.en-q5_1.bin -l en -bs 2 -of test.txt /pr/Neural/Voice_Recognition_Whispr_GGML/audiodump.wav'

1b51fdf is the first bad commit
CMakeLists.txt
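A note on the mechanics: git bisect run classifies each commit by the test script's exit code, so this only bisects automatically if the command exits non-zero on bad commits. A hypothetical test script that does this by checking for the bogus [BLANK_AUDIO] marker (the grep step is an addition for illustration, not part of the original command; paths are placeholders):

#!/usr/bin/env bash
# Rebuild with CUDA, transcribe a known-good clip, and exit 1 if the transcript
# is the bogus [BLANK_AUDIO] result so git bisect run marks the commit as bad.
set -e
git clean -fdx
git submodule update --init --recursive
WHISPER_CUDA=1 make -j "$(nproc)"
OUT=$(./main -m models/ggml-small.en-q5_1.bin -l en -bs 2 audiodump.wav 2>/dev/null)
if echo "$OUT" | grep -q "BLANK_AUDIO"; then
    exit 1   # regression present -> bad commit
fi
exit 0       # transcription looks sane -> good commit

If an intermediate commit fails to build, exiting with code 125 instead tells git bisect to skip it rather than mark it bad.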
-
Default beam size has increased from 2 to 5, which has a major performance impact.
With -bs 2, i.e. the old beam size, I get a good boost with flash attention (example invocations are sketched after the timings below):
2024.05.15 whisper.cpp, without -fa
real 1m6.259s
user 1m33.432s
sys 0m1.561s
2024.06.08, with -fa -bs 2
real 0m50.597s
user 0m57.888s
sys 0m1.684s
Pretty great! THANKS!
Further testing...
2024.06.08, with -fa -bs 5
real 2m21.298s
user 2m50.949s
sys 0m4.076s
2024.06.08, -bs 2 without -fa
real 2m5.511s
user 2m41.488s
sys 0m4.112s
No way? Rerun...
real 2m4.830s
user 2m40.821s
sys 0m3.799s
Well, that's with -bs 2, so why is it so much slower than the 2024.05.15 build?
(Linux, CUDA 11, RTX 3090 temp-limited to 75°C, /models/ggml-small.en-tdrz.bin)
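For context, the flags above map to invocations roughly like these (illustrative only; the audio file and output options are placeholders, not the exact command lines):

# fast case: flash attention on, old beam size
$ time ./main -fa -bs 2 -m models/ggml-small.en-tdrz.bin -l en -osrt -of test.srt audiofile.wav
# slow cases: default beam size 5 with -fa, and -bs 2 without -fa
$ time ./main -fa -m models/ggml-small.en-tdrz.bin -l en -osrt -of test.srt audiofile.wav
$ time ./main -bs 2 -m models/ggml-small.en-tdrz.bin -l en -osrt -of test.srt audiofile.wav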
Looking at the output of the 2024.06.08 build, I get only one word, repeated:
1
[00:00:00.000 --> 00:00:02.060] you
2
[00:00:02.060 --> 00:00:04.120] you
3
[00:00:04.120 --> 00:00:06.180] you
4
[00:00:06.180 --> 00:00:08.240] you
5
[00:00:08.240 --> 00:00:10.320] you
6
[00:00:10.320 --> 00:00:12.380] you
7
Back to may15, I guess.