Whisper Redesigned Solution #1229

Open
wants to merge 19 commits into
base: main

kunal-vaishnavi
Contributor

Description

This PR re-designs how Whisper is created and supported in ONNX Runtime GenAI. The new solution is designed to be used in conjunction with this work in ONNX Runtime.

Some of the added changes include:

  • Re-designed GenAI config that separates the encoder model and decoder model
    • Removes the encoder-decoder-init section
    • Creates a new encoder section
    • Separates session options, EP options, and model properties to be per-model instead of re-using the decoder's options for all components
    • Re-assigns pre-computed cross-attention KV caches as outputs to encoder model instead of inputs to decoder model
  • Re-designed runtime support that makes the states and steps much clearer
    • Creates AudioEncoder, WhisperDecoder (i.e. TextDecoder), and WhisperState as separate states
    • Creates AudioFeatures class that can be re-used for other speech models
    • Adds generic support for FP32 CPU, FP32 CUDA, FP16 CUDA, and any quantized versions
    • Removes temporary workarounds for past-present buffer sharing due to restrictions from both the exported ONNX model and ONNX Runtime
    • Handles models with and without the following: buffer sharing, DecoderMaskedMultiHeadAttention, and alignment heads
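
The split described above can be illustrated with a minimal Python sketch. It is not the actual GenAI config schema or runtime: every key name (`encoder`, `decoder`, `session_options`, `provider_options`, `num_layers`), file name, and class below is a hypothetical stand-in chosen for illustration. It only shows the shape of the idea: each model carries its own options, the encoder runs once and emits the pre-computed cross-attention KV caches as outputs, and the decoder reuses them at every step while only its self-attention cache grows.

```python
from dataclasses import dataclass

# Hypothetical config layout: key names are illustrative, NOT the real schema.
# The point is that encoder and decoder each own their options.
CONFIG = {
    "model": {
        "encoder": {
            "filename": "encoder.onnx",
            "session_options": {"provider_options": [{"cuda": {}}]},
            "num_layers": 2,
        },
        "decoder": {
            "filename": "decoder.onnx",
            "session_options": {"provider_options": []},
            "num_layers": 2,
        },
    }
}


@dataclass
class AudioEncoderSketch:
    """Stand-in for an encoder state: runs once per audio input."""
    num_layers: int

    def run(self, audio_features):
        # A real encoder would return tensors. Here each decoder layer gets a
        # placeholder (key, value, source_length) tuple as an encoder OUTPUT.
        return {
            f"cross_kv_{layer}": ("K", "V", len(audio_features))
            for layer in range(self.num_layers)
        }


@dataclass
class TextDecoderSketch:
    """Stand-in for a decoder state: runs once per generated token."""
    cross_kv: dict
    self_kv_len: int = 0

    def step(self, token):
        # The self-attention cache grows by one position per step;
        # the cross-attention caches are reused unchanged.
        self.self_kv_len += 1
        return token + 1  # dummy "next token" so the loop is observable


def generate(config, audio_features, start_token, max_steps):
    """Encode once, then decode step by step against the cached cross KV."""
    encoder = AudioEncoderSketch(config["model"]["encoder"]["num_layers"])
    cross_kv = encoder.run(audio_features)
    decoder = TextDecoderSketch(cross_kv)
    tokens = [start_token]
    for _ in range(max_steps):
        tokens.append(decoder.step(tokens[-1]))
    return tokens, decoder


tokens, decoder = generate(CONFIG, [0.0] * 4, start_token=0, max_steps=3)
print(tokens, decoder.self_kv_len, sorted(decoder.cross_kv))
```

In this toy version the decoder never recomputes or receives the cross-attention caches as fresh inputs; it only holds a reference to what the encoder produced, which mirrors the re-assignment described in the config bullets.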

Known Issues

  • This branch still needs to be synced with the latest changes in the main branch of ONNX Runtime GenAI and with the active dev branches that will materially change this PR.
  • The cross QK kernels do not have parity with the alternative, more accurate approach of computing the cross QKs in a separate inference pass. For now, the alternative approach is recommended for calculating word-level timestamps.
  • The cross QK kernels are only supported for CUDA.
  • The end-to-end working example is still under development here. Once working, a copy of those scripts will be added as a sub-folder in the Python examples.

Motivation and Context

The original implementation of Whisper was added in ONNX Runtime GenAI to create an initial foundation. This new approach is more flexible and more customizable for users. It also introduces an encoder-decoder architecture setup that can be used for other encoder-decoder models or other speech models.

commit acba52c
Author: Ryan Hill <[email protected]>
Date:   Mon Feb 3 15:24:33 2025 -0800

    Update src/models/model.h

    Co-authored-by: aciddelgado <[email protected]>

commit 0765339
Merge: 4f2f084 6da4195
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 31 16:17:42 2025 -0800

    Merge remote-tracking branch 'origin/main' into ryanunderhill/providers

commit 4f2f084
Author: Ryan Hill <[email protected]>
Date:   Thu Jan 30 16:02:48 2025 -0800

    Refactor device_type

commit e6b77f2
Author: Ryan Hill <[email protected]>
Date:   Thu Jan 30 01:24:37 2025 -0800

    Device check simplifications

commit 198e8f8
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 29 22:40:37 2025 -0800

    Remove accidental change

commit f8ed9ce
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 29 18:16:08 2025 -0800

    Previous change also added device interfaces for webgpu & qnn
    Lint

commit e804697
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 29 18:14:28 2025 -0800

    Clean up allocators, now everything is through p_device_* interfaces.

commit 53c666c
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 29 12:51:08 2025 -0800

    Edward gave me ideas.

commit 45dad2b
Author: Ryan Hill <[email protected]>
Date:   Tue Jan 28 11:51:15 2025 -0800

    Review feedback

commit c11704f
Author: Ryan Hill <[email protected]>
Date:   Sun Jan 26 18:13:35 2025 -0800

    Type tweak

commit a011fe0
Author: Ryan Hill <[email protected]>
Date:   Sun Jan 26 18:06:18 2025 -0800

    Leftover #ifdef fix

commit 6736517
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 20:59:24 2025 -0800

    Android tweak

commit 2df5fe1
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 20:44:57 2025 -0800

    Fix iOS break

commit d87807c
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 20:40:43 2025 -0800

    Don't load cuda library outside of linux & windows

commit 0303592
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 18:17:04 2025 -0800

    Undefined behavior fix in startup

commit 67d914c
Merge: fd788d7 0636ce3
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 13:57:58 2025 -0800

    Merge with main

commit fd788d7
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 13:48:33 2025 -0800

    Extra debug logging

commit 2bc83eb
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 13:46:49 2025 -0800

    Crash investigation

commit 1734f5c
Author: Ryan Hill <[email protected]>
Date:   Fri Jan 24 03:36:49 2025 -0800

    Test instrumenting

commit afecf1d
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 22 16:19:11 2025 -0800

    Test theory

commit 0e7064c
Merge: b079b74 8bfd286
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 22 16:14:33 2025 -0800

    Merge with main

commit b079b74
Author: Ryan Hill <[email protected]>
Date:   Tue Jan 21 20:51:21 2025 -0800

    Try again to fix C# test

commit 133d5a0
Author: Ryan Hill <[email protected]>
Date:   Tue Jan 21 16:56:59 2025 -0800

    Fix C# unit tests

commit d3db2f6
Author: Ryan Hill <[email protected]>
Date:   Tue Jan 21 14:06:51 2025 -0800

    Fix input_ids issue from merge

commit 49b51ef
Author: Ryan Hill <[email protected]>
Date:   Thu Jan 16 21:33:46 2025 -0800

    Build fix

commit 5244049
Author: Ryan Hill <[email protected]>
Date:   Thu Jan 16 21:21:24 2025 -0800

    Build fix

commit 0f2ea36
Merge: 0bc39a5 ee318f1
Author: Ryan Hill <[email protected]>
Date:   Thu Jan 16 20:20:54 2025 -0800

    Merge with main

commit 0bc39a5
Author: Ryan Hill <[email protected]>
Date:   Thu Jan 16 20:15:19 2025 -0800

    Build fixes

commit 66321dd
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 15 23:08:47 2025 -0800

    Formatting

commit bdbb09c
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 15 16:04:50 2025 -0800

    Fix merge build issues

commit 237fb1e
Merge: 41b462a 014c5f6
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 15 15:43:42 2025 -0800

    Merge with main

commit 41b462a
Author: Ryan Hill <[email protected]>
Date:   Wed Jan 15 15:34:15 2025 -0800

    Finish refactoring model processing
    Remove as many #if USE_CUDA/USE_DML as possible

commit 3823664
Author: Ryan Hill <[email protected]>
Date:   Sun Dec 15 23:52:16 2024 -0800

    Summary: Remove #ifdefs for providers and go through device interface.
    Details:

    Add a DML DeviceInterface and DML DeviceBuffer handler.
    Remove #if blocks that are doing memory copies between device/cpu memory and use the DeviceSpan interface.

commit 35e79ce
Merge: 34381af c5745fd
Author: Ryan Hill <[email protected]>
Date:   Mon Nov 25 17:00:03 2024 -0800

    Merge remote-tracking branch 'origin/main' into ryanunderhill/providers

commit 34381af
Merge: 7e4668b 4819a8c
Author: Ryan Hill <[email protected]>
Date:   Fri Nov 22 17:24:43 2024 -0800

    Merge remote-tracking branch 'origin/main' into ryanunderhill/providers

commit 7e4668b
Author: Ryan Hill <[email protected]>
Date:   Fri Nov 22 17:24:35 2024 -0800

    Use DeviceInterface for debugging