Why isn't SDPA supported in cuda within candle? #2725

Murad-Awad · 2025-01-18T01:42:59Z

Hi all, forgive my ignorance, but my understanding is that SDPA is CUDA-compatible, yet I can't find an implementation of it in candle_nn. I want to run some transformer models (mainly Whisper) on T4 GPUs, which don't support flash-attention 2, and I'd like to extract more performance. Ideally, I'd like something similar to the Python transformers library, where SDPA is used by default when available.
