ROCm and sliding windows fixes #2033

fxmarty · 2024-06-06T12:37:54Z

Fix models requiring window attention - no need to raise an error at load time in case max_input_tokens < sliding_window.

Also updates the vllm fork commit to pick up the fixes from ROCm/vllm#28, and fixes a few ROCm issues.

Narsil · 2024-06-06T14:02:51Z

> Fix models requiring window attention - no need to raise an error at load time. Only if context > window length at runtime.

context = --max-total-tokens, right? (We need to crash at load time; crashing randomly at runtime is a terrible UX.)

server/text_generation_server/models/__init__.py

fxmarty · 2024-06-07T09:23:00Z

@Narsil I assume paged attention always works with sliding window
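For readers following the thread, the load-time check being discussed could look roughly like the sketch below. It is a minimal illustration, not the code merged in this PR: the function name `check_sliding_window`, the `supports_windowing` flag, and the convention that `None` or `-1` means "no sliding window" are all assumptions made for the example. The idea is to fail at startup only when the configured input length can actually exceed a window that the attention backend cannot handle.

```python
from typing import Optional


def check_sliding_window(
    sliding_window: Optional[int],
    max_input_tokens: int,
    supports_windowing: bool,
) -> None:
    """Fail at load time if inputs could exceed an unsupported sliding window.

    All names here are hypothetical; this only illustrates the policy
    discussed in the review, not the project's actual implementation.
    """
    # Assumed convention: None or -1 means the model has no sliding window.
    has_window = sliding_window is not None and sliding_window != -1

    # Only raise when the backend cannot do window attention AND the
    # configured input length can actually go past the window.
    if has_window and not supports_windowing and max_input_tokens > sliding_window:
        raise ValueError(
            f"sliding_window={sliding_window} is not supported by this backend, "
            f"but max_input_tokens={max_input_tokens} exceeds it. "
            "Lower max_input_tokens or use a backend with window attention."
        )
```

With this shape, a model whose window is larger than the configured input length still loads on a backend without window attention, while a misconfiguration is reported at startup rather than in the middle of a request.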

server/text_generation_server/models/__init__.py

launcher/src/main.rs

Narsil

LGTM

* update vllm commit & fix models using sliding window
* update
* update commit
* fix bug where tunableop is bound to cuda graph even when cuda graph are disabled
* enable tunableop by default
* fix sliding window
* address review
* dead code
* precise comment
* is it flaky?

fxmarty added 3 commits June 6, 2024 07:51

update vllm commit & fix models using sliding window

0d3cc03

update

e9b9a96

update commit

35d1946

fxmarty requested a review from Narsil June 6, 2024 12:40

fxmarty added 2 commits June 6, 2024 13:53

fix bug where tunableop is bound to cuda graph even when cuda graph are disabled

c36c7ec

enable tunableop by default

0d9b2f2

Narsil reviewed Jun 6, 2024

server/text_generation_server/models/__init__.py (resolved)

fix sliding window

4220423

fxmarty changed the title ~~ROCm fixes~~ ROCm and sliding windows fixes Jun 7, 2024

fxmarty requested a review from Narsil June 7, 2024 09:20

Narsil reviewed Jun 7, 2024

server/text_generation_server/models/__init__.py (outdated, resolved)

Narsil reviewed Jun 7, 2024

launcher/src/main.rs (outdated, resolved)

fxmarty added 2 commits June 7, 2024 12:36

address review

b884d2b

dead code

fb5487d

fxmarty requested a review from Narsil June 7, 2024 12:40

fxmarty added 2 commits June 7, 2024 12:43

precise comment

b8ac9ba

is it flaky?

979b670

Narsil approved these changes Jun 7, 2024

fxmarty merged commit 9b3674d into main Jun 10, 2024
5 checks passed

fxmarty deleted the rocm-fixes branch June 10, 2024 07:09

Narsil mentioned this pull request Jun 24, 2024

Fixing AMD CI #2109

Closed

5 tasks
