TTGIR Test Bed for Triton Kernels

The following is a test bed for experimenting with modified TTGIR Triton kernels.

To test the ragged HSTU kernel, use the run.sh script in the ragged_hstu_test_bed directory:

cd ragged_hstu_test_bed
HSTU_BENCH_EXPERIMENT=4 bash run.sh

By default, run.sh runs the kernel 3 times and reports the average. The output looks like this:

local_gather for TW AND PW: 6% reduction

Running 3 times and printing the average overhead reduction (higher % is better)

Overriding kernel with file _ragged_hstu_attn_fwd.ttgir

P50 latency is 1.36168 ms
P20 latency is 1.35772 ms
P80 latency is 1.36673 ms

P50 latency is 1.36346 ms
P20 latency is 1.35869 ms
P80 latency is 1.36879 ms

P50 latency is 1.36536 ms
P20 latency is 1.35964 ms
P80 latency is 1.37253 ms

TOTAL RUNNING TIME FOR ALL 3x3 RUNS: 12.27460
Overhead Reduction for HSTU: 5.77649225346220305200%
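If you save the output to a file, a small awk filter can recompute the per-percentile averages across runs. This is a minimal sketch that is not part of the repository. The function name avg_latencies is hypothetical, and the sketch assumes every latency line has the exact form "P50 latency is 1.36168 ms" shown above. It does not reproduce the "Overhead Reduction" figure, whose formula is defined in run.sh.

```shell
#!/usr/bin/env bash
# Sketch: average the P20/P50/P80 latencies printed by run.sh.
# Assumption: latency lines look like "P50 latency is 1.36168 ms".
# avg_latencies is a hypothetical helper; reads files given as args, else stdin.
avg_latencies() {
  awk '
    $2 == "latency" && $3 == "is" {
      sum[$1] += $4   # accumulate latency (ms) per percentile label
      n[$1]++         # count how many runs reported this percentile
    }
    END {
      # Print in a fixed order so the output is stable.
      split("P20 P50 P80", order, " ")
      for (i = 1; i <= 3; i++) {
        p = order[i]
        if (n[p] > 0)
          printf "%s mean: %.5f ms over %d runs\n", p, sum[p] / n[p], n[p]
      }
    }
  ' "$@"
}
```

For example, run `HSTU_BENCH_EXPERIMENT=4 bash run.sh | tee hstu.log` and then `avg_latencies hstu.log`. For the sample output above, the P50 mean works out to 1.36350 ms.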

For AMD targets, set USE_ROCM=1. To run the benchmark only once, set RUN_ONCE=1.
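To compare several of the experiment numbers listed below, a small loop can call run.sh once per experiment with RUN_ONCE=1. This is a sketch under stated assumptions. It is meant to be run from ragged_hstu_test_bed/, and the names sweep_experiments and DRY_RUN are hypothetical and not recognized by run.sh. On AMD, export USE_ROCM=1 first so that run.sh inherits it.

```shell
#!/usr/bin/env bash
# Sketch: run several HSTU_BENCH_EXPERIMENT values back to back.
# Assumptions: called from ragged_hstu_test_bed/ (where run.sh lives);
# sweep_experiments and DRY_RUN are hypothetical names, not part of run.sh.
# On AMD, export USE_ROCM=1 beforehand so run.sh inherits it.
sweep_experiments() {
  local exp
  for exp in "$@"; do
    echo "=== HSTU_BENCH_EXPERIMENT=${exp} ==="
    if [ -n "${DRY_RUN:-}" ]; then
      # Dry run: print the command instead of executing it.
      echo "RUN_ONCE=1 HSTU_BENCH_EXPERIMENT=${exp} bash run.sh"
    else
      # RUN_ONCE=1 keeps the sweep short: one benchmark pass per experiment.
      RUN_ONCE=1 HSTU_BENCH_EXPERIMENT="$exp" bash run.sh
    fi
  done
}
```

For example, `sweep_experiments 5 4 7 8` runs the unmodified kernel followed by three local_gather variants. Single runs are noisier than the default 3-run average, so consider rerunning promising variants without RUN_ONCE.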

Each HSTU_BENCH_EXPERIMENT value is defined in the run.sh script. Here is a rundown:

Tile size 64x64, num_stages=2, num_warps=4:

HSTU_BENCH_EXPERIMENT=1: drop the masks on the TW and PW tl.loads
HSTU_BENCH_EXPERIMENT=2: TW with local_gather
HSTU_BENCH_EXPERIMENT=3: TW with local_gather, drop PW mask
HSTU_BENCH_EXPERIMENT=4: local_gather for TW AND PW
HSTU_BENCH_EXPERIMENT=5: Original HSTU TTGIR Unmodified
HSTU_BENCH_EXPERIMENT=6: local_gather for PW
HSTU_BENCH_EXPERIMENT=7: local_gather for TW AND PW, PW no mask
HSTU_BENCH_EXPERIMENT=8: local_gather for TW AND PW, no mask for either
HSTU_BENCH_EXPERIMENT=9: local_gather for TW AND PW, TW no mask


Tile size 128x128:

HSTU_BENCH_EXPERIMENT=4128: local_gather for TW AND PW
HSTU_BENCH_EXPERIMENT=5128: Original HSTU Kernel

Tile size 32x32, num_stages=2, num_warps=2:

HSTU_BENCH_EXPERIMENT=432: local_gather for TW AND PW
HSTU_BENCH_EXPERIMENT=532: Original HSTU Kernel
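The larger IDs appear to follow a pattern: the 64x64 variant digit followed by the tile size (4128 and 5128 for 128x128, 432 and 532 for 32x32). The helper below decodes an ID under that reading. It is a hypothetical sketch based only on the list above. run.sh may not parse IDs this way, and only variants 4 and 5 are accepted for the non-64x64 tiles because those are the only ones listed.

```shell
#!/usr/bin/env bash
# Sketch: decode an HSTU_BENCH_EXPERIMENT id into variant and tile size.
# Assumption: ids read as "<64x64 variant digit><tile size>", inferred from
# the list above (4128, 5128, 432, 532). describe_experiment is hypothetical.
describe_experiment() {
  local id="$1"
  case "$id" in
    [1-9])
      echo "variant ${id}, tile 64x64, num_stages=2 num_warps=4" ;;
    [45]128)
      # Only variants 4 and 5 are listed for 128x128.
      echo "variant ${id%128}, tile 128x128" ;;
    [45]32)
      # Only variants 4 and 5 are listed for 32x32.
      echo "variant ${id%32}, tile 32x32, num_stages=2 num_warps=2" ;;
    *)
      echo "unknown experiment id: ${id}" >&2
      return 1 ;;
  esac
}
```

For example, `describe_experiment 432` prints "variant 4, tile 32x32, num_stages=2 num_warps=2", which corresponds to the local_gather for TW AND PW variant at the smaller tile.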

plotfi/ttgir-override-testbed

Folders and files

Latest commit

History

Repository files navigation

TTGIR Test Bed for Triton Kernels

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages