flash-attention pre-build wheels


This repository provides pre-built wheels for flash-attention.

Since building flash-attention takes a very long time and is resource-intensive, I also build and provide wheels for CUDA and PyTorch combinations that are not officially distributed.

This repository uses a self-hosted runner and AWS CodeBuild for building the wheels. If you find this project helpful, please consider sponsoring to help maintain the infrastructure!

Special thanks to @KiralyCraft for providing the computing resources used to build wheels. Thank you!!

Install

  1. Select the versions for Python, CUDA, PyTorch, and flash_attn.
flash_attn-[flash_attn Version]+cu[CUDA Version]torch[PyTorch Version]-cp[Python Version]-cp[Python Version]-linux_x86_64.whl

# Example: Python 3.12, CUDA 12.4, PyTorch 2.5, and flash_attn 2.6.3
flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
  2. Find the corresponding wheel on the Useful Search Page, the Packages page, or the Releases page.

  3. Install the wheel directly from the release URL, or download it and install it locally:

# Direct Install
pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.0/flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl

# Download and Local Install
wget https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.0/flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
pip install ./flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
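
After installing, a quick import check can confirm that the wheel matches your environment (a minimal sanity check; it assumes a CUDA-enabled PyTorch is already installed):

# Verify the installed PyTorch, CUDA, and flash_attn versions
python -c "import torch, flash_attn; print(torch.__version__, torch.version.cuda, flash_attn.__version__)"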

Packages

Coverage

Platform Existing Missing Coverage
Linux x86_64 309 17 94.8%
Linux ARM64 55 41 57.3%
Windows 86 10 89.6%
Total 450 68 86.9%

Note

Since v0.8.0, Flash Attention 3 (flash_attn_3) wheels are also available. Flash Attention 3 requires Hopper (SM90) or newer GPUs and CUDA 12.3+.
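
If you are unsure whether your GPU meets the Hopper (SM90) requirement, one way to check its compute capability is via PyTorch (assuming a CUDA-enabled PyTorch is installed):

# Prints the compute capability of the current GPU, e.g. (9, 0) for Hopper
python -c "import torch; print(torch.cuda.get_device_capability())"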

Note

Since v0.7.0, wheels are built on the manylinux_2_28 platform. These Linux x86_64 (manylinux) wheels run on distributions with glibc 2.28 or newer; they are not compatible with older glibc versions (e.g., 2.17).
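
You can check your system's glibc version with ldd, which ships with glibc:

# Prints the glibc version; 2.28 or newer is required for manylinux_2_28 wheels
ldd --version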

Note

Since v0.5.0, wheels are built with a local version label that encodes the CUDA and PyTorch versions. For example, pip list shows flash_attn==2.8.3 for older wheels and flash_attn==2.8.3+cu130torch2.9 for wheels built since v0.5.0.
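
For example, you can see which build is installed by inspecting the package metadata with pip (the version string in the note above is only an illustration):

# The Version field includes the CUDA/PyTorch label for wheels built since v0.5.0
pip show flash_attn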

See ./doc/packages.md for the full list of available packages.

History

The history of this repository is available here.

Citation

If you use this repository in your research and find it helpful, please cite it!

@misc{flash-attention-prebuild-wheels,
 author = {Morioka, Junya},
 year = {2025},
 title = {mjun0812/flash-attention-prebuild-wheels},
 url = {https://github.com/mjun0812/flash-attention-prebuild-wheels},
 howpublished = {https://github.com/mjun0812/flash-attention-prebuild-wheels},
}

Acknowledgments

  • @okaris : Sponsored me!
  • @xhiroga : Sponsored me!
  • cjustus613 : Buy me a coffee!
  • @KiralyCraft : Provided computing resources!
  • @kun432 : Buy me a coffee 3 times!
  • @wodeyuzhou : Sponsored me!
  • Gabr1e1 : Buy me a coffee!
  • wp : Buy me a coffee!
  • wangxiyu191 : Buy me a coffee!
  • @sr99622 : Sponsored me!

Star History and Download Statistics

Star History Chart

Original Repository

Dao-AILab/flash-attention (https://github.com/Dao-AILab/flash-attention)

@inproceedings{dao2022flashattention,
  title={Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
  author={Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}
@inproceedings{dao2023flashattention2,
  title={Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
  author={Dao, Tri},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}

Self build

If you cannot find the version you are looking for, you can fork this repository and create a wheel on GitHub Actions.

  1. Fork this repository
  2. (Optional) Set up your self-hosted runner.
  3. Edit the Python script create_matrix.py to set the versions you want to build. You can use either GitHub-hosted or self-hosted runners.
  4. Add a tag v*.*.* to trigger the build workflow (see the example commands below): git tag v*.*.* && git push --tags

Note that some combinations of versions may fail to build.
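
As a concrete sketch of steps 1 and 4 (the account name and tag below are placeholders, not values you must use):

# Fork the repository on GitHub first, then clone your fork
git clone https://github.com/<your-account>/flash-attention-prebuild-wheels.git
cd flash-attention-prebuild-wheels
# Edit create_matrix.py to select the versions to build, then tag and push
git tag v0.0.1
git push --tags  # pushing the tag triggers the build workflow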

Self-Hosted Runner Build

For some version combinations, wheels cannot be built on GitHub-hosted runners due to job time limits. To build wheels for these versions, you can use self-hosted runners.

See self-hosted-runner/README.md for detailed setup instructions.

Build Environments

This repository builds wheels across multiple platforms and environments:

Platform Runner Type Container Image
Linux x86_64 GitHub-hosted (ubuntu-22.04) -
Linux x86_64 Self-hosted ubuntu:24.04 or manylinux_2_28_x86_64
Linux ARM64 GitHub-hosted (ubuntu-22.04-arm) -
Windows x86_64 GitHub-hosted (windows-2022) -
Windows x86_64 Self-hosted (windows11) -
Windows x86_64 AWS CodeBuild -

About

Provides pre-built flash-attention 2 and 3 package wheels for Linux and Windows, built with GitHub Actions.
