Releases: woct0rdho/triton-windows
v3.2.0-windows.post15
Define `Py_LIMITED_API` and exclude new Python C API that cannot be compiled by TinyCC, see #92
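For context on the value this macro takes (this is the standard CPython convention, not code from this repo): `Py_LIMITED_API` is defined to a version hex of the form `0xMMmm0000`, so targeting Python 3.9 means `0x03090000`. A minimal sketch of the encoding:

```python
def limited_api_hex(major: int, minor: int) -> int:
    """Encode a Python version as a Py_LIMITED_API value (0xMMmm0000),
    following CPython's PY_VERSION_HEX layout."""
    return (major << 24) | (minor << 16)

# Targeting Python 3.9, the minimal version supported by Triton:
print(hex(limited_api_hex(3, 9)))  # 0x3090000
```

Extensions built this way use only the stable ABI, so one binary works across Python versions at or above the targeted minimum.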
v3.2.0-windows.post14
Fix `getMMAVersionSafe` for RTX 50xx (sm120), see #83 (comment)
v3.2.0-windows.post13
TinyCC is bundled in the wheels, so you don't need to install MSVC to use Triton. Packages that directly call `triton.jit`, such as SageAttention, will just work.
You still need to install a C++ compiler if you use `torch.compile` targeting CPU. This may happen when you use nodes like 'CompileModel' in ComfyUI. Triton does not affect how PyTorch configures the C++ compiler in this case.
tcc
Testing out bundling TinyCC in the wheels, so users no longer need to install MSVC.
The TinyCC release is downloaded from https://download.savannah.gnu.org/releases/tinycc/tcc-0.9.27-win64-bin.zip

The def files used by Triton are generated by:

```
tcc -impdef C:\Windows\System32\nvcuda.dll -o lib\cuda.def
tcc -impdef path\to\python3.dll -o lib\python3.def
```

The version of `nvcuda.dll` is 32.0.15.7270 as of today. The `python3.dll` is from Python 3.9.13, because Python 3.9 is currently the minimal Python version supported by Triton.
The pip package `tinycc` was not used because these def files also need to be bundled.
v3.2.0-windows.post12
Let the environment variables `TRITON_LIBCUDA_PATH` and `CUDA_PATH` take higher precedence than the bundled CUDA
v3.2.0-windows.post11
- Since release post11, the wheels are published to PyPI, and no longer to GitHub. You can simply install the wheel using `pip install -U triton-windows`
- A minimal toolchain of CUDA is bundled in the wheels, so you don't need to manually install it. (You still need to manually install MSVC, Windows SDK, and vcredist)
- The wheels are linked against the LLVM from oaitriton.blob.core.windows.net, to better align with the official Triton
- The JIT-compiled C binaries (`cuda_utils.pyd`, `__triton_launcher.pyd`) are linked against the Python stable ABI, so there should be fewer errors like `DLL load failed while importing cuda_utils` when switching the Python version
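To illustrate why the stable ABI helps here (the filename pattern below is CPython's standard extension-tagging scheme, not something specific to Triton): version-tagged `.pyd` files embed the interpreter version in their suffix and only load on that version, while stable-ABI binaries use a bare suffix and keep working after a Python upgrade. A minimal sketch:

```python
def uses_stable_abi_suffix(filename: str) -> bool:
    """Return True if a Windows extension module's filename suggests the
    stable ABI (a bare .pyd with no version-specific tag like cp312)."""
    stem, _, ext = filename.rpartition(".")
    if ext != "pyd":
        return False
    # Version-tagged builds look like "cuda_utils.cp312-win_amd64.pyd".
    tag = stem.rpartition(".")[2]
    return not tag.startswith("cp")

# A version-tagged binary only loads on the matching interpreter:
print(uses_stable_abi_suffix("cuda_utils.cp312-win_amd64.pyd"))  # False
# A stable-ABI binary can be loaded by any supported Python version:
print(uses_stable_abi_suffix("cuda_utils.pyd"))  # True
```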
v3.2.0-windows.post10
For conda, support `pytorch-gpu` installed from the conda-forge channel and `cuda-toolkit` installed from the nvidia channel. Starting from PyTorch 2.6, PyTorch is no longer released in the pytorch channel
v3.2.0-windows.post9
Following the official Triton, I release wheels for Python 3.9 to 3.13.
v3.1.0-windows.post9
- Fix PTX ISA version for CUDA 12.8
- Fix int64 overflow in `make_launcher`
v3.1.0-windows.post8
Support CUDA from pip