Releases: woct0rdho/triton-windows

v3.2.0-windows.post15

16 Mar 15:35

Define Py_LIMITED_API and exclude new Python C API that cannot be compiled by TinyCC, see #92

v3.3.0-windows.post14

15 Mar 12:38
Pre-release

Fix getMMAVersionSafe for RTX 50xx (sm120), see #83 (comment)

v3.2.0-windows.post13

12 Mar 13:07

TinyCC is bundled in the wheels, so you don't need to install MSVC to use Triton. Packages that directly call triton.jit, such as SageAttention, will just work.

You still need to install a C++ compiler if you use torch.compile targeting CPU. This may happen when you use nodes like 'CompileModel' in ComfyUI. Triton does not affect how PyTorch configures the C++ compiler in this case.
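As a rough way to see whether a C++ compiler is visible before using torch.compile on CPU, the sketch below searches PATH. The function name and the candidate list are illustrative assumptions; they are not how PyTorch itself locates its compiler.

```python
import shutil

def find_cxx_compiler(candidates=("cl", "clang-cl", "g++")):
    """Return the path of the first candidate found on PATH, or None.

    The candidate names are an assumption made for illustration only.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None

if __name__ == "__main__":
    compiler = find_cxx_compiler()
    if compiler is None:
        print("No C++ compiler found on PATH")
    else:
        print("Found C++ compiler:", compiler)
```

On Windows, cl is usually only on PATH inside a Visual Studio developer prompt, so a result of None in a plain shell does not prove that MSVC is missing.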

tcc

12 Mar 10:06
df50695
Pre-release

This release tests bundling TinyCC in the wheels, so that users no longer need to install MSVC.

The TinyCC release is downloaded from https://download.savannah.gnu.org/releases/tinycc/tcc-0.9.27-win64-bin.zip

The def files used by Triton are generated by:

tcc -impdef C:\Windows\System32\nvcuda.dll -o lib\cuda.def
tcc -impdef path\to\python3.dll -o lib\python3.def

The version of nvcuda.dll is 32.0.15.7270 as of today. The python3.dll is from Python 3.9.13, because Python 3.9 is currently the minimum Python version supported by Triton.

The pip package tinycc was not used because these def files also need to be bundled.
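The two tcc -impdef commands above could be scripted as in the following sketch. The helper name impdef_command is hypothetical, and the python3.dll path is left as the same placeholder used above; only the -impdef and -o options are taken from these notes.

```python
import subprocess

def impdef_command(dll_path, def_path, tcc="tcc"):
    """Build the argument list for `tcc -impdef <dll> -o <def>`."""
    return [tcc, "-impdef", dll_path, "-o", def_path]

def generate_def(dll_path, def_path, tcc="tcc"):
    """Run tcc to write an import definition file for the given DLL."""
    subprocess.run(impdef_command(dll_path, def_path, tcc), check=True)

if __name__ == "__main__":
    # Print the commands instead of running them, since tcc may not be on PATH.
    jobs = [
        (r"C:\Windows\System32\nvcuda.dll", r"lib\cuda.def"),
        (r"path\to\python3.dll", r"lib\python3.def"),  # placeholder path
    ]
    for dll, out in jobs:
        print(" ".join(impdef_command(dll, out)))
```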

v3.2.0-windows.post12

10 Mar 15:34

Let the environment variables TRITON_LIBCUDA_PATH and CUDA_PATH take precedence over the bundled CUDA.
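The precedence rule can be pictured with the sketch below. It is not Triton's actual lookup code: the function name is hypothetical, and checking TRITON_LIBCUDA_PATH before CUDA_PATH is an assumption, since the note only states that both outrank the bundled CUDA.

```python
import os

def choose_cuda_source(bundled_root, env=None):
    """Return (source_name, path) for the CUDA location that wins.

    Assumption: TRITON_LIBCUDA_PATH is checked before CUDA_PATH.
    The bundled CUDA is used only when neither variable is set.
    """
    env = os.environ if env is None else env
    for name in ("TRITON_LIBCUDA_PATH", "CUDA_PATH"):
        value = env.get(name)
        if value:
            return name, value
    return "bundled", bundled_root

if __name__ == "__main__":
    # "bundled-cuda-dir" is a placeholder, not the real location in the wheel.
    source, path = choose_cuda_source("bundled-cuda-dir")
    print(source, path)
```

Passing env explicitly makes the rule easy to try out without changing the real environment.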

v3.2.0-windows.post11

10 Mar 06:35
  • Starting with post11, the wheels are published to PyPI and no longer to GitHub. You can install the wheel using pip install -U triton-windows
  • A minimal toolchain of CUDA is bundled in the wheels, so you don't need to manually install it. (You still need to manually install MSVC, Windows SDK, and vcredist)
  • The wheels are linked against the LLVM from oaitriton.blob.core.windows.net, to better align with the official Triton
  • The JIT-compiled C binaries (cuda_utils.pyd, __triton_launcher.pyd) are linked against the Python stable ABI, so there should be fewer errors such as DLL load failed while importing cuda_utils when switching the Python version
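After running pip install -U triton-windows, a quick way to confirm which version ended up in the current environment is to query the package metadata. This is a minimal sketch using only the standard library; the helper name installed_version is illustrative.

```python
from importlib import metadata

def installed_version(dist_name="triton-windows"):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("triton-windows is not installed in this environment")
    else:
        print("triton-windows", version)
```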

v3.2.0-windows.post10

19 Feb 04:18

For conda, support pytorch-gpu installed from the conda-forge channel and cuda-toolkit installed from the nvidia channel. Starting from PyTorch 2.6, PyTorch is no longer released in the pytorch channel.

v3.2.0-windows.post9

01 Feb 03:44

Following the official Triton, I release wheels for Python 3.9 to 3.13.

v3.1.0-windows.post9

28 Jan 02:46
  • Fix PTX ISA version for CUDA 12.8
  • Fix int64 overflow in make_launcher

v3.1.0-windows.post8

21 Jan 09:09

Support CUDA from pip