stale-issue-message: 'Issue has not received an update in over 14 days. Adding stale label. Please note the issue will be closed in 14 days after being marked stale if there is no update.'
19
+
stale-pr-message: 'PR has not received an update in over 14 days. Adding stale label. Please note the PR will be closed in 14 days after being marked stale if there is no update.'
20
+
close-issue-message: 'This issue was closed because it has been 14 days without activity since it has been marked as stale.'
21
+
close-pr-message: 'This PR was closed because it has been 14 days without activity since it has been marked as stale.'
22
+
days-before-issue-stale: 14
23
+
days-before-close: 14
24
+
only-labels: 'waiting for feedback'
25
+
labels-to-add-when-unstale: 'investigating'
26
+
labels-to-remove-when-unstale: 'stale,waiting for feedback'
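The two day counts in this workflow imply a simple timeline for an issue labeled `waiting for feedback`. The sketch below only illustrates that arithmetic. The function name is hypothetical, and the real action runs on a schedule, so the actual label and close dates depend on when the workflow next executes.

```python
from datetime import date, timedelta

# Values copied from the workflow inputs in this diff.
DAYS_BEFORE_ISSUE_STALE = 14
DAYS_BEFORE_CLOSE = 14


def stale_timeline(last_update: date) -> tuple[date, date]:
    """Return (earliest stale date, earliest close date) for an issue.

    Hypothetical helper: it models only the day counts, not the label
    filtering (`only-labels`) or the schedule of the workflow itself.
    """
    stale_on = last_update + timedelta(days=DAYS_BEFORE_ISSUE_STALE)
    close_on = stale_on + timedelta(days=DAYS_BEFORE_CLOSE)
    return stale_on, close_on


if __name__ == "__main__":
    stale_on, close_on = stale_timeline(date(2025, 6, 10))
    print("stale label no earlier than:", stale_on)
    print("closed no earlier than:", close_on)
```

Any update to the issue in between resets the clock, at which point the workflow swaps `stale` and `waiting for feedback` for `investigating`.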
CHANGELOG.md (40 additions & 2 deletions)
@@ -1,17 +1,55 @@
1
1
# TensorRT OSS Release Changelog
2
2
3
-
## 10.11.0 GA - 2025-5-21
3
+
## 10.12.0 GA - 2025-6-10
4
+
- Plugin changes
5
+
- Migrated `IPluginV2`-descendent version 1 of `cropAndResizeDynamic`, to version 2, which implements `IPluginV3`.
6
+
- Note: The newer versions preserve the attributes and I/O of the corresponding older plugin version. The older plugin versions are deprecated and will be removed in a future release
7
+
- Deprecated the listed versions of the following plugins:
8
+
- `DecodeBbox3DPlugin` (version 1)
9
+
- `DetectionLayer_TRT` (version 1)
10
+
- `EfficientNMS_TRT` (version 1)
11
+
- `FlattenConcat_TRT` (version 1)
12
+
- `GenerateDetection_TRT` (version 1)
13
+
- `GridAnchor_TRT` (version 1)
14
+
- `GroupNormalizationPlugin` (version 1)
15
+
- `InstanceNormalization_TRT` (version 2)
16
+
- `ModulatedDeformConv2d` (version 1)
17
+
- `MultilevelCropAndResize_TRT` (version 1)
18
+
- `MultilevelProposeROI_TRT` (version 1)
19
+
- `RPROI_TRT` (version 1)
20
+
- `PillarScatterPlugin` (version 1)
21
+
- `PriorBox_TRT` (version 1)
22
+
- `ProposalLayer_TRT` (version 1)
23
+
- `ProposalDynamic` (version 1)
24
+
- `Region_TRT` (version 1)
25
+
- `Reorg_TRT` (version 2)
26
+
- `ResizeNearest_TRT` (version 1)
27
+
- `ScatterND` (version 1)
28
+
- `VoxelGeneratorPlugin` (version 1)
29
+
- Demo changes
30
+
- Added [Image-to-Image](demo/Diffusion#generate-an-image-with-stable-diffusion-v35-large-with-controlnet-guided-by-an-image-and-a-text-prompt) support for Stable Diffusion v3.5-large ControlNet models.
31
+
- Enabled download of [pre-exported ONNX models](https://huggingface.co/stabilityai/stable-diffusion-3.5-large-tensorrt) for the Stable Diffusion v3.5-large pipeline.
32
+
- Sample changes
33
+
- Added two refactored python samples [1_run_onnx_with_tensorrt](samples/python/refactored/1_run_onnx_with_tensorrt) and [2_construct_network_with_layer_apis](samples/python/refactored/2_construct_network_with_layer_apis)
34
+
- Parser changes
35
+
- Added support for integer-typed base tensors for `Pow` operations
36
+
- Added support for custom `MXFP8` quantization operations
37
+
- Added support for ellipses, diagonal, and broadcasting in `Einsum` operations
38
+
39
+
40
+
## 10.11.0 GA - 2025-5-16
4
41
5
42
Key Features and Updates:
6
43
7
44
- Plugin changes
8
-
- Migrated `IPluginV2`-descendent version 1 of `modulatedDeformConvPlugin`, to version 2, which implements `IPluginV3`.
45
+
- Migrated `IPluginV2`-descendent version 1 of `cropAndResizePluginDynamic`, to version 2, which implements `IPluginV3`.
9
46
- Migrated `IPluginV2`-descendent version 1 of `DisentangledAttention_TRT`, to version 2, which implements `IPluginV3`.
10
47
- Migrated `IPluginV2`-descendent version 1 of `MultiscaleDeformableAttnPlugin_TRT`, to version 2, which implements `IPluginV3`.
11
48
- Note: The newer versions preserve the attributes and I/O of the corresponding older plugin version. The older plugin versions are deprecated and will be removed in a future release.
12
49
- Demo changes
13
50
- demoDiffusion
14
51
- Added support for Stable Diffusion 3.5-medium and 3.5-large pipelines in BF16 and FP16 precisions.
52
+
- Added support for Stable Diffusion 3.5-large pipeline in FP8 precision.
15
53
- Parser changes
16
54
- Added `kENABLE_UINT8_AND_ASYMMETRIC_QUANTIZATION_DLA` parser flag to enable UINT8 asymmetric quantization on engines targeting DLA.
17
55
- Removed restriction that inputs to `RandomNormalLike` and `RandomUniformLike` must be tensors.
README.md (9 additions & 9 deletions)
@@ -32,7 +32,7 @@ To build the TensorRT-OSS components, you will first need the following software
32
32
33
33
**TensorRT GA build**
34
34
35
-
- TensorRT v10.11.0.33
35
+
- TensorRT v10.12.0.36
36
36
- Available from direct download links listed below
37
37
38
38
**System Packages**
@@ -86,24 +86,24 @@ To build the TensorRT-OSS components, you will first need the following software
86
86
87
87
Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
88
88
89
-
- [TensorRT 10.11.0.33 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.11.0/tars/TensorRT-10.11.0.33.Linux.x86_64-gnu.cuda-11.8.tar.gz)
90
-
- [TensorRT 10.11.0.33 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.11.0/tars/TensorRT-10.11.0.33.Linux.x86_64-gnu.cuda-12.9.tar.gz)
91
-
- [TensorRT 10.11.0.33 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.11.0/zip/TensorRT-10.11.0.33.Windows.win10.cuda-11.8.zip)
92
-
- [TensorRT 10.11.0.33 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.11.0/zip/TensorRT-10.11.0.33.Windows.win10.cuda-12.9.zip)
89
+
- [TensorRT 10.12.0.36 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/tars/TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-11.8.tar.gz)
90
+
- [TensorRT 10.12.0.36 for CUDA 12.9, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/tars/TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-12.9.tar.gz)
91
+
- [TensorRT 10.12.0.36 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/zip/TensorRT-10.12.0.36.Windows.win10.cuda-11.8.zip)
92
+
- [TensorRT 10.12.0.36 for CUDA 12.9, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.12.0/zip/TensorRT-10.12.0.36.Windows.win10.cuda-12.9.zip)
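The four download links above share one pattern, which is why this hunk only touches the version string. As a rough sketch, the pattern can be written as a small helper. The function name and the `platform` keys are hypothetical, and the URL layout is inferred only from the links listed in this diff, so it may not hold for other releases.

```python
BASE_URL = "https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt"


def tensorrt_download_url(version: str, cuda: str, platform: str) -> str:
    """Build a GA download URL in the shape used by the README links.

    version:  full build version, e.g. "10.12.0.36"
    cuda:     CUDA version in the file name, e.g. "12.9"
    platform: "linux" or "windows" (hypothetical keys for this sketch)
    """
    # The directory component uses only the first three version fields.
    release = ".".join(version.split(".")[:3])
    if platform == "linux":
        name = f"TensorRT-{version}.Linux.x86_64-gnu.cuda-{cuda}.tar.gz"
        return f"{BASE_URL}/{release}/tars/{name}"
    if platform == "windows":
        name = f"TensorRT-{version}.Windows.win10.cuda-{cuda}.zip"
        return f"{BASE_URL}/{release}/zip/{name}"
    raise ValueError(f"unsupported platform: {platform!r}")


if __name__ == "__main__":
    print(tensorrt_download_url("10.12.0.36", "12.9", "linux"))
```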
93
93
94
94
**Example: Ubuntu 20.04 on x86-64 with cuda-12.9**
95
95
96
96
```bash
97
97
cd ~/Downloads
98
-
tar -xvzf TensorRT-10.11.0.33.Linux.x86_64-gnu.cuda-12.9.tar.gz
99
-
export TRT_LIBPATH=`pwd`/TensorRT-10.11.0.33
98
+
tar -xvzf TensorRT-10.12.0.36.Linux.x86_64-gnu.cuda-12.9.tar.gz
demo/Diffusion/README.md (25 additions & 3 deletions)
@@ -49,7 +49,7 @@ onnx 1.15.0
49
49
onnx-graphsurgeon 0.5.2
50
50
onnxruntime 1.16.3
51
51
polygraphy 0.49.9
52
-
tensorrt 10.11.0.33
52
+
tensorrt 10.12.0.36
53
53
tokenizers 0.13.3
54
54
torch 2.2.0
55
55
transformers 4.42.2
@@ -154,6 +154,8 @@ python3 demo_controlnet.py "A beautiful bird with rainbow colors" --controlnet-t
154
154
155
155
> NOTE: Currently only `--controlnet-type canny` is supported. `--input-image` must be a pre-processed image corresponding to `--controlnet-type canny`. If unspecified, a sample image will be downloaded.
156
156
157
+
> NOTE: FP8 quantization (`--fp8`) is supported.
158
+
157
159
### Generate an image guided by a text prompt, and using specified LoRA model weight updates
158
160
159
161
```bash
@@ -208,10 +210,13 @@ Run the command below to generate an image using Stable Diffusion 3 and Stable D
208
210
python3 demo_txt2img_sd3.py "A vibrant street wall covered in colorful graffiti, the centerpiece spells \"SD3 MEDIUM\", in a storm of colors" --version sd3 --hf-token=$HF_TOKEN
209
211
210
212
# Stable Diffusion 3.5-medium
211
-
python3 demo_txt2img_sd35.py "a beautiful photograph of Mt. Fuji during cherry blossom" --version=3.5-medium --denoising-steps=30 --guidance-scale 3.5 --hf-token=$HF_TOKEN
213
+
python3 demo_txt2img_sd35.py "a beautiful photograph of Mt. Fuji during cherry blossom" --version=3.5-medium --denoising-steps=30 --guidance-scale 3.5 --hf-token=$HF_TOKEN --bf16
212
214
213
215
# Stable Diffusion 3.5-large
214
-
python3 demo_txt2img_sd35.py "a beautiful photograph of Mt. Fuji during cherry blossom" --version=3.5-large --denoising-steps=30 --guidance-scale 3.5 --hf-token=$HF_TOKEN
216
+
python3 demo_txt2img_sd35.py "a beautiful photograph of Mt. Fuji during cherry blossom" --version=3.5-large --denoising-steps=30 --guidance-scale 3.5 --hf-token=$HF_TOKEN --bf16 --download-onnx-models
217
+
218
+
# Stable Diffusion 3.5-large FP8
219
+
python3 demo_txt2img_sd35.py "a beautiful photograph of Mt. Fuji during cherry blossom" --version=3.5-large --denoising-steps=30 --guidance-scale 3.5 --hf-token=$HF_TOKEN --fp8 --download-onnx-models --onnx-dir onnx_35_fp8/ --engine-dir engine_35_fp8/
215
220
```
216
221
217
222
You can also specify an input image conditioning as shown below
@@ -225,6 +230,19 @@ python3 demo_txt2img_sd3.py "dog wearing a sweater and a blue collar" --version
225
230
226
231
Note that a denoising-percentage is applied to the number of denoising-steps when an input image conditioning is provided. Its default value is set to 0.6. This parameter can be updated using `--denoising-percentage`.
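As a worked example of the note above, with `--denoising-steps=30` and the default percentage of 0.6, roughly 18 steps would run. The sketch below assumes the percentage is a plain multiplier and that the result is rounded to the nearest integer. Both the rounding rule and the function name are assumptions, not taken from the demo code.

```python
def effective_denoising_steps(denoising_steps: int,
                              denoising_percentage: float = 0.6) -> int:
    """Estimate how many steps run when an input image is provided.

    Assumption: steps * percentage, rounded to the nearest integer.
    The demo may truncate or clamp differently.
    """
    if not 0.0 <= denoising_percentage <= 1.0:
        raise ValueError("denoising_percentage must be between 0 and 1")
    return round(denoising_steps * denoising_percentage)


if __name__ == "__main__":
    print(effective_denoising_steps(30))
```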
227
232
233
+
### Generate an image with Stable Diffusion v3.5-large with ControlNet guided by an image and a text prompt
234
+
235
+
```bash
236
+
# Depth
237
+
python3 demo_controlnet_sd35.py "a photo of a man" --controlnet-type depth --hf-token=$HF_TOKEN --denoising-steps 40 --guidance-scale 4.5 --bf16
238
+
239
+
# Canny
240
+
python3 demo_controlnet_sd35.py "A Night time photo taken by Leica M11, portrait of a Japanese woman in a kimono, looking at the camera, Cherry blossoms" --controlnet-type canny --hf-token=$HF_TOKEN --denoising-steps 60 --guidance-scale 3.5 --bf16
241
+
242
+
# Blur
243
+
python3 demo_controlnet_sd35.py "generated ai art, a tiny, lost rubber ducky in an action shot close-up, surfing the humongous waves, inside the tube, in the style of Kelly Slater" --controlnet-type blur --hf-token=$HF_TOKEN --denoising-steps 60 --guidance-scale 3.5 --bf16
244
+
```
245
+
228
246
### Generate a video guided by an initial image using Stable Video Diffusion
229
247
230
248
Download the pre-exported ONNX model
@@ -442,3 +460,7 @@ Custom override paths to pre-built engine files can be provided using `--custom-
442
460
- To accelerate engine building time use `--timing-cache <path to cache file>`. The cache file will be created if it does not already exist. Note that performance may degrade if cache files are used across multiple GPU targets. It is recommended to use timing caches only during development. To achieve the best performance in deployment, please build engines without timing cache.
443
461
- Specify new directories for storing onnx and engine files when switching between versions, LoRAs, ControlNets, etc. This can be done using `--onnx-dir <new onnx dir>` and `--engine-dir <new engine dir>`.
444
462
- Inference performance can be improved by enabling [CUDA graphs](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs) using `--use-cuda-graph`. Enabling CUDA graphs requires fixed input shapes, so this flag must be combined with `--build-static-batch` and cannot be combined with `--build-dynamic-shape`.
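The CUDA graph constraint in the last tip amounts to a small consistency check across three flags. A minimal sketch of that rule follows. The function is hypothetical and is not the demo's own argument handling, and only the flag names come from the README text.

```python
def check_cuda_graph_flags(use_cuda_graph: bool,
                           build_static_batch: bool,
                           build_dynamic_shape: bool) -> None:
    """Raise ValueError if the flag combination violates the README rule.

    Rule from the tip: --use-cuda-graph needs fixed input shapes, so it
    requires --build-static-batch and excludes --build-dynamic-shape.
    """
    if not use_cuda_graph:
        return  # No constraint applies without CUDA graphs.
    if build_dynamic_shape:
        raise ValueError(
            "--use-cuda-graph cannot be combined with --build-dynamic-shape")
    if not build_static_batch:
        raise ValueError(
            "--use-cuda-graph must be combined with --build-static-batch")


if __name__ == "__main__":
    # Valid combination: CUDA graphs with a static batch and static shapes.
    check_cuda_graph_flags(True, True, False)
    print("flags are consistent")
```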
463
+
464
+
### Known Issues
465
+
466
+
The Stable Diffusion XL pipeline is currently not supported on RTX 5090 due to memory constraints. This issue will be resolved in an upcoming release.