CHANGELOG.md (25 additions, 0 deletions)

# TensorRT OSS Release Changelog

## 10.7.0 GA - 2024-12-4

Key Features and Updates:

- Demo Changes
  - demoDiffusion
    - Enabled low-VRAM mode for the Flux pipeline. Users can now run the pipelines on systems with 32 GB VRAM.
    - Added support for the [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) pipeline.
    - Enabled weight streaming mode for the Flux pipeline.
- Plugin Changes
  - On Blackwell and later platforms, TensorRT will drop cuDNN support for the following categories of plugins:
    - User-written `IPluginV2Ext`, `IPluginV2DynamicExt`, and `IPluginV2IOExt` plugins that depend on cuDNN handles provided by TensorRT (via the `attachToContext()` API).
    - TensorRT standard plugins that use cuDNN, specifically:
      - `InstanceNormalization_TRT` (versions 1, 2, and 3) in `plugin/instanceNormalizationPlugin/`.
      - `GroupNormalizationPlugin` (version 1) in `plugin/groupNormalizationPlugin/`.
  - Note: These normalization plugins are superseded by TensorRT's native `INormalizationLayer` ([C++](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_normalization_layer.html), [Python](https://docs.nvidia.com/deeplearning/tensorrt/operators/docs/Normalization.html)); a minimal migration sketch follows this changelog section. TensorRT support for cuDNN-dependent plugins remains unchanged on pre-Blackwell platforms.
- Parser Changes
  - The parser now prioritizes plugins over local functions when a corresponding plugin is available in the registry.
  - Added dynamic axes support for `Squeeze` and `Unsqueeze` operations.
  - Added support for parsing mixed-precision `BatchNormalization` nodes in strongly-typed mode; a strongly-typed parsing sketch also follows below.
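
As noted under the plugin changes, the cuDNN-based normalization plugins are superseded by the native `INormalizationLayer`. The sketch below shows one way to express instance normalization with that layer through the TensorRT Python API; the input shape, channel count, and epsilon are illustrative placeholders rather than values taken from this release.

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit-batch network definition

# NCHW input with an illustrative shape; replace with your model's real shape.
x = network.add_input("x", trt.float32, (1, 32, 64, 64))

# Per-channel scale and bias, broadcast over N, H, and W.
scale = network.add_constant((1, 32, 1, 1), trt.Weights(np.ones((1, 32, 1, 1), dtype=np.float32)))
bias = network.add_constant((1, 32, 1, 1), trt.Weights(np.zeros((1, 32, 1, 1), dtype=np.float32)))

# Instance normalization normalizes over the spatial axes (2 and 3) of each channel.
axes_mask = (1 << 2) | (1 << 3)
norm = network.add_normalization(x, scale.get_output(0), bias.get_output(0), axes_mask)
norm.epsilon = 1e-5  # placeholder epsilon

network.mark_output(norm.get_output(0))
```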
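
The strongly-typed mode mentioned under the parser changes corresponds to networks created with the strongly-typed creation flag, where precisions come from the model itself rather than from builder flags. Below is a minimal parsing sketch under that assumption; `model.onnx` is a placeholder path and error handling is reduced to printing parser diagnostics.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)

# Strongly-typed network: tensor precisions (e.g. of mixed-precision
# BatchNormalization nodes) are taken from the ONNX model.
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.STRONGLY_TYPED))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("Failed to parse the ONNX model")

config = builder.create_builder_config()
serialized_engine = builder.build_serialized_network(network, config)
```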

README.md (9 additions, 9 deletions)

````diff
@@ -26,7 +26,7 @@ You can skip the **Build** section to enjoy TensorRT with Python.
 To build the TensorRT-OSS components, you will first need the following software packages.
 
 **TensorRT GA build**
-* TensorRT v10.6.0.26
+* TensorRT v10.7.0.23
 * Available from direct download links listed below
 
 **System Packages**
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
 If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.
 
 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) with the direct links below:
-- [TensorRT 10.6.0.26 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.6.0.26 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/tars/TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz)
-- [TensorRT 10.6.0.26 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/zip/TensorRT-10.6.0.26.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.6.0.26 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.6.0/zip/TensorRT-10.6.0.26.Windows.win10.cuda-12.6.zip)
+- [TensorRT 10.7.0.23 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/tars/TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.7.0.23 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/tars/TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-12.6.tar.gz)
+- [TensorRT 10.7.0.23 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.7.0.23 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-12.6.zip)
 
 **Example: Ubuntu 20.04 on x86-64 with cuda-12.6**
 
 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-10.6.0.26.Linux.x86_64-gnu.cuda-12.6.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.6.0.26
+tar -xvzf TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-12.6.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.7.0.23
````
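
The version bump above also shows up in the Python package. A quick sanity check after extracting the GA build and installing the corresponding Python wheel (a minimal sketch; the expected version string assumes the 10.7.0 GA build linked above):

```python
import tensorrt as trt

# The GA build linked above corresponds to TensorRT 10.7.0; adjust the check
# if you installed a different build.
print(trt.__version__)
assert trt.__version__.startswith("10.7.0"), f"Unexpected TensorRT version: {trt.__version__}"
```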
help="Maximum sequence length to use with the prompt",
73
+
help="Maximum sequence length to use with the prompt. Can be up to 512 for the dev and 256 for the schnell variant.",
70
74
)
71
75
parser.add_argument(
72
-
"--bf16",
73
-
action='store_true',
74
-
help="Run pipeline in BFloat16 precision"
76
+
"--bf16", action="store_true", help="Run pipeline in BFloat16 precision"
75
77
)
76
78
parser.add_argument(
77
79
"--low-vram",
80
+
action="store_true",
81
+
help="Optimize for low VRAM usage, possibly at the expense of inference performance. Disabled by default.",
82
+
)
83
+
parser.add_argument(
84
+
"--optimization-level",
85
+
type=int,
86
+
default=3,
87
+
help=f"Set the builder optimization level to build the engine with. A higher level allows TensorRT to spend more building time for more optimization options. Must be one of {VALID_OPTIMIZATION_LEVELS}.",
88
+
)
89
+
parser.add_argument(
90
+
"--torch-fallback",
91
+
default=None,
92
+
type=str,
93
+
help="Name list of models to be inferenced using torch instead of TRT. For example --torch-fallback t5,transformer. If --torch-inference set, this parameter will be ignored."
94
+
)
95
+
96
+
parser.add_argument(
97
+
"--ws",
78
98
action='store_true',
79
-
help="Optimize for low VRAM usage, possibly at the expense of inference performance. Disabled by default."
99
+
help="Build TensorRT engines with weight streaming enabled."
80
100
)
81
101
102
+
parser.add_argument(
103
+
"--t5-ws-percentage",
104
+
type=int,
105
+
default=None,
106
+
help="Set runtime weight streaming budget as the percentage of the size of streamable weights for the T5 model. This argument only takes effect when --ws is set. 0 streams the most weights and 100 or None streams no weights. "
107
+
)
108
+
109
+
parser.add_argument(
110
+
"--transformer-ws-percentage",
111
+
type=int,
112
+
default=None,
113
+
help="Set runtime weight streaming budget as the percentage of the size of streamable weights for the transformer model. This argument only takes effect when --ws is set. 0 streams the most weights and 100 or None streams no weights."