- Add support for hardware compatibility for Ampere and later architectures (see the TensorRT sketch after this list)
- Add the necessary functions to support the modification throughout the stack, including the C++ and Python components
- Update the ABI version to account for the new metadata format for TRT engines
- Update the engine serialization schema accordingly
- Add test cases to validate the feature
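For context, TensorRT 8.6+ exposes hardware compatibility as a builder-config setting. The sketch below shows that raw API in isolation; it is an assumption about what the torch_tensorrt stack sets under the hood for this feature, not code taken from this PR.

```python
import tensorrt as trt

# Standalone illustration of the underlying TensorRT flag (assumed mapping,
# not lifted from this PR): AMPERE_PLUS builds an engine that stays runnable
# on Ampere and newer GPU architectures.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.hardware_compatibility_level = trt.HardwareCompatibilityLevel.AMPERE_PLUS
```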
"""Compile a TorchScript module for NVIDIA GPUs using TensorRT
@@ -140,6 +143,7 @@ def compile(
         use_python_runtime: (bool): Return a graph using a pure Python runtime, reduces options for serialization
         use_fast_partitioner: (bool): Use the adjacency based partitioning scheme instead of the global partitioner. Adjacency partitioning is faster but may not be optimal. Use the global partitioner (``False``) if looking for best performance
         enable_experimental_decompositions (bool): Use the full set of operator decompositions. These decompositions may not be tested but serve to make the graph easier to convert to TensorRT, potentially increasing the number of graphs run in TensorRT.
+        hardware_compatible (bool): Build the TensorRT engines compatible with GPU architectures other than that of the GPU on which the engine was built (currently works for NVIDIA Ampere and newer)
         **kwargs: Any,
     Returns:
         torch.fx.GraphModule: Compiled FX Module, when run it will execute via TensorRT
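A minimal usage sketch of the new flag through the public compile API; the model, input shape, and the `ir="dynamo"` selection are illustrative placeholders rather than code from this PR.

```python
import torch
import torch_tensorrt

model = MyModel().eval().cuda()  # hypothetical placeholder module
inputs = [torch.randn(1, 3, 224, 224).cuda()]

# Build the engine on the current GPU while keeping it runnable on other
# Ampere-or-newer architectures.
trt_module = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    hardware_compatible=True,
)
```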
py/torch_tensorrt/dynamo/_settings.py (3 additions, 0 deletions)
@@ -12,6 +12,7 @@
     DLA_SRAM_SIZE,
     ENABLE_EXPERIMENTAL_DECOMPOSITIONS,
     ENGINE_CAPABILITY,
+    HARDWARE_COMPATIBLE,
     MAX_AUX_STREAMS,
     MIN_BLOCK_SIZE,
     NUM_AVG_TIMING_ITERS,
@@ -63,6 +64,7 @@ class CompilationSettings:
         dla_sram_size (int): Fast software managed RAM used by DLA to communicate within a layer.
         dla_local_dram_size (int): Host RAM used by DLA to share intermediate tensor data across operations
         dla_global_dram_size (int): Host RAM used by DLA to store weights and metadata for execution
+        hardware_compatible (bool): Build the TensorRT engines compatible with GPU architectures other than that of the GPU on which the engine was built (currently works for NVIDIA Ampere and newer)
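For callers driving the lower-level dynamo APIs directly, the flag can also be set on the settings dataclass. A sketch assuming `CompilationSettings` accepts it as an ordinary keyword field whose default comes from the `HARDWARE_COMPATIBLE` constant imported above:

```python
from torch_tensorrt.dynamo._settings import CompilationSettings

# Assumed: hardware_compatible is a plain dataclass field, so it can be
# overridden at construction time while every other setting keeps its default.
settings = CompilationSettings(hardware_compatible=True)
print(settings.hardware_compatible)  # True
```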