Description
Context
While many key aten converters are under development, the end-to-end performance of certain key models is greatly affected by segmentation. For instance, with min_block_size=5 we obtain nearly 100 distinct TRT Engines in the graph, as of the current main. This is an unreasonable amount of segmentation, which slows down inference. A temporary solution is to increase the block size via min_block_size; however, this implicitly assumes that every converter benefits equally from acceleration, which is not necessarily the case. For instance, convolution may benefit from conversion much more than add or sub.
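To make the effect of min_block_size concrete, here is a minimal sketch in plain Python. It assumes a graph flattened into a linear list of ops, which is a simplification: the real partitioner works on an FX graph, and the function name and the toy data are hypothetical. It only illustrates that an unweighted threshold treats every supported op the same when deciding which runs become engines.

```python
from itertools import groupby


def count_accelerated_segments(supported_flags, min_block_size):
    """Count runs of consecutive supported ops with at least min_block_size ops."""
    count = 0
    for is_supported, run in groupby(supported_flags):
        if is_supported and len(list(run)) >= min_block_size:
            count += 1
    return count


# Toy linearized graph (hypothetical data):
# True = op has a converter, False = op falls back to PyTorch.
flags = [True] * 6 + [False] + [True] * 3 + [False] + [True] * 5 + [False] + [True] * 2

for size in (1, 3, 5, 6):
    print(size, count_accelerated_segments(flags, size))
```

Raising the threshold reduces the number of engines, but it does so purely by op count, regardless of which ops a run contains.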
Feature Proposal
In the converter registry, each converter can be assigned a weight based on the operation it is converting and the relative speedup it would gain from conversion. For instance, convolution could have a weight of 5, while add has a weight of 1. From here, there are a few options for picking better segments:
- Ensure the weighted sum of the converter weights is at least min_block_size
- Remove min_block_size in favor of a simpler heuristic like allowed_segmentation - {"none", "minimal", "moderate", "large", "most"}
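The first option could look roughly like the sketch below. This is not the Torch-TensorRT implementation: the weight table, the default weight, the function names, and the linear op list are all assumptions made for illustration. Only the weights 5 for convolution and 1 for add come from the example above.

```python
from itertools import groupby

# Hypothetical weights; 5 and 1 mirror the example values in the proposal.
CONVERTER_WEIGHTS = {"convolution": 5, "add": 1, "sub": 1}
DEFAULT_WEIGHT = 1  # assumed weight for converters without an explicit entry


def segment_weight(ops):
    """Sum the converter weights of the ops in one candidate segment."""
    return sum(CONVERTER_WEIGHTS.get(op, DEFAULT_WEIGHT) for op in ops)


def select_segments(ops, supported, min_block_size):
    """Keep runs of consecutive supported ops whose weighted sum reaches min_block_size."""
    selected = []
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        run = list(run)
        if is_supported and segment_weight(run) >= min_block_size:
            selected.append(run)
    return selected


# Toy linearized graph (hypothetical data).
ops = ["convolution", "unsupported_op", "add", "sub", "add",
       "unsupported_op", "convolution", "add"]
supported = {"convolution", "add", "sub"}

print(select_segments(ops, supported, min_block_size=5))
# [['convolution'], ['convolution', 'add']]
```

With a plain op count and min_block_size=5, none of these runs would be accelerated, since their lengths are 1, 3, and 2. Under the weighted rule, the two runs containing a convolution qualify, while the run of three elementwise ops stays in PyTorch.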
Activity
github-actions commented on Oct 5, 2023
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
narendasan commented on Oct 23, 2023
Refocus on dry run providing rough recommendations and statistics on the graph partitioning
gs-olive commented on Oct 24, 2023
See #2413 for an adjacent feature request discussing dry-run mode.
Title changed from "✨[Feature] Engine prioritization scheme for Dynamo" to "✨[Feature] Compilation Dry-Run in Dynamo"