Description
Context
In the Dynamo paths, a great deal of behind-the-scenes information could be helpful to users outside of "debug" mode. Specifically, users may want a glimpse of the operator coverage for their model, the structure of the model graph, or the composition of each partition in the TRT/Torch split. These use cases can all be enabled via the Dry-Run functionality.
Feature Proposal
Add a new compilation argument, `dry_run`, which, when enabled, runs every compilation step up to and including partitioning and stops before conversion. In the `ir="dynamo"` path, the program would halt here, whereas in the `ir="torch_compile"` path, we would return the Torch graph so as not to cause an error in the `torch.compile` interpreter.
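The control flow described above can be sketched in plain Python. This is only an illustration: the stage functions (`lower_graph`, `partition_graph`, `convert_segments`) and `compile_sketch` are hypothetical stand-ins, not Torch-TensorRT APIs, and returning `None` for the `ir="dynamo"` path is an assumption about what "halt" could mean.

```python
# Stand-in stages; names and signatures are illustrative, not Torch-TensorRT APIs.
def lower_graph(graph):
    return graph


def partition_graph(graph):
    # Pretend every operator lands in a single TRT segment.
    return [("TRT", list(graph))]


def convert_segments(segments):
    return f"compiled({len(segments)} segments)"


def compile_sketch(graph, ir="dynamo", dry_run=False):
    """Run lowering and partitioning; skip conversion when dry_run is set."""
    segments = partition_graph(lower_graph(graph))
    if dry_run:
        # Report what was found, then stop before conversion.
        print(f"dry run: {len(segments)} segment(s)")
        if ir == "torch_compile":
            # Hand back the original Torch graph so the caller still
            # receives something it can execute.
            return graph
        # Assumption: the ir="dynamo" path halts without a compiled module.
        return None
    return convert_segments(segments)
```

The point of the sketch is that the dry-run branch sits after partitioning, so all of the reporting can reuse the partitioner's output without paying for conversion.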
The `dry_run` argument could print (at least) the following valuable information and statistics:
- List of supported and unsupported operators
- Number of graphs in the network after partitioning
- Average number of TRT operators per segmented graph
- Schematic of the graph compositions, roughly:
Inputs --> (45 operators, TRT) ---> (23 operators, Torch) ---> (30 operators, TRT) --> Outputs
- Automatic recommendations for `min_block_size` choices to reduce (or increase) segmentation, such as:
For the minimal amount of segmentation, consider {min_block_size=45}, which will result in {2} graph segments.
For a reasonable amount of segmentation, consider {min_block_size=30}, which will result in {3} graph segments.
For the maximal number of TRT-run operators and the most segmentation, consider {min_block_size=5}, which will result in {10} graph segments.
- I/O schematic for the graph, roughly:
Inputs --(2, 3), (3, 4)-> (45 operators, TRT) --(3, 7), (6, 3)-> (23 operators, Torch) --(5, 3)-> (30 operators, TRT) --(24, 1)-> Outputs
- (Future) Use `torch._dynamo.explain` to get graph break information from within the compiler, and include the number of graph breaks and the reasons for those breaks in the Dry-Run output as well
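
As a rough illustration of how several of the statistics above could be derived, here is a minimal Python sketch. It assumes a heavily simplified model: the graph is a straight chain of operators, each flagged as TRT-supported or not, and any run of supported operators shorter than `min_block_size` falls back to Torch. The helper names (`partition`, `schematic`, `average_trt_ops`, `segment_counts`) are hypothetical and not part of any existing API; a real partitioner works on a graph rather than a chain.

```python
from itertools import groupby


def partition(supported, min_block_size):
    """Split a linear operator chain into (backend, op_count) segments."""
    segments = []
    for is_supported, run in groupby(supported):
        count = len(list(run))
        # Supported runs that are too short fall back to Torch.
        backend = "TRT" if is_supported and count >= min_block_size else "Torch"
        if segments and segments[-1][0] == backend:
            # Adjacent Torch pieces (unsupported ops plus fallbacks) merge.
            segments[-1] = (backend, segments[-1][1] + count)
        else:
            segments.append((backend, count))
    return segments


def schematic(segments):
    """Render the segment chain as a one-line diagram."""
    parts = [f"({n} operators, {backend})" for backend, n in segments]
    return " --> ".join(["Inputs", *parts, "Outputs"])


def average_trt_ops(segments):
    """Mean operator count over the TRT segments (0.0 if there are none)."""
    trt = [n for backend, n in segments if backend == "TRT"]
    return sum(trt) / len(trt) if trt else 0.0


def segment_counts(supported, candidates):
    """Number of segments produced by each candidate min_block_size."""
    return {mbs: len(partition(supported, mbs)) for mbs in candidates}


# 45 supported ops, then 23 unsupported, then 30 supported.
ops = [True] * 45 + [False] * 23 + [True] * 30
segments = partition(ops, min_block_size=5)
print(schematic(segments))
print(f"{len(segments)} graphs, {average_trt_ops(segments)} TRT operators on average")
print(segment_counts(ops, [5, 30, 45]))
```

Under this model, sweeping candidate `min_block_size` values and counting the resulting segments is enough to back the recommendation messages: for the example chain, a value of 30 keeps three segments while 45 collapses the trailing TRT block into Torch and leaves two.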