Description
I cannot judge whether this has any practical relevance, but consider trying to export an engine via the advanced sliders when no ONNX file has been created yet, with the dimensions left at their "default" values except that text_maxlen is raised from 150 to 450 and text_optlen from 75 to 150, for example like this engine:
```python
profile = modelobj.get_input_profile(
    batch_min,      # 1
    batch_opt,      # 1
    batch_max,      # 4
    height_min,     # 512
    height_opt,     # 512
    height_max,     # 768
    width_min,      # 512
    width_opt,      # 512
    width_max,      # 768
    static_shapes,  # False
)
print(profile)
```
```
{'sample': [(1, 4, 64, 64), (1, 4, 64, 64), (8, 4, 96, 96)], 'timesteps': [(1,), (1,), (8,)], 'encoder_hidden_states': [(1, 154, 768), (1, 154, 768), (8, 462, 768)]}
```
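(As a side note, the 154 and 462 in the encoder_hidden_states shapes appear to come from the slider values 150 and 450; presumably prompts are processed in 75-token chunks that each encode to 77 embeddings. A hypothetical reconstruction of that mapping, not code from the repo:

```python
import math

# Assumed mapping from the token sliders to the embedding dimension:
# prompts are chunked into 75-token blocks, each encoding to 77 embeddings.
def embed_len(tokens: int) -> int:
    return math.ceil(tokens / 75) * 77

print(embed_len(150))  # 154 -> opt dim seen in the profile above
print(embed_len(450))  # 462 -> max dim seen in the profile above
```
)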
Then the export of the ONNX file fails at an assert statement later. This is a bit confusing, as it doesn't seem to be stated anywhere that the "default" engine has to be created before other exports work, nor that there are any restrictions related to the token dimensions.

The same engine, but with text_optlen left at the minimum value of 75, works as expected. In general I'm pretty confused why the initial ONNX file generation depends on the dimensions of the first engine created at all, when the ONNX file generated there can later be reused for arbitrary other engine dimensions. For example, trying to export a static 16-batch engine will quickly eat over 60 GB of RAM and try to use something over 24 GB of VRAM while generating the ONNX file (which crashes for me). But if I first export the smallest engine possible (and thereby implicitly the ONNX file), I can afterwards export much larger TensorRT engines than I would have been able to in a single step.
I experimentally deleted these two lines, and at least my example case seems to work* (*although it creates an engine with encoder states of "encoder_hidden_states": [[1,154,768],[2,154,768],[8,462,768]], and so not a min length of 77; but as I said, I do not know the reason for this if-clause, so...):

https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/blob/4c2bcafd854f7bc74d3ca9c5c3c90112e9fe6e55/models.py#L320

```python
if self.text_optlen > 77:
    return (min_batch, opt_batch, max_batch * 2)
```
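If I read get_batch_dim (quoted in full below) correctly, deleting that branch makes the call fall through to the next return, which would explain the opt batch of 2 above. A quick sketch with the example's values:

```python
# Fallthrough return reached once the deleted branch is gone
# (values from the example: min_batch=1, opt_batch=1, max_batch=4):
min_batch, opt_batch, max_batch = 1, 1, 4
print((min_batch, opt_batch * 2, max_batch * 2))  # (1, 2, 8)
# -> matches the batch dims 1/2/8 of the encoder states reported above
```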
Here is the exact reason for the failed export. An opt_batch of 1 is returned here (Stable-Diffusion-WebUI-TensorRT/models.py, line 320 in 4c2bcaf):

```python
def get_batch_dim(self, min_batch, opt_batch, max_batch, static_batch):
    if self.text_maxlen <= 77:
        return (min_batch * 2, opt_batch * 2, max_batch * 2)
    elif self.text_maxlen > 77 and static_batch:
        return (opt_batch, opt_batch, opt_batch)
    elif self.text_maxlen > 77 and not static_batch:
        if self.text_optlen > 77:
            return (min_batch, opt_batch, max_batch * 2)  # <- line 320
        return (min_batch, opt_batch * 2, max_batch * 2)
    else:
        raise Exception("Uncovered case in get_batch_dim")
```
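With the values from the example (min_batch=1, opt_batch=1, max_batch=4, static_batch=False, and both token dims well above 77), the inner branch is taken and opt_batch passes through unchanged. A standalone sketch of just that logic, not the repo's class:

```python
# Inlined branch logic from get_batch_dim, evaluated with the example's values:
text_maxlen, text_optlen = 462, 154   # padded forms of 450/150; both > 77
min_batch, opt_batch, max_batch, static_batch = 1, 1, 4, False

if text_maxlen > 77 and not static_batch and text_optlen > 77:
    batch_dim = (min_batch, opt_batch, max_batch * 2)

print(batch_dim)  # (1, 1, 8) -> the opt_batch of 1 that breaks the export
```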
This gets passed, inside the profile, into

```python
export_onnx(
    onnx_path,
    modelobj,
    profile=profile,
    diable_optimizations=diable_optimizations,
)
```

https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT/blob/4c2bcafd854f7bc74d3ca9c5c3c90112e9fe6e55/ui_trt.py#L135C1-L135C1
There, inputs is calculated like this:

```python
inputs = modelobj.get_sample_input(
    profile["sample"][1][0] // 2,
    profile["sample"][1][-2] * 8,
    profile["sample"][1][-1] * 8,
)
```
And `profile["sample"][1][0] // 2` in this case comes out to `(opt_batch := 1) // 2`, which equals 0, so in get_sample_input the method self.check_dims then gets called with batch_size=0 as a result.
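Plugging in the opt entry of the profile printed at the top:

```python
opt_sample = (1, 4, 64, 64)       # profile["sample"][1] from the example

batch_size = opt_sample[0] // 2   # 1 // 2 == 0  <- the broken value
height     = opt_sample[-2] * 8   # 64 * 8 == 512
width      = opt_sample[-1] * 8   # 64 * 8 == 512
```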
In get_sample_input (Stable-Diffusion-WebUI-TensorRT/models.py, line 977 in 4c2bcaf):

```python
latent_height, latent_width = self.check_dims(
```

Which in turn asserts (Stable-Diffusion-WebUI-TensorRT/models.py, line 259 in 4c2bcaf):

```python
assert batch_size >= self.min_batch and batch_size <= self.max_batch
```

And fails, since batch_size is 0 and self.min_batch is at least 1.
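Put together, with the example's numbers the assert can only fail (min_batch and max_batch are assumed values here, for illustration):

```python
# batch_size is the 0 produced by the division above;
# the limits are assumed, but any min_batch >= 1 makes this fail.
batch_size, min_batch, max_batch = 0, 1, 8
assert batch_size >= min_batch and batch_size <= max_batch  # AssertionError
```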