[prototype] Optimize and clean up all affine methods #6945
Conversation
There are some related failures. I'll check them on Monday.
@@ -915,7 +915,7 @@ def sample_inputs_rotate_video():
        reference_inputs_fn=reference_inputs_rotate_image_tensor,
        float32_vs_uint8=True,
        # TODO: investigate
-       closeness_kwargs=pil_reference_pixel_difference(100, agg_method="mean"),
+       closeness_kwargs=pil_reference_pixel_difference(110, agg_method="mean"),
Flaky test unrelated to this PR that popped up previously on another PR:
FAILED test/test_prototype_transforms_functional.py::TestKernels::test_against_reference[rotate_image_tensor-38] - AssertionError: The 'mean' of the absolute difference is 104.21571906354515, but only 100.0 is allowed.
Here are some high-level notes on the changes. After this PR we would use _FT very minimally in our code base. Some changes can be reverted or moved to FT stable if we want to minimize copy-pasted code.
I will follow up with benchmarks for the affected methods.
    raise ValueError(f"Interpolation mode '{interpolation}' is unsupported with Tensor input")


def _affine_grid(
This one is modified minimally: only a rename (to match the naming of other methods such as _perspective_grid) and one tiny in-place op that won't matter much. Can be reverted.
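For context, a rough sketch of what a grid helper of this shape does, modeled on the existing torchvision helper rather than this PR's exact diff; the in-place division at the end illustrates the kind of tiny in-place change mentioned above and is an assumption:

```python
import torch
from torch import Tensor


def _affine_grid(theta: Tensor, w: int, h: int, ow: int, oh: int) -> Tensor:
    # Build a (1, oh, ow, 3) grid of homogeneous pixel coordinates centered on the image.
    d = 0.5
    base_grid = torch.empty(1, oh, ow, 3, dtype=theta.dtype, device=theta.device)
    x_grid = torch.linspace(-ow * 0.5 + d, ow * 0.5 + d - 1, steps=ow, device=theta.device)
    base_grid[..., 0].copy_(x_grid)
    y_grid = torch.linspace(-oh * 0.5 + d, oh * 0.5 + d - 1, steps=oh, device=theta.device).unsqueeze_(-1)
    base_grid[..., 1].copy_(y_grid)
    base_grid[..., 2].fill_(1)

    # Rescale theta to normalized [-1, 1] coordinates; dividing in place avoids one
    # temporary tensor (assumed detail; theta is created locally by the caller).
    rescaled_theta = theta.transpose(1, 2).div_(
        torch.tensor([0.5 * w, 0.5 * h], dtype=theta.dtype, device=theta.device)
    )
    return base_grid.view(1, oh * ow, 3).bmm(rescaled_theta).view(1, oh, ow, 2)
```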
def _get_inverse_affine_matrix(
    center: List[float], angle: float, translate: List[float], scale: float, shear: List[float], inverted: bool = True
) -> List[float]:
Here we do some caching of intermediate results and minor refactoring (especially with the negative values) to make the code a bit more readable IMO. Can be reverted.
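For reference, a minimal sketch of what such caching can look like, assuming the classic inverse-affine-matrix formula; the variable names and exact grouping of terms are illustrative, not the PR's diff:

```python
import math
from typing import List


def _get_inverse_affine_matrix(
    center: List[float], angle: float, translate: List[float], scale: float, shear: List[float], inverted: bool = True
) -> List[float]:
    rot = math.radians(angle)
    sx = math.radians(shear[0])
    sy = math.radians(shear[1])
    cx, cy = center
    tx, ty = translate

    # Cache intermediate values that are reused several times below.
    cos_sy = math.cos(sy)
    tan_sx = math.tan(sx)
    rot_minus_sy = rot - sy
    cx_plus_tx = cx + tx
    cy_plus_ty = cy + ty

    # Rotation-with-shear matrix (RSS) without scaling.
    a = math.cos(rot_minus_sy) / cos_sy
    b = -(a * tan_sx + math.sin(rot))
    c = math.sin(rot_minus_sy) / cos_sy
    d = math.cos(rot) - c * tan_sx

    if inverted:
        # C * RSS^-1 * C^-1 * T^-1, with the translations folded in directly.
        matrix = [d / scale, -b / scale, 0.0, -c / scale, a / scale, 0.0]
        matrix[2] = cx - matrix[0] * cx_plus_tx - matrix[1] * cy_plus_ty
        matrix[5] = cy - matrix[3] * cx_plus_tx - matrix[4] * cy_plus_ty
    else:
        # T * C * RSS * C^-1.
        matrix = [a * scale, b * scale, 0.0, c * scale, d * scale, 0.0]
        matrix[2] = cx_plus_tx - matrix[0] * cx - matrix[1] * cy
        matrix[5] = cy_plus_ty - matrix[3] * cx - matrix[4] * cy

    return matrix
```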
    return matrix


def _compute_affine_output_size(matrix: List[float], w: int, h: int) -> Tuple[int, int]:
A single aminmax call plus a bunch of in-place ops to speed things up.
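A minimal sketch of the described optimization (one aminmax reduction plus in-place arithmetic); the corner-point setup and the tolerance are assumptions, not the exact diff:

```python
from typing import List, Tuple

import torch


def _compute_affine_output_size(matrix: List[float], w: int, h: int) -> Tuple[int, int]:
    half_w, half_h = 0.5 * w, 0.5 * h
    # Four image corners in a coordinate system centered on the image.
    pts = torch.tensor(
        [
            [-half_w, -half_h, 1.0],
            [-half_w, half_h, 1.0],
            [half_w, half_h, 1.0],
            [half_w, -half_h, 1.0],
        ]
    )
    theta = torch.tensor(matrix, dtype=torch.float32).reshape(2, 3)
    new_pts = torch.matmul(pts, theta.T)

    # One aminmax call instead of separate min() and max() reductions.
    min_vals, max_vals = new_pts.aminmax(dim=0)

    # Shift back to the [0, w] x [0, h] interval, reusing the tensors in place.
    halfs = torch.tensor((half_w, half_h))
    min_vals.add_(halfs)
    max_vals.add_(halfs)

    # Truncate tiny numerical noise before ceil/floor (assumed tolerance).
    tol = 1e-4
    inv_tol = 1.0 / tol
    cmax = max_vals.mul_(inv_tol).trunc_().mul_(tol).ceil_()
    cmin = min_vals.mul_(inv_tol).trunc_().mul_(tol).floor_()
    size = cmax.sub_(cmin)
    return int(size[0]), int(size[1])  # w, h
```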
    return int(size[0]), int(size[1])  # w, h


def _apply_grid_transform(
We do in-place ops where possible, plus a bit of refactoring. An important note: the input image must be float, because the handling/casting can be done more efficiently outside of the method.
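A hedged sketch of a float-only grid transform along these lines; the dummy-mask trick and the in-place blend are assumptions about the approach, not an exact copy of the kernel:

```python
import torch
from torch import Tensor
from torch.nn.functional import grid_sample


def _apply_grid_transform(float_img: Tensor, grid: Tensor, mode: str, fill) -> Tensor:
    # Assumes a batched float input of shape (N, C, H, W).
    batch_size, _, height, width = float_img.shape
    if batch_size > 1:
        grid = grid.expand(batch_size, -1, -1, -1)

    # Append a dummy mask channel so a single grid_sample call also tells us
    # which output pixels fall outside the input and need the fill color.
    if fill is not None:
        mask = torch.ones((batch_size, 1, height, width), dtype=float_img.dtype, device=float_img.device)
        float_img = torch.cat((float_img, mask), dim=1)

    float_img = grid_sample(float_img, grid, mode=mode, padding_mode="zeros", align_corners=False)

    if fill is not None:
        float_img, mask = torch.tensor_split(float_img, indices=(-1,), dim=1)
        mask = mask.expand_as(float_img)
        fill_list = fill if isinstance(fill, (tuple, list)) else [float(fill)]
        fill_img = torch.tensor(fill_list, dtype=float_img.dtype, device=float_img.device).view(1, -1, 1, 1)
        if mode == "nearest":
            float_img = torch.where(mask < 0.5, fill_img.expand_as(float_img), float_img)
        else:
            # In-place blend: img * mask + fill * (1 - mask).
            float_img = float_img.mul_(mask).add_(fill_img.mul(1.0 - mask))

    return float_img
```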
    return float_img


def _assert_grid_transform_inputs(
Minor clean ups to make the if statements clearer.
    if ndim > 4:
        image = image.reshape((-1,) + shape[-3:])
        needs_unsquash = True
    elif ndim == 3:
        image = image.unsqueeze(0)
        needs_unsquash = True
    else:
        needs_unsquash = False
Adopting the idiom from other places to handle squashing. I do the same in all methods where possible below.
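A short sketch of the restore step that typically pairs with this squash idiom; kernel_body is a hypothetical stand-in for the actual per-batch computation, and the shape bookkeeping is an assumption:

```python
# shape/ndim are taken from the original input, as in the snippet above.
output = kernel_body(image)  # hypothetical placeholder for the actual kernel work
if needs_unsquash:
    # Restore the original leading dims while keeping the (possibly new) C, H, W.
    output = output.reshape(shape[:-3] + output.shape[-3:])
```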
Do we need the special casing for ndim == 3 here? Is there actually a significant perf difference between reshape and unsqueeze in this case?
This is not about perf. It's because _apply_grid_transform assumes batched input.
As discussed offline, this is copied from existing code, so it's not a new idiom being introduced here. I'll look into it in a follow-up.
    dtype = image.dtype if fp else torch.float32
    theta = torch.tensor(matrix, dtype=dtype, device=image.device).reshape(1, 2, 3)
    grid = _affine_grid(theta, w=width, h=height, ow=width, oh=height)
    output = _apply_grid_transform(image if fp else image.to(dtype), grid, interpolation.value, fill=fill)
Porting the code over from _FT.affine to be able to use the optimized private methods defined above. I do the same for all other kernels where possible below.
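A hedged sketch of the dtype handling assumed to surround the snippet above, given that _apply_grid_transform now expects float input; the round-and-cast back for integer images is an assumption:

```python
# fp indicates whether the image is already floating point.
fp = torch.is_floating_point(image)
dtype = image.dtype if fp else torch.float32
theta = torch.tensor(matrix, dtype=dtype, device=image.device).reshape(1, 2, 3)
grid = _affine_grid(theta, w=width, h=height, ow=width, oh=height)
output = _apply_grid_transform(image if fp else image.to(dtype), grid, interpolation.value, fill=fill)
if not fp:
    output = output.round_().to(image.dtype)
```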
    output = image
    new_width, new_height = _compute_affine_output_size(matrix, width, height) if expand else (width, height)
Here I don't adopt the squash idiom from other places because it would lead to more complex logic. I have concerns about whether the implementation followed here (maintained from main) properly handles all corner cases (ill-formed images with 0 elements), similar to the issue observed with pad.
@vfdev-5 Might be worth taking a look on your side to see if a mitigation similar to #6949 is necessary (aka sending the image through the kernel normally).
    if coeffs is not None and len(coeffs) != 8:
        raise ValueError("Argument coeffs should have 8 float values")
What is coeffs in an affine transformation? That seems out of place here.
See #6945 (comment).
It's the perspective coefficients. I'm not writing new code here, I'm porting the existing methods. I don't think changing the validation should be in scope for this PR, because the scope would grow really quickly.
Let me send a follow-up PR then.
    if ndim > 4:
        image = image.reshape((-1,) + shape[-3:])
        needs_unsquash = True
    elif ndim == 3:
        image = image.unsqueeze(0)
        needs_unsquash = True
    else:
        needs_unsquash = False
Do we need the special casing for ndim == 3 here? Is there actually a significant perf difference between reshape and unsqueeze in this case?
Unless benchmarks show a perf regression somewhere, LGTM if CI is green!
Summary:
* Clean up `_get_inverse_affine_matrix` and `_compute_affine_output_size`
* Optimize `_apply_grid_transform`
* Cleanup `_assert_grid_transform_inputs`
* Fix bugs on `_pad_with_scalar_fill` & `crop_mask` and port `crop_image_tensor`
* Call directly `_pad_with_scalar_fill`
* Fix linter
* Clean up `center_crop_image_tensor`
* Fix comments.
* Fixing rounding issues.
* Bumping tolerance for rotate which is unrelated to this PR.
* Fix tolerance threshold for RandomPerspective.
* Clean up `_affine_grid` and `affine_image_tensor`
* Clean up `rotate_image_tensor`
* Fixing linter
* Address code-review comments.

Reviewed By: YosuaMichael
Differential Revision: D41376279
fbshipit-source-id: bc21d68d6374b39d78ae74ba01e49633b2d8d1e2
Related to #6818
Clean ups:
_get_inverse_affine_matrix
_assert_grid_transform_inputs
_perspective_grid
_affine_bounding_box_xyxy
_affine_grid
Perfs:
_compute_affine_output_size
_apply_grid_transform
Affected methods (to benchmark):
rotate_image_tensor
perspective_image_tensor
elastic_image_tensor
affine_image_tensor
Benchmarks:
The PR improves all the kernels across the board. There is only one reported slowdown, on elastic_image_tensor cpu torch.float32, but these 2 runs have high std. I did a few consecutive runs and it looks good. I think the changes give us a 10-20% speed boost. We will measure it more carefully on the placeholder ticket.
Before the PR (main branch):
This PR:
cc @vfdev-5 @bjuncek @pmeier