Commit 433283e

Specify coordinate constraints for box parameters (#3425)
* Specify coordinate constraints
* some more
* flake8
1 parent e1c49fa commit 433283e

10 files changed, +59 -44 lines changed

torchvision/models/detection/faster_rcnn.py

Lines changed: 8 additions & 8 deletions
@@ -32,8 +32,8 @@ class FasterRCNN(GeneralizedRCNN):
 
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
-        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x
-          between 0 and W and values of y between 0 and H
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the class label for each ground-truth box
 
     The model returns a Dict[Tensor] during training, containing the classification and regression

@@ -42,8 +42,8 @@ class FasterRCNN(GeneralizedRCNN):
     During inference, the model requires only the input tensors, and returns the post-processed
     predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
     follows:
-        - boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values of x
-          between 0 and W and values of y between 0 and H
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the predicted labels for each image
         - scores (Tensor[N]): the scores or each prediction

@@ -309,8 +309,8 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
 
-        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
-          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the class label for each ground-truth box
 
     The model returns a ``Dict[Tensor]`` during training, containing the classification and regression

@@ -320,8 +320,8 @@ def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
     predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
     follows:
 
-        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
-          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the predicted labels for each image
         - scores (``Tensor[N]``): the scores or each prediction

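The practical upshot of the new wording: every training box must satisfy ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H`` for its image. A minimal sketch of a valid target (the image size, box values, and ``num_classes`` below are made up for illustration and are not part of this commit):

import torch
import torchvision

# Hypothetical example: one 3 x 300 x 400 image (H=300, W=400) with a single
# ground-truth box whose coordinates satisfy 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False, num_classes=2)
model.train()

images = [torch.rand(3, 300, 400)]
targets = [{
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0]]),   # [x1, y1, x2, y2]
    "labels": torch.tensor([1], dtype=torch.int64),
}]

# During training the model returns a dict of classification and regression losses.
loss_dict = model(images, targets)
print(sorted(loss_dict.keys()))
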
torchvision/models/detection/keypoint_rcnn.py

Lines changed: 8 additions & 8 deletions
@@ -27,8 +27,8 @@ class KeypointRCNN(FasterRCNN):
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
 
-        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x
-          between 0 and W and values of y between 0 and H
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the class label for each ground-truth box
         - keypoints (FloatTensor[N, K, 3]): the K keypoints location for each of the N instances, in the
           format [x, y, visibility], where visibility=0 means that the keypoint is not visible.

@@ -40,8 +40,8 @@ class KeypointRCNN(FasterRCNN):
     predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
     follows:
 
-        - boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values of x
-          between 0 and W and values of y between 0 and H
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the predicted labels for each image
         - scores (Tensor[N]): the scores or each prediction
         - keypoints (FloatTensor[N, K, 3]): the locations of the predicted keypoints, in [x, y, v] format.

@@ -286,8 +286,8 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True,
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
 
-        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
-          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the class label for each ground-truth box
         - keypoints (``FloatTensor[N, K, 3]``): the ``K`` keypoints location for each of the ``N`` instances, in the
          format ``[x, y, visibility]``, where ``visibility=0`` means that the keypoint is not visible.

@@ -299,8 +299,8 @@ def keypointrcnn_resnet50_fpn(pretrained=False, progress=True,
     predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
     follows:
 
-        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
-          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the predicted labels for each image
         - scores (``Tensor[N]``): the scores or each prediction
         - keypoints (``FloatTensor[N, K, 3]``): the locations of the predicted keypoints, in ``[x, y, v]`` format.

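During inference the same constraint describes the returned ``boxes`` field. A short eval-mode sketch (hypothetical input sizes; the untrained model is used purely to show the output structure):

import torch
import torchvision

# Hypothetical inference sketch: predicted boxes are documented to be in
# [x1, y1, x2, y2] format with 0 <= x1 < x2 <= W and 0 <= y1 < y2 <= H.
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(pretrained=False)
model.eval()

images = [torch.rand(3, 300, 400)]
with torch.no_grad():
    predictions = model(images)

# One dict per input image with boxes, labels, scores and keypoints fields.
print(sorted(predictions[0].keys()))
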
torchvision/models/detection/mask_rcnn.py

Lines changed: 8 additions & 8 deletions
@@ -26,8 +26,8 @@ class MaskRCNN(FasterRCNN):
 
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
-        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values of x
-          between 0 and W and values of y between 0 and H
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the class label for each ground-truth box
         - masks (UInt8Tensor[N, H, W]): the segmentation binary masks for each instance
 

@@ -37,8 +37,8 @@ class MaskRCNN(FasterRCNN):
     During inference, the model requires only the input tensors, and returns the post-processed
     predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
     follows:
-        - boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values of x
-          between 0 and W and values of y between 0 and H
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the predicted labels for each image
         - scores (Tensor[N]): the scores or each prediction
         - masks (UInt8Tensor[N, 1, H, W]): the predicted masks for each instance, in 0-1 range. In order to

@@ -279,8 +279,8 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True,
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
 
-        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
-          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the class label for each ground-truth box
         - masks (``UInt8Tensor[N, H, W]``): the segmentation binary masks for each instance
 

@@ -291,8 +291,8 @@ def maskrcnn_resnet50_fpn(pretrained=False, progress=True,
     predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
     follows:
 
-        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values of ``x``
-          between ``0`` and ``W`` and values of ``y`` between ``0`` and ``H``
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the predicted labels for each image
         - scores (``Tensor[N]``): the scores or each prediction
         - masks (``UInt8Tensor[N, 1, H, W]``): the predicted masks for each instance, in ``0-1`` range. In order to

torchvision/models/detection/retinanet.py

Lines changed: 8 additions & 8 deletions
@@ -236,8 +236,8 @@ class RetinaNet(nn.Module):
 
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
-        - boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values
-          between 0 and H and 0 and W
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the class label for each ground-truth box
 
     The model returns a Dict[Tensor] during training, containing the classification and regression

@@ -246,8 +246,8 @@ class RetinaNet(nn.Module):
     During inference, the model requires only the input tensors, and returns the post-processed
     predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as
     follows:
-        - boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with values between
-          0 and H and 0 and W
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (Int64Tensor[N]): the predicted labels for each image
         - scores (Tensor[N]): the scores for each prediction
 

@@ -576,8 +576,8 @@ def retinanet_resnet50_fpn(pretrained=False, progress=True,
     During training, the model expects both the input tensors, as well as a targets (list of dictionary),
     containing:
 
-        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with values
-          between ``0`` and ``H`` and ``0`` and ``W``
+        - boxes (``FloatTensor[N, 4]``): the ground-truth boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the class label for each ground-truth box
 
     The model returns a ``Dict[Tensor]`` during training, containing the classification and regression

@@ -587,8 +587,8 @@ def retinanet_resnet50_fpn(pretrained=False, progress=True,
     predictions as a ``List[Dict[Tensor]]``, one for each input image. The fields of the ``Dict`` are as
     follows:
 
-        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with values between
-          ``0`` and ``H`` and ``0`` and ``W``
+        - boxes (``FloatTensor[N, 4]``): the predicted boxes in ``[x1, y1, x2, y2]`` format, with
+          ``0 <= x1 < x2 <= W`` and ``0 <= y1 < y2 <= H``.
         - labels (``Int64Tensor[N]``): the predicted labels for each image
         - scores (``Tensor[N]``): the scores or each prediction
 

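All four detection models above now state the same contract, so a caller can validate target boxes once before training. A hypothetical helper (the function below is illustrative and not part of torchvision):

import torch

def check_boxes(boxes: torch.Tensor, height: int, width: int) -> None:
    # Hypothetical validator: raise if any [x1, y1, x2, y2] box violates
    # 0 <= x1 < x2 <= W or 0 <= y1 < y2 <= H.
    x1, y1, x2, y2 = boxes.unbind(dim=1)
    if not bool(((x1 >= 0) & (x1 < x2) & (x2 <= width)).all()):
        raise ValueError("x coordinates must satisfy 0 <= x1 < x2 <= W")
    if not bool(((y1 >= 0) & (y1 < y2) & (y2 <= height)).all()):
        raise ValueError("y coordinates must satisfy 0 <= y1 < y2 <= H")

# One valid box and one degenerate box (x1 == x2) for a 300 x 400 image.
check_boxes(torch.tensor([[10.0, 20.0, 110.0, 220.0]]), height=300, width=400)
try:
    check_boxes(torch.tensor([[50.0, 60.0, 50.0, 80.0]]), height=300, width=400)
except ValueError as err:
    print(err)
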
torchvision/ops/boxes.py

Lines changed: 14 additions & 7 deletions
@@ -22,7 +22,8 @@ def nms(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:
 
     Args:
         boxes (Tensor[N, 4])): boxes to perform NMS on. They
-            are expected to be in (x1, y1, x2, y2) format
+            are expected to be in ``(x1, y1, x2, y2)`` format with ``0 <= x1 < x2`` and
+            ``0 <= y1 < y2``.
         scores (Tensor[N]): scores for each one of the boxes
         iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold
 

@@ -50,7 +51,8 @@ def batched_nms(
 
     Args:
         boxes (Tensor[N, 4]): boxes where NMS will be performed. They
-            are expected to be in (x1, y1, x2, y2) format
+            are expected to be in ``(x1, y1, x2, y2)`` format with ``0 <= x1 < x2`` and
+            ``0 <= y1 < y2``.
         scores (Tensor[N]): scores for each one of the boxes
         idxs (Tensor[N]): indices of the categories for each one of the boxes.
         iou_threshold (float): discards all overlapping boxes with IoU > iou_threshold

@@ -79,7 +81,8 @@ def remove_small_boxes(boxes: Tensor, min_size: float) -> Tensor:
     Remove boxes which contains at least one side smaller than min_size.
 
     Args:
-        boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
+        boxes (Tensor[N, 4]): boxes in ``(x1, y1, x2, y2)`` format
+            with ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
         min_size (float): minimum size
 
     Returns:

@@ -97,7 +100,8 @@ def clip_boxes_to_image(boxes: Tensor, size: Tuple[int, int]) -> Tensor:
     Clip boxes so that they lie inside an image of size `size`.
 
     Args:
-        boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
+        boxes (Tensor[N, 4]): boxes in ``(x1, y1, x2, y2)`` format
+            with ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
         size (Tuple[height, width]): size of the image
 
     Returns:

@@ -185,7 +189,8 @@ def box_area(boxes: Tensor) -> Tensor:
 
     Args:
         boxes (Tensor[N, 4]): boxes for which the area will be computed. They
-            are expected to be in (x1, y1, x2, y2) format
+            are expected to be in (x1, y1, x2, y2) format with
+            ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
 
     Returns:
         area (Tensor[N]): area for each box

@@ -215,7 +220,8 @@ def box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
     """
     Return intersection-over-union (Jaccard index) of boxes.
 
-    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
+    Both sets of boxes are expected to be in ``(x1, y1, x2, y2)`` format with
+    ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
 
     Args:
         boxes1 (Tensor[N, 4])

@@ -234,7 +240,8 @@ def generalized_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
     """
     Return generalized intersection-over-union (Jaccard index) of boxes.
 
-    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.
+    Both sets of boxes are expected to be in ``(x1, y1, x2, y2)`` format with
+    ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
 
     Args:
         boxes1 (Tensor[N, 4])

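The box ops share the same expectation, minus the upper bound tied to an image size. A short sketch of ``nms`` and ``box_iou`` with made-up boxes that satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``:

import torch
from torchvision.ops import nms, box_iou

# Two heavily overlapping boxes and one disjoint box, all well-formed.
boxes = torch.tensor([
    [10.0, 10.0, 100.0, 100.0],
    [12.0, 12.0, 102.0, 102.0],
    [200.0, 200.0, 250.0, 260.0],
])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.5)   # indices of the boxes kept by NMS
iou = box_iou(boxes, boxes)                    # pairwise IoU matrix of shape [3, 3]
print(keep, iou.shape)
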
torchvision/ops/poolers.py

Lines changed: 1 addition & 1 deletion
@@ -204,7 +204,7 @@ def forward(
                 all the same number of channels, but they can have different sizes.
             boxes (List[Tensor[N, 4]]): boxes to be used to perform the pooling operation, in
                 (x1, y1, x2, y2) format and in the image reference size, not the feature map
-                reference.
+                reference. The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
             image_shapes (List[Tuple[height, width]]): the sizes of each image before they
                 have been fed to a CNN to obtain feature maps. This allows us to infer the
                 scale factor for each one of the levels to be pooled.

torchvision/ops/ps_roi_align.py

Lines changed: 3 additions & 1 deletion
@@ -21,7 +21,9 @@ def ps_roi_align(
     Args:
         input (Tensor[N, C, H, W]): input tensor
         boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
-            format where the regions will be taken from. If a single Tensor is passed,
+            format where the regions will be taken from.
+            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
+            If a single Tensor is passed,
             then the first column should contain the batch index. If a list of Tensors
             is passed, then each Tensor will correspond to the boxes for an element i
             in a batch

torchvision/ops/ps_roi_pool.py

Lines changed: 3 additions & 1 deletion
@@ -20,7 +20,9 @@ def ps_roi_pool(
     Args:
         input (Tensor[N, C, H, W]): input tensor
        boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
-            format where the regions will be taken from. If a single Tensor is passed,
+            format where the regions will be taken from.
+            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
+            If a single Tensor is passed,
             then the first column should contain the batch index. If a list of Tensors
             is passed, then each Tensor will correspond to the boxes for an element i
             in a batch

torchvision/ops/roi_align.py

Lines changed: 3 additions & 1 deletion
@@ -22,7 +22,9 @@ def roi_align(
     Args:
         input (Tensor[N, C, H, W]): input tensor
         boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
-            format where the regions will be taken from. If a single Tensor is passed,
+            format where the regions will be taken from.
+            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
+            If a single Tensor is passed,
             then the first column should contain the batch index. If a list of Tensors
             is passed, then each Tensor will correspond to the boxes for an element i
             in a batch

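``roi_align``, like ``ps_roi_align``, ``ps_roi_pool``, and ``roi_pool`` below, accepts boxes either as a single ``Tensor[K, 5]`` with the batch index in the first column or as a list of per-image ``Tensor[L, 4]``. A minimal ``roi_align`` sketch with made-up sizes, again keeping ``0 <= x1 < x2`` and ``0 <= y1 < y2``:

import torch
from torchvision.ops import roi_align

# One feature map of shape [1, 8, 32, 32] and two regions for batch element 0.
features = torch.rand(1, 8, 32, 32)
boxes = torch.tensor([
    # [batch_index, x1, y1, x2, y2] with 0 <= x1 < x2 and 0 <= y1 < y2
    [0.0, 2.0, 3.0, 20.0, 25.0],
    [0.0, 10.0, 5.0, 30.0, 18.0],
])

pooled = roi_align(features, boxes, output_size=(7, 7), spatial_scale=1.0)
print(pooled.shape)  # torch.Size([2, 8, 7, 7])
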
torchvision/ops/roi_pool.py

Lines changed: 3 additions & 1 deletion
@@ -20,7 +20,9 @@ def roi_pool(
     Args:
         input (Tensor[N, C, H, W]): input tensor
         boxes (Tensor[K, 5] or List[Tensor[L, 4]]): the box coordinates in (x1, y1, x2, y2)
-            format where the regions will be taken from. If a single Tensor is passed,
+            format where the regions will be taken from.
+            The coordinate must satisfy ``0 <= x1 < x2`` and ``0 <= y1 < y2``.
+            If a single Tensor is passed,
             then the first column should contain the batch index. If a list of Tensors
             is passed, then each Tensor will correspond to the boxes for an element i
             in a batch

0 commit comments
