Description
In the function encode_boxes (line 79 of torchvision\models\detection_utils.py), it seems that the width and height of the proposals matched to ground truth are computed as
ex_widths = proposals_x2 - proposals_x1
ex_heights = proposals_y2 - proposals_y1
But for a bounding box from MS COCO such as [368, 413, 368, 417], where x_min = x_max, this gives a width of 0. I guess it is a matter of opinion whether this is a "valid" bounding box, but it seems to me that x_min = x_max is valid for a box that is 1 pixel wide and y_max - y_min + 1 pixels high. Anyway, this causes targets_dw or targets_dh to take the torch.log of 0, giving float('-inf'). This can of course be easily fixed by adding 1 to the width and height, i.e. the fix:
ex_widths = proposals_x2 - proposals_x1 + 1
ex_heights = proposals_y2 - proposals_y1 + 1
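To make the effect concrete, here is a minimal sketch that reproduces only the width computation for the box above, not the full encode_boxes function. The variable names are illustrative and not taken from torchvision, and how the zero width actually enters targets_dw depends on the surrounding code.

```python
import torch

# The degenerate MS COCO box from this report, as (x1, y1, x2, y2).
box = torch.tensor([[368.0, 413.0, 368.0, 417.0]])

# Width as currently computed: x2 - x1, which is 0 for this box.
width = box[:, 2] - box[:, 0]

# Width with the proposed +1 (inclusive-pixel convention), which is 1 here.
width_plus_one = width + 1

log_width = torch.log(width)                    # log(0) evaluates to -inf
log_width_plus_one = torch.log(width_plus_one)  # log(1) evaluates to 0

print(log_width, log_width_plus_one)
```

With the +1 applied, the logarithm stays finite for boxes where x_min = x_max.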
Either that, or I could just filter out the boxes with x_min = x_max or y_min = y_max.
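The filtering alternative could look roughly like the sketch below. It assumes boxes are stored as an (N, 4) tensor in (x1, y1, x2, y2) order. The second box is made up for contrast, and the names keep and filtered are hypothetical.

```python
import torch

# Boxes as (x1, y1, x2, y2). The first is the degenerate box from this report,
# the second is an invented regular box used only for comparison.
boxes = torch.tensor([
    [368.0, 413.0, 368.0, 417.0],
    [10.0, 20.0, 50.0, 80.0],
])

# Keep only boxes with strictly positive width and height.
keep = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
filtered = boxes[keep]

print(filtered)
```

Any per-box annotations (labels, masks) would need to be indexed with the same keep mask so they stay aligned with the remaining boxes.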