ViTDet object detection #7630

hgaiser · 2023-05-25T12:27:12Z

🚀 The feature

ViTDet achieves very interesting results on COCO and, given that ViT is already implemented, it seems relatively straightforward to implement this in torchvision.

Motivation, pitch

The best-performing object detection network in torchvision is currently FasterRCNN with a resnet50 backbone (46.7 mAP). ViTDet reports an mAP of 51.6 with a ViT-B backbone, 55.6 with ViT-L, and an impressive 56.7 with ViT-H. Similarly impressive results have been obtained with the instance segmentation implementation.

Alternatives

Detectron2 implements ViTDet. It could be decided that torchvision will not provide its own implementation and will instead redirect users who want to use ViTDet to Detectron2.

Additional context

Implementing ViTDet opens the door to other implementations, such as EVA-02, which achieves even better results than ViTDet.

I have previously implemented RetinaNet for torchvision (later merged in #2784). I might be interested in implementing ViTDet, but I would first like to see if there is interest from the maintainers.

oke-aditya · 2023-06-04T09:08:04Z

This is actually cool, given that SwinTransformer and ViT are already offered as stable models. Both of these can be used as backbones for ViTDet.
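
For illustration only, here is a minimal sketch of how a plain ViT's single-scale output could be turned into the multi-scale feature maps a detection head expects, loosely following the simple feature pyramid idea described for ViTDet. It is not torchvision or Detectron2 code: the names `SimpleFeaturePyramid` and `tokens_to_map` are hypothetical, the channel sizes are arbitrary, normalization layers are omitted, and the input is assumed to be patch tokens with the class token already removed.

```python
import torch
from torch import nn


def tokens_to_map(tokens: torch.Tensor, grid_size: int) -> torch.Tensor:
    """Reshape (batch, num_patches, channels) patch tokens to (batch, channels, H, W).

    Hypothetical helper: assumes a square patch grid and no class token.
    """
    batch, num_patches, channels = tokens.shape
    assert num_patches == grid_size * grid_size
    return tokens.transpose(1, 2).reshape(batch, channels, grid_size, grid_size)


class SimpleFeaturePyramid(nn.Module):
    """Hypothetical sketch: derive stride 4/8/16/32 maps from one stride-16 map."""

    def __init__(self, in_channels: int, out_channels: int = 256) -> None:
        super().__init__()
        # Every branch reads the same stride-16 input and only rescales it.
        self.branches = nn.ModuleList([
            nn.Sequential(  # stride 4: upsample by 4 with two transposed convs
                nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2),
                nn.GELU(),
                nn.ConvTranspose2d(in_channels // 2, in_channels // 4, kernel_size=2, stride=2),
            ),
            # stride 8: upsample by 2
            nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2),
            nn.Identity(),  # stride 16: unchanged
            nn.MaxPool2d(kernel_size=2, stride=2),  # stride 32: downsample by 2
        ])
        branch_channels = [in_channels // 4, in_channels // 2, in_channels, in_channels]
        # Project every scale to a common channel count for the detection head.
        self.projections = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, out_channels, kernel_size=1),
                nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            )
            for channels in branch_channels
        ])

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        return [proj(branch(x)) for branch, proj in zip(self.branches, self.projections)]


# Dummy tokens standing in for a ViT output: a 14x14 patch grid with 64 channels.
tokens = torch.randn(1, 14 * 14, 64)
feature_map = tokens_to_map(tokens, grid_size=14)
pyramid = SimpleFeaturePyramid(in_channels=64, out_channels=32)
shapes = [tuple(level.shape) for level in pyramid(feature_map)]
```

With a 14x14 input grid, the four levels come out at 56x56, 28x28, 14x14 and 7x7. A real integration would also need to handle the class token, non-square inputs and position embeddings at detection resolutions, none of which this sketch covers.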

A major challenge would be actually reproducing the metrics with the torchvision reference scripts. Porting weights is not the best idea (we have done that for SwinTransformer, I guess, but it's better if we can train).

Bandwidth might be something @NicolasHug can answer 😄

hgaiser · 2023-06-13T14:53:38Z

@fmassa is there any interest in this?

hgaiser mentioned this issue Jun 21, 2023

ViTDet object detection + segmentation implementation #7690

Open

7 tasks
