
ViTDet object detection #7630


Open
hgaiser opened this issue May 25, 2023 · 2 comments

Comments

@hgaiser
Contributor

hgaiser commented May 25, 2023

🚀 The feature

ViTDet achieves strong results on COCO. Since ViT is already implemented in torchvision, adding ViTDet seems relatively straightforward.

Motivation, pitch

The best-performing object detection model in torchvision is currently FasterRCNN with a ResNet-50 backbone (46.7 box mAP on COCO). ViTDet reports 51.6 mAP with a ViT-B backbone, 55.6 mAP with ViT-L, and an impressive 56.7 mAP with ViT-H. Similarly strong results have been reported for instance segmentation.

Alternatives

Detectron2 already implements ViTDet. Torchvision could decide not to provide its own implementation and instead redirect users who want ViTDet to Detectron2.

Additional context

Implementing ViTDet also opens the door to related models such as EVA-02, which achieves even better results than ViTDet.

I previously implemented RetinaNet for torchvision (later merged in #2784). I might be interested in implementing ViTDet, but I would first like to know whether the maintainers are interested.

@oke-aditya
Contributor

This is actually cool. SwinTransformer and ViT are both offered as stable models, and both can be used as backbones for ViTDet.

A major challenge would be reproducing the reported metrics with the torchvision reference scripts. Porting weights is not the best idea (I think we did that for SwinTransformer, but it's better if we can train the models ourselves).

Bandwidth might be something @NicolasHug can answer 😄

@hgaiser
Contributor Author

hgaiser commented Jun 13, 2023

@fmassa is there any interest for this?
