Commit 83879a2

[feat] fix readme and add metafile
1 parent 85abecd commit 83879a2

2 files changed: 127 additions and 12 deletions.

configs/recognition/mvit/README.md

Lines changed: 12 additions & 12 deletions
@@ -27,21 +27,21 @@ well as 86.1% on Kinetics-400 video classification.

 ### Kinetics-400

-| frame sampling strategy | resolution | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top1 acc | testing protocol | params | config | ckpt |
-| :---------------------: | :------------: | :--------: | :----------: | :------: | :------: | :------------------------------: | :------------------------------: | :--------------: | :----: | :------------------: | :-----------------: |
-| 16x4x1 | short-side 320 | MViTv2-S\* | From scratch | 81.1 | 94.7 | [81.0](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [94.6](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 5 clips x 1 crop | xx.xM | [config](/configs/recognition/mvit/mvit-small-p244_16x4x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/) |
-| 32x3x1 | short-side 320 | MViTv2-B\* | From scratch | 82.6 | 95.8 | [82.9](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [95.7](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 5 clips x 1 crop | xx.xM | [config](/configs/recognition/mvit/mvit-base-p244_32x3x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/) |
-| 40x3x1 | short-side 320 | MViTv2-L\* | From scratch | 85.4 | 96.2 | [86.1](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [97.0](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 5 clips x 3 crop | xx.xM | [config](/configs/recognition/mvit/mvit-large-p244_40x3x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/) |
+| frame sampling strategy | resolution | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top1 acc | testing protocol | FLOPs | params | config | ckpt |
+| :---------------------: | :------------: | :--------: | :----------: | :------: | :------: | :-----------------------------: | :-----------------------------: | :--------------: | :---: | :----: | :-----------------: | :---------------: |
+| 16x4x1 | short-side 320 | MViTv2-S\* | From scratch | 81.1 | 94.7 | [81.0](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [94.6](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 5 clips x 1 crop | 64G | 34.5M | [config](/configs/recognition/mvit/mvit-small-p244_16x4x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-small-p244_16x4x1_kinetics400-rgb_20221021-9ebaaeed.pth) |
+| 32x3x1 | short-side 320 | MViTv2-B\* | From scratch | 82.6 | 95.8 | [82.9](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [95.7](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 5 clips x 1 crop | 225G | 51.2M | [config](/configs/recognition/mvit/mvit-base-p244_32x3x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-base-p244_32x3x1_kinetics400-rgb_20221021-f392cd2d.pth) |
+| 40x3x1 | short-side 320 | MViTv2-L\* | From scratch | 85.4 | 96.2 | [86.1](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [97.0](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 5 clips x 3 crop | 2828G | 213M | [config](/configs/recognition/mvit/mvit-large-p244_40x3x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-large-p244_40x3x1_kinetics400-rgb_20221021-11fe1f97.pth) |

 ### Something-Something V2

-| frame sampling strategy | resolution | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top1 acc | testing protocol | params | config | ckpt |
-| :---------------------: | :------------: | :--------: | :----------: | :------: | :------: | :------------------------------: | :------------------------------: | :--------------: | :----: | :------------------: | :-----------------: |
-| uniform 16 | short-side 320 | MViTv2-S\* | K400 | 68.1 | 91.0 | [68.2](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [91.4](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crop | xx.xM | [config](/configs/recognition/mvit/mvit-small-p244_16x4x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/) |
-| uniform 32 | short-side 320 | MViTv2-B\* | K400 | 70.8 | 92.7 | [70.5](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [92.7](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crop | xx.xM | [config](/configs/recognition/mvit/mvit-base-p244_32x3x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/) |
-| uniform 40 | short-side 320 | MViTv2-L\* | IN21K + K400 | 73.2 | 94.0 | [73.3](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [94.0](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crop | xx.xM | [config](/configs/recognition/mvit/mvit-large-p244_40x3x1_kinetics400-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/) |
+| frame sampling strategy | resolution | backbone | pretrain | top1 acc | top5 acc | reference top1 acc | reference top1 acc | testing protocol | FLOPs | params | config | ckpt |
+| :---------------------: | :------------: | :--------: | :----------: | :------: | :------: | :----------------------------: | :-----------------------------: | :---------------: | :---: | :----: | :-----------------: | :---------------: |
+| uniform 16 | short-side 320 | MViTv2-S\* | K400 | 68.1 | 91.0 | [68.2](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [91.4](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crops | 64G | 34.4M | [config](/configs/recognition/mvit/mvit-small-p244_u16_sthv2-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-small-p244_u16_sthv2-rgb_20221021-65ecae7d.pth) |
+| uniform 32 | short-side 320 | MViTv2-B\* | K400 | 70.8 | 92.7 | [70.5](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [92.7](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crops | 225G | 51.1M | [config](/configs/recognition/mvit/mvit-base-p244_u32_sthv2-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-base-p244_u32_sthv2-rgb_20221021-d5de5da6.pth) |
+| uniform 40 | short-side 320 | MViTv2-L\* | IN21K + K400 | 73.2 | 94.0 | [73.3](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | [94.0](https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md) | 1 clips x 3 crops | 2828G | 213M | [config](/configs/recognition/mvit/mvit-large-p244_u40_sthv2-rgb.py) | [ckpt](https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-large-p244_u40_sthv2-rgb_20221021-61696e07.pth) |

-*Models with * are ported from the repo [SlowFast](https://github.com/facebookresearch/SlowFast/) and tested on our data. Currently, we only support the testing of X3D models, training will be available soon.*
+*Models with * are ported from the repo [SlowFast](https://github.com/facebookresearch/SlowFast/) and tested on our data. Currently, we only support the testing of MViT models, training will be available soon.*

 1. The values in columns named after "reference" are copied from paper
 2. The validation set of Kinetics400 we used consists of 19796 videos. These videos are available at [Kinetics400-Validation](https://mycuhk-my.sharepoint.com/:u:/g/personal/1155136485_link_cuhk_edu_hk/EbXw2WX94J1Hunyt3MWNDJUBz-nHvQYhO9pvKqm6g39PMA?e=a9QldB). The corresponding [data list](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_val_list.txt) (each line is of the format 'video_id, num_frames, label_index') and the [label map](https://download.openmmlab.com/mmaction/dataset/k400_val/kinetics_class2ind.txt) are also available.
@@ -59,7 +59,7 @@ python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [optional arguments]
 Example: test MViT model on Kinetics-400 dataset and dump the result to a pkl file.

 ```shell
-python tools/test.py configs/recognition/mvit/mvit-small_16x4x1_kinetics400-rgb.py \
+python tools/test.py configs/recognition/mvit/mvit-small-p244_16x4x1_kinetics400-rgb.py \
     checkpoints/SOME_CHECKPOINT.pth --dump result.pkl
 ```
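In the updated tables, each checkpoint file name begins with the name of the config it belongs to, followed by a date and an 8-character suffix. The sketch below checks that pairing for a table row. It assumes the pattern `<config name>_<YYYYMMDD>-<8 hex chars>.pth`, which is inferred only from the six links in this diff and not from any documented rule; the helper names are hypothetical.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed naming pattern, inferred from the checkpoint links in this diff:
# <config name>_<8-digit date>-<8 hex characters>.pth
CKPT_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<date>\d{8})-(?P<tag>[0-9a-f]{8})\.pth$"
)


def config_name(config_path):
    """Return the config file name without its directory or '.py' suffix."""
    return PurePosixPath(config_path).stem


def checkpoint_name(ckpt_url):
    """Return the config name embedded in a checkpoint URL, or None."""
    filename = PurePosixPath(urlparse(ckpt_url).path).name
    match = CKPT_PATTERN.match(filename)
    return match.group("name") if match else None


def row_is_consistent(config_path, ckpt_url):
    """True when the checkpoint file name starts with the config name."""
    return checkpoint_name(ckpt_url) == config_name(config_path)


# (config, ckpt) of the "uniform 16" Something-Something V2 row after this commit.
NEW_ROW = (
    "/configs/recognition/mvit/mvit-small-p244_u16_sthv2-rgb.py",
    "https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/"
    "mvit-small-p244_u16_sthv2-rgb_20221021-65ecae7d.pth",
)
# The same row before this commit: a Kinetics-400 config and a directory link.
OLD_ROW = (
    "/configs/recognition/mvit/mvit-small-p244_16x4x1_kinetics400-rgb.py",
    "https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/",
)

print(row_is_consistent(*NEW_ROW))
print(row_is_consistent(*OLD_ROW))
```

The old row fails because its link points at a directory rather than a `.pth` file, so no config name can be extracted from it.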

configs/recognition/mvit/metafile.yml

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+Collections:
+  - Name: MViT
+    README: configs/recognition/MViT/README.md
+    Paper:
+      URL: http://openaccess.thecvf.com//content/CVPR2022/papers/Li_MViTv2_Improved_Multiscale_Vision_Transformers_for_Classification_and_Detection_CVPR_2022_paper.pdf
+      Title: "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection"
+
+Models:
+  - Name: mvit-small-p244_16x4x1_kinetics400-rgb
+    Config: configs/recognition/mvit/mvit-small-p244_16x4x1_kinetics400-rgb.py
+    In Collection: MViT
+    Metadata:
+      Architecture: MViT-small
+      Resolution: short-side 320
+      Modality: RGB
+      Converted From:
+        Weights: https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md
+        Code: https://github.com/facebookresearch/SlowFast/
+    Results:
+      - Dataset: Kinetics-400
+        Task: Action Recognition
+        Metrics:
+          Top 1 Accuracy: 81.1
+          Top 5 Accuracy: 94.7
+    Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-small-p244_16x4x1_kinetics400-rgb_20221021-9ebaaeed.pth
+
+  - Name: mvit-base-p244_32x3x1_kinetics400-rgb
+    Config: configs/recognition/mvit/mvit-base-p244_32x3x1_kinetics400-rgb.py
+    In Collection: MViT
+    Metadata:
+      Architecture: MViT-base
+      Resolution: short-side 320
+      Modality: RGB
+      Converted From:
+        Weights: https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md
+        Code: https://github.com/facebookresearch/SlowFast/
+    Results:
+      - Dataset: Kinetics-400
+        Task: Action Recognition
+        Metrics:
+          Top 1 Accuracy: 81.1
+          Top 5 Accuracy: 94.7
+    Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-base-p244_32x3x1_kinetics400-rgb_20221021-f392cd2d.pth
+
+  - Name: mvit-large-p244_40x3x1_kinetics400-rgb
+    Config: configs/recognition/mvit/mvit-large-p244_40x3x1_kinetics400-rgb.py
+    In Collection: MViT
+    Metadata:
+      Architecture: MViT-large
+      Resolution: short-side 446
+      Modality: RGB
+      Converted From:
+        Weights: https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md
+        Code: https://github.com/facebookresearch/SlowFast/
+    Results:
+      - Dataset: Kinetics-400
+        Task: Action Recognition
+        Metrics:
+          Top 1 Accuracy: 81.1
+          Top 5 Accuracy: 94.7
+    Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-large-p244_40x3x1_kinetics400-rgb_20221021-11fe1f97.pth
+
+  - Name: mvit-small-p244_u16_sthv2-rgb
+    Config: configs/recognition/mvit/mvit-small-p244_u16_sthv2-rgb.py
+    In Collection: MViT
+    Metadata:
+      Architecture: MViT-small
+      Resolution: short-side 320
+      Modality: RGB
+      Converted From:
+        Weights: https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md
+        Code: https://github.com/facebookresearch/SlowFast/
+    Results:
+      - Dataset: SthV2
+        Task: Action Recognition
+        Metrics:
+          Top 1 Accuracy: 68.1
+          Top 5 Accuracy: 91.0
+    Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-small-p244_u16_sthv2-rgb_20221021-65ecae7d.pth
+
+  - Name: mvit-base-p244_u32_sthv2-rgb
+    Config: configs/recognition/mvit/mvit-base-p244_u32_sthv2-rgb.py
+    In Collection: MViT
+    Metadata:
+      Architecture: MViT-small
+      Resolution: short-side 320
+      Modality: RGB
+      Converted From:
+        Weights: https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md
+        Code: https://github.com/facebookresearch/SlowFast/
+    Results:
+      - Dataset: SthV2
+        Task: Action Recognition
+        Metrics:
+          Top 1 Accuracy: 70.8
+          Top 5 Accuracy: 92.7
+    Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-base-p244_u32_sthv2-rgb_20221021-d5de5da6.pth
+
+  - Name: mvit-large-p244_u40_sthv2-rgb
+    Config: configs/recognition/mvit/mvit-large-p244_u40_sthv2-rgb.py
+    In Collection: MViT
+    Metadata:
+      Architecture: MViT-small
+      Resolution: short-side 446
+      Modality: RGB
+      Converted From:
+        Weights: https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md
+        Code: https://github.com/facebookresearch/SlowFast/
+    Results:
+      - Dataset: SthV2
+        Task: Action Recognition
+        Metrics:
+          Top 1 Accuracy: 73.2
+          Top 5 Accuracy: 94.0
+    Weights: https://download.openmmlab.com/mmaction/v1.0/recognition/mvit/converted/mvit-large-p244_u40_sthv2-rgb_20221021-61696e07.pth
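The metafile repeats accuracy figures that also appear in the README tables, so the two can drift apart. The sketch below compares them using only the standard library. It assumes the flat `key: value` lines shown in this metafile and the column order of the README tables (top1 and top5 in the fifth and sixth columns); the function names are hypothetical, and the sample rows are abridged from this commit with the reference and ckpt links dropped.

```python
import re
from pathlib import PurePosixPath


def readme_metrics(table_rows):
    """Map config name -> (top1, top5) for each markdown table row."""
    metrics = {}
    for row in table_rows:
        cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
        link = re.search(r"\[config\]\(([^)]+)\)", row)
        if link is None:
            continue  # header or separator row
        name = PurePosixPath(link.group(1)).stem
        metrics[name] = (float(cells[4]), float(cells[5]))
    return metrics


def metafile_metrics(text):
    """Map model name -> {'Top 1 Accuracy': x, 'Top 5 Accuracy': y}."""
    metrics = {}
    current = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        if key == "- Name":
            current = value.strip()
        elif key in ("Top 1 Accuracy", "Top 5 Accuracy") and current is not None:
            metrics.setdefault(current, {})[key] = float(value)
    return metrics


def mismatches(readme, metafile):
    """Return config names whose README and metafile metrics differ."""
    names = []
    for name, (top1, top5) in sorted(readme.items()):
        entry = metafile.get(name, {})
        if (entry.get("Top 1 Accuracy"), entry.get("Top 5 Accuracy")) != (top1, top5):
            names.append(name)
    return names


# Abridged rows from the Kinetics-400 table in this commit.
README_ROWS = [
    r"| 16x4x1 | short-side 320 | MViTv2-S\* | From scratch | 81.1 | 94.7 | 81.0 | 94.6 | 5 clips x 1 crop | 64G | 34.5M | [config](/configs/recognition/mvit/mvit-small-p244_16x4x1_kinetics400-rgb.py) |",
    r"| 32x3x1 | short-side 320 | MViTv2-B\* | From scratch | 82.6 | 95.8 | 82.9 | 95.7 | 5 clips x 1 crop | 225G | 51.2M | [config](/configs/recognition/mvit/mvit-base-p244_32x3x1_kinetics400-rgb.py) |",
]

# Abridged entries from the metafile in this commit.
METAFILE_TEXT = """
Collections:
  - Name: MViT
Models:
  - Name: mvit-small-p244_16x4x1_kinetics400-rgb
    Results:
      - Dataset: Kinetics-400
        Metrics:
          Top 1 Accuracy: 81.1
          Top 5 Accuracy: 94.7
  - Name: mvit-base-p244_32x3x1_kinetics400-rgb
    Results:
      - Dataset: Kinetics-400
        Metrics:
          Top 1 Accuracy: 81.1
          Top 5 Accuracy: 94.7
"""

print(mismatches(readme_metrics(README_ROWS), metafile_metrics(METAFILE_TEXT)))
```

Run on the values in this commit, the check reports the base Kinetics-400 entry, whose metafile metrics (81.1 / 94.7) differ from its README row (82.6 / 95.8). The large Kinetics-400 entry differs in the same way (81.1 / 94.7 against 85.4 / 96.2).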
