Pre-training for Video Understanding Challenge




Track 1 Leaderboard

#  Team Name      BLEU@4  METEOR  CIDEr-D  SPICE
1  CASIA_IVA      26.13   20.86   35.09    7.85
2  Gene           23.67   19.63   31.19    7.52
3  aimc_21        20.66   20.13   30.18    7.40
4  Nameless       22.80   18.87   27.95    6.40
5  Micro Genius   20.93   17.34   24.42    5.60
6  MSVLPT         21.26   17.10   23.35    5.50
7  tsinghua_hhh   7.98    13.90   17.28    5.16

Track 2 Leaderboard

#  Team Name       Top-1 accuracy
1  Silver_Bullet   62.28
2  MSVLPT          56.77
3  sunny_flower    54.33
4  ethan           53.66
5  ghost_rider     50.83

Metrics

For the downstream task of video captioning, we will evaluate submissions on the testing set of the MSR-VTT dataset and publish the automatic metric results, including BLEU@4, METEOR, CIDEr and SPICE, on a leaderboard.
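
To make the first of these metrics concrete, the sketch below computes a corpus-level BLEU@4: the geometric mean of clipped 1-gram to 4-gram precisions, multiplied by a brevity penalty. It is not the challenge's scoring code. It assumes captions are already tokenized into word lists, uses uniform n-gram weights and applies no smoothing; the function names `ngrams` and `corpus_bleu4` are illustrative. It returns a value in [0, 1], whereas the leaderboard figures appear to be on a 0 to 100 scale.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(candidates, references):
    """Corpus-level BLEU@4 in [0, 1], uniform weights, no smoothing.

    candidates: list of token lists, one generated caption per video.
    references: list of lists of token lists (reference captions per video).
    """
    matched = [0] * 4  # clipped n-gram matches for n = 1..4
    total = [0] * 4    # candidate n-gram counts for n = 1..4
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Closest reference length; ties go to the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, 5):
            cand_counts = ngrams(cand, n)
            # Per n-gram, the maximum count seen in any single reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matched[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            total[n - 1] += sum(cand_counts.values())
    # Without smoothing, any zero precision makes the whole score zero.
    if cand_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    # Hypothetical captions, for illustration only.
    cands = [["a", "man", "is", "playing", "a", "guitar"]]
    refs = [[["a", "man", "is", "playing", "a", "guitar"],
             ["someone", "plays", "the", "guitar"]]]
    print(round(100 * corpus_bleu4(cands, refs), 2))
```

Because precisions are pooled over the whole test set before taking the geometric mean, a single caption with no 4-gram match does not zero the corpus score, which is why the corpus-level form is the usual choice for leaderboard reporting.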

For the downstream task of video categorization, we will report the top-1 accuracy on the testing set of the downstream dataset.
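
Top-1 accuracy is the share of test videos whose highest-scoring predicted category equals the ground-truth category. A minimal sketch follows, assuming each prediction is a list of per-class scores and each label is an integer class index; the function name `top1_accuracy` and the sample data are illustrative, not part of the challenge toolkit.

```python
def top1_accuracy(scores, labels):
    """Return top-1 accuracy as a percentage.

    scores: list of per-class score lists, one list per video.
    labels: list of ground-truth class indices, one per video.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest score; ties resolve to the lowest index.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Hypothetical scores for four videos over two categories.
    scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    labels = [1, 0, 0, 0]
    print(top1_accuracy(scores, labels))  # 3 of 4 correct -> 75.0
```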



Citations

@article{autogif2020,
  title={Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training},
  author={Yingwei Pan and Yehao Li and Jianjie Luo and Jun Xu and Ting Yao and Tao Mei},
  journal={arXiv preprint arXiv:2007.02375},
  year={2020}
}

@inproceedings{msrvtt,
  title={MSR-VTT: A Large Video Description Dataset for Bridging Video and Language},
  author={Jun Xu and Tao Mei and Ting Yao and Yong Rui},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2016}
}