An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022
Research output: Working Papers › Preprint
Detail(s)
Original language | English |
---|---|
Number of pages | 4 |
Publication status | Online published - 16 Nov 2022 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(ebf0c789-0e59-4d28-b7df-6227002dc1d1).html |
---|---|
Abstract
This technical report describes the CONE approach to the Ego4D Natural Language Queries (NLQ) Challenge at ECCV 2022. We leverage our model CONE, an efficient window-centric COarse-to-fiNE alignment framework. Specifically, CONE dynamically slices the long video into candidate windows via a sliding-window approach. Centering on windows, CONE (1) learns the inter-window (coarse-grained) semantic variance through contrastive learning and speeds up inference by pre-filtering the candidate windows relevant to the NL query, and (2) conducts intra-window (fine-grained) candidate moment ranking, utilizing the powerful multi-modal alignment ability of the contrastive vision-text pre-trained model EgoVLP. On the blind test set, CONE achieves 15.26 and 9.24 for R1@IoU=0.3 and R1@IoU=0.5, respectively.
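The window slicing, coarse pre-filtering, and R1@IoU metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window length and stride parameters, and the idea of passing in precomputed coarse scores are all assumptions made for illustration. In CONE itself, the scores would come from the learned contrastive model and EgoVLP rather than from a plain list of numbers.

```python
from typing import List, Sequence, Tuple

# A temporal span in seconds: (start, end). Hypothetical type alias.
Span = Tuple[float, float]


def slice_windows(duration: float, window_len: float, stride: float) -> List[Span]:
    """Slice a video of `duration` seconds into overlapping candidate windows.

    `window_len` and `stride` are illustrative parameters, not values
    taken from the report.
    """
    windows: List[Span] = []
    start = 0.0
    while start < duration:
        windows.append((start, min(start + window_len, duration)))
        if start + window_len >= duration:
            break  # the last window already reaches the end of the video
        start += stride
    return windows


def prefilter(windows: Sequence[Span], scores: Sequence[float], k: int) -> List[Span]:
    """Coarse stage: keep the k windows with the highest query-window score.

    `scores` stands in for whatever coarse relevance model is used.
    """
    ranked = sorted(zip(windows, scores), key=lambda pair: pair[1], reverse=True)
    return [window for window, _ in ranked[:k]]


def temporal_iou(a: Span, b: Span) -> float:
    """Intersection over union of two temporal spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1: Sequence[Span], truth: Sequence[Span], iou_threshold: float) -> float:
    """Percentage of queries whose top-1 prediction meets the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= iou_threshold for p, g in zip(top1, truth))
    return 100.0 * hits / len(truth)
```

Under these assumptions, a fine-grained ranker would only need to score moments inside the windows returned by `prefilter`, which is where the inference speed-up described in the abstract would come from.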
Research Area(s)
- cs.CV, cs.IR
Citation Format(s)
An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022. / Hou, Zhijian; Zhong, Wanjun; Ji, Lei et al.
2022.