An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022

Research output: Working Papers › Preprint

Author(s)

  • Wanjun Zhong
  • Lei Ji
  • Difei Gao
  • Kun Yan
  • Zheng Shou
  • Nan Duan

Detail(s)

Original language: English
Number of pages: 4
Publication status: Online published - 16 Nov 2022

Abstract

This technical report describes our CONE approach for the Ego4D Natural Language Queries (NLQ) Challenge at ECCV 2022. CONE is an efficient window-centric COarse-to-fiNE alignment framework. Specifically, CONE dynamically slices the long video into candidate windows using a sliding window approach. Centering on these windows, CONE (1) learns the inter-window (coarse-grained) semantic variance through contrastive learning and speeds up inference by pre-filtering the candidate windows relevant to the NL query, and (2) ranks candidate moments within each window (fine-grained) using the strong multi-modal alignment ability of the contrastive vision-text pre-trained model EgoVLP. On the blind test set, CONE achieves 15.26 and 9.24 for R1@IoU=0.3 and R1@IoU=0.5, respectively.
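
The coarse stage and the reported metric can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (slice_windows, prefilter_windows, temporal_iou, recall_at_1), the window and stride parameters, and the use of plain cosine similarity in place of CONE's learned contrastive scoring are all assumptions made for illustration. The fine-grained EgoVLP ranking stage is not modelled here. R1@IoU=m is taken to mean the percentage of queries whose top-1 predicted moment overlaps the ground truth with temporal IoU of at least m.

```python
import math


def slice_windows(num_frames, window, stride):
    """Slide a fixed-size window over the video; return (start, end) pairs, end exclusive."""
    if num_frames <= window:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    # Add a final window so the tail of the video is always covered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prefilter_windows(query_vec, window_vecs, top_k):
    """Coarse stage (assumed stand-in): keep the indices of the top_k windows
    whose embeddings are most similar to the query embedding."""
    scores = [cosine(query_vec, v) for v in window_vecs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, threshold):
    """Percentage of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1_preds, gts))
    return 100.0 * hits / len(gts)
```

Pre-filtering means the more expensive fine-grained ranking only needs to run on top_k windows rather than on every window of a long video, which is where the inference speed-up described in the abstract would come from.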

Research Area(s)

  • cs.CV, cs.IR