Beyond Concept Annotation: Multimedia Event Detection and Recounting

Project: Research


Description

With the massive growth of Internet videos, intensive research efforts have been devoted to concept annotation, particularly the learning of audio-visual classifiers for annotating video archives with textual words (or semantic concepts). With current technologies, indexing a video archive with hundreds of elementary concepts for text-to-video search is feasible in a narrow domain (e.g., broadcast videos) and under certain restrictions (e.g., only handling queries whose text words fall within the concept vocabulary), as evidenced by the annual TRECVID benchmark evaluation.

Despite this progress, querying videos with text words beyond concept atoms, for example an event of making a cake, in a large Internet video archive remains a difficult problem. The challenge comes from the fact that such an event is generic, a general description that can refer to a wide variety of cakes and ways of making them, and complex, involving activities such as flouring, baking, and freezing. Training an event-specific classifier for “making a cake” is difficult unless there are abundant training examples that adequately cover all cases; more importantly, the approach does not scale, given that Internet videos can contain virtually any scene or event. Since most Internet videos are uploaded together with amateur tags, one might expect these tags to ease event-oriented querying. However, amateur tags tend to be error-prone and unspecific. Tag-based search for generic events is known to be limited: the burden falls on users to sift through a long list of returned videos to find the right hits.

This project aims to address two challenges: event detection, which identifies videos and localizes the segments containing events previously unknown to a search system; and event recounting, which narrates the audio-visual evidence of how a video relates to an event by generating short textual descriptions with illustrative thumbnails. The former addresses the modeling and reasoning of event knowledge from a large number of noisily tagged concepts, while the latter explicates the reasoning process in textual sentences. The major goal is to research techniques that enable the search of complex, generic events beyond what current concept-classifier learning can handle, and to recount the reasoning process for fast video browsing beyond what current video summarization techniques can offer. Both issues are of great value to Internet video search and content monitoring.
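To make the two challenges concrete, the following is a minimal Python sketch of one common concept-based approach, not necessarily the project's actual method: an event query such as “making a cake” is decomposed into a hypothetical weighted profile of concept atoms, video segments are scored by aggregating pre-computed concept-classifier confidences, and a recounting is produced by naming the concepts that contribute most to each evidential segment. The event profile, function names, and random scores are all illustrative assumptions.

import numpy as np

# Hypothetical event-to-concept mapping: an unseen event query such as
# "making a cake" is decomposed into related concept atoms with weights.
# These concepts and weights are invented for illustration only.
EVENT_PROFILE = {"flour": 0.9, "oven": 0.8, "mixing_bowl": 0.7, "kitchen": 0.4}

def detect_event(segment_scores, concept_names, profile, top_k=3):
    """Score each video segment against the event profile.

    segment_scores: (num_segments, num_concepts) array of classifier
                    confidences in [0, 1], assumed to come from
                    pre-trained audio-visual concept classifiers.
    Returns the video-level event score and the best-matching segments.
    """
    weights = np.array([profile.get(c, 0.0) for c in concept_names])
    # Weighted sum of concept confidences per segment.
    per_segment = segment_scores @ weights
    # Video-level score: mean of the top-k segments, so one strongly
    # localized match is not drowned out by irrelevant footage.
    top = np.argsort(per_segment)[::-1][:top_k]
    return per_segment[top].mean(), top

def recount(segment_scores, concept_names, profile, segment_ids):
    """Generate a short textual recounting: for each evidential
    segment, name the concepts that contributed most to its score."""
    lines = []
    for s in segment_ids:
        contrib = {c: segment_scores[s, i] * profile.get(c, 0.0)
                   for i, c in enumerate(concept_names)}
        evidence = sorted(contrib, key=contrib.get, reverse=True)[:2]
        lines.append(f"Segment {s}: evidence of {', '.join(evidence)}")
    return lines

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    concepts = list(EVENT_PROFILE) + ["beach", "car"]
    scores = rng.random((10, len(concepts)))  # fake classifier output
    video_score, segments = detect_event(scores, concepts, EVENT_PROFILE)
    print(f"Event score: {video_score:.2f}")
    for line in recount(scores, concepts, EVENT_PROFILE, segments):
        print(line)

In practice the per-segment concept scores would be noisy, which is precisely why the project frames detection as reasoning over a large number of noisily tagged concepts rather than simple score summation.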

Detail(s)

Project number: 9041906
Grant type: GRF
Status: Finished
Effective start/end date: 1/01/14 – 5/06/18