Project Details
Description
In real life, some search queries are hard to express in text but easy to formulate with an image or video as an example. Providing multimedia queries, however, is not always convenient on desktop machines, and textual input remains the dominant mode of information seeking. The proliferation of sensor-rich mobile and hand-held devices is changing this trend: mobile users can now find information on the Web more quickly and naturally. With a mobile device providing multimodal input for search, it is expected that in the near future a user could search by submitting images or videos anywhere and anytime, in addition to typing or speaking a short question or a few keywords as a query. For example, a query could combine a natural-language question, "How do I make a toy like this for my kid?", with an image or video of the toy snapped with a mobile device. Developing a search engine that answers such multimodal queries nevertheless remains difficult with current technology, owing to the difficulty of semantically understanding multimodal queries and of searching for potential answers in large amounts of unstructured and semi-structured multimedia data.

This project will address two key problems in multimedia answering on mobile devices. The first, referred to as instance search, is the challenge of textually describing visual instances through large-scale search of multimedia data. Searching for visual instances poses serious technical difficulties, as the instances can be snapped from arbitrary viewpoints and the target items may or may not appear in the same context as the instances. These difficulties render existing near-duplicate image/video retrieval techniques, which have been adopted by some mobile search applications, inadequate for this new task. The second problem, referred to as question answering (QA), is the answering of short text questions together with visual search results. Unlike conventional text-based QA, the questions can be answered by images, videos, text passages, or any combination of them. This problem is new, and the associated issues, including the understanding of multimodal questions and the selection and visualization of answers based on factors such as question type and the physical constraints of mobile devices, have not been researched before.

The innovation of this project originates from the proposal of a QA-based search paradigm that addresses the challenges of leveraging multimedia information for question posting and answering in a mobile environment. The contributions will be twofold. For instance search, we seek near-real-time, distributed, and memory-efficient instance search algorithms that operate on large-scale Internet multimedia databases. The research focuses on the robust representation, matching, and search of visual instances with attention to scalability, which are timely issues for enabling mobile media retrieval beyond near-duplicate and vertical search. For multimedia QA, the focus is on the aggregation, summarization, and visualization of information that best leverage different forms of media for question answering on mobile devices. These issues are new and yet to be explored in the current research literature.
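As context for the instance-search contribution: one of the project's listed publications builds on VLAD (Vector of Locally Aggregated Descriptors), a standard technique for compacting an image's local features into a single searchable vector. Below is a minimal, generic sketch of plain VLAD aggregation, not the project's own "fast covariant" variant; the function name and the use of a precomputed codebook are illustrative assumptions.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into a single VLAD vector.

    descriptors: (n, d) local features from one image (e.g. SIFT).
    centroids:   (k, d) visual-word codebook, learned offline (e.g. by k-means).
    Returns a flattened, L2-normalized (k*d,) vector usable for nearest-neighbor search.
    """
    # Assign each descriptor to its nearest visual word.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)

    k, d = centroids.shape
    v = np.zeros((k, d))
    # Accumulate residuals (descriptor minus its centroid) per visual word.
    for i, c in enumerate(assign):
        v[c] += descriptors[i] - centroids[c]

    # Signed square-root (power-law) then global L2 normalization,
    # a common post-processing step in the VLAD literature.
    v = np.sign(v) * np.sqrt(np.abs(v))
    v = v.flatten()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Two VLAD vectors can then be compared by dot product or Euclidean distance, which is what makes the representation attractive for large-scale, memory-efficient instance search.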
| Project number | 9041783 |
|---|---|
| Grant type | GRF |
| Status | Finished |
| Effective start/end date | 1/10/12 → 20/03/17 |
Research output
2 publications (RGC 21 - Publication in refereed journal, peer-reviewed):

- Fast covariant VLAD for image search. Zhao, W.-L., Ngo, C.-W. & Wang, H., 1 Sept 2016, In: IEEE Transactions on Multimedia, 18(9), p. 1843-1854, 7499824. 8 citations (Scopus).
- Hyperlink-Aware Object Retrieval. Zhang, W., Ngo, C.-W. & Cao, X., 1 Sept 2016, In: IEEE Transactions on Image Processing, 25(9), p. 4186-4198, 7508952. 8 citations (Scopus).