Global Fusion Attention for Vision and Language Understanding (Student Abstract)
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Title of host publication | The Thirty-Fifth AAAI Conference on Artificial Intelligence. The Thirty-Third Conference on Innovative Applications of Artificial Intelligence. The Eleventh Symposium on Educational Advances in Artificial Intelligence |
Publisher | AAAI Press |
Pages | 15789-15790 |
ISBN (electronic) | 9781577358664 (18 issue set) |
Publication status | Published - 2021 |
Publication series
Name | AAAI Conference on Artificial Intelligence |
---|---|
Number | 18 |
Volume | 35 |
ISSN (Print) | 2159-5399 |
ISSN (electronic) | 2374-3468 |
Conference
Title | 35th AAAI Conference on Artificial Intelligence / 33rd Conference on Innovative Applications of Artificial Intelligence / 11th Symposium on Educational Advances in Artificial Intelligence |
---|---|
Period | 2 - 9 February 2021 |
Link(s)
Document Link | Links |
---|---|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85130022780&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(c8f3e56e-23b6-4405-9590-fa247ba3ece6).html |
Abstract
We extend the popular Transformer architecture to a multimodal model that processes both visual and textual inputs. We propose a new attention mechanism for Transformer-based architectures, aimed at joint vision and language understanding tasks. Our model fuses multi-level comprehension between images and texts in a weighted manner, which could better capture their internal relationships. Experiments on the benchmark VQA dataset CLEVR demonstrate the effectiveness of the proposed attention mechanism. We also observe improvements in the sample efficiency of reinforcement learning in experiments on the grounded language understanding tasks of the BabyAI platform.
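The abstract does not spell out how the weighted fusion is computed. As a rough illustration only, one possible reading is that text tokens attend to image features at several levels and the per-level results are combined with normalised weights. The sketch below follows that reading; the function names (`cross_attention`, `weighted_fusion`), the toy features, and the softmax-normalised level weights are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query (e.g. a text token)
    attends over all keys/values (e.g. image regions)."""
    d = len(keys[0])
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        attn = softmax(scores)
        out.append([sum(a * v[j] for a, v in zip(attn, values)) for j in range(dim)])
    return out


def weighted_fusion(level_outputs, level_logits):
    """Combine per-level attention outputs with softmax-normalised weights.
    (Assumption: the paper's actual weighting scheme may differ.)"""
    weights = softmax(level_logits)
    n_tokens = len(level_outputs[0])
    dim = len(level_outputs[0][0])
    fused = [[0.0] * dim for _ in range(n_tokens)]
    for w, level in zip(weights, level_outputs):
        for t in range(n_tokens):
            for j in range(dim):
                fused[t][j] += w * level[t][j]
    return weights, fused


# Hypothetical toy data: 2 text tokens, 2 feature levels with 3 image regions each.
text = [[1.0, 0.0], [0.0, 1.0]]
image_levels = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]],
]

level_outputs = [cross_attention(text, feats, feats) for feats in image_levels]
weights, fused = weighted_fusion(level_outputs, [0.0, 0.0])  # equal logits: equal weights
```

In a trained model the level logits would be learned parameters rather than fixed zeros; they are fixed here only so that the toy example is easy to inspect.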
Citation Format(s)
Global Fusion Attention for Vision and Language Understanding (Student Abstract). / Guo, Zixin; Liang, Chen; Wan, Ziyu et al.
The Thirty-Fifth AAAI Conference on Artificial Intelligence. The Thirty-Third Conference on Innovative Applications of Artificial Intelligence. The Eleventh Symposium on Educational Advances in Artificial Intelligence. AAAI Press, 2021. p. 15789-15790 (AAAI Conference on Artificial Intelligence; Vol. 35, No. 18).