A Computational Aesthetic Design Science Study on Online Video Based on Triple-Dimensional Multimodal Analysis
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Detail(s)
Original language | English |
---|---|
Title of host publication | HCI International 2024 – Late Breaking Papers |
Subtitle of host publication | 26th International Conference on Human-Computer Interaction, HCII 2024, Washington, DC, USA, June 29 – July 4, 2024, Proceedings |
Editors | Aaron Marcus, Elizabeth Rosenzweig, Marcelo M. Soares, Pei-Luen Patrick Rau, Abbas Moallem |
Place of Publication | Cham |
Publisher | Springer |
Pages | 68-79 |
Volume | Part VII |
ISBN (electronic) | 978-3-031-76821-7 |
ISBN (print) | 978-3-031-76820-0 |
Publication status | Online published - 17 Dec 2025 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 15380 |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Conference
Title | 26th International Conference on Human-Computer Interaction (HCII 2024) |
---|---|
Location | Washington Hilton Hotel |
Place | United States |
City | Washington |
Period | 29 June - 4 July 2024 |
Abstract
Computational video aesthetic prediction refers to using models that automatically evaluate the features of videos to produce their aesthetic scores. Current video aesthetic prediction models are designed based on bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net) model. The Long Short-Term Memory (LSTM) forms the conceptual foundation for the design framework. In the multimodal transformer layer, we employed two distinct transformers: the multimodal transformer and the feature transformer, enabling the acquisition of modality-specific patterns and representational features uniquely adapted to each modality. The fusion layer has also been redesigned to compute both pairwise interactions and overall interactions among the features. This study contributes to the video aesthetic prediction literature by considering the synergistic effects of textual, audio, and video features. This research presents a novel design framework that considers the combined effects of multimodal features. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
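As a rough illustration of the fusion idea described in the abstract (pairwise interactions plus an overall interaction among textual, audio, and video features), the sketch below concatenates element-wise products of every pair of modality vectors with the product across all three. This is a minimal sketch under stated assumptions, not the TMTVA-net implementation: the function name `fuse`, the use of element-wise products as the interaction operator, and the toy feature values are all hypothetical, since the abstract does not specify them.

```python
from itertools import combinations
from math import prod


def fuse(features):
    """Concatenate pairwise and overall element-wise interactions.

    `features` maps a modality name to an equal-length feature vector.
    Element-wise products are an illustrative stand-in for the interaction
    operator, which the abstract does not specify.
    """
    names = sorted(features)  # fixed order: audio, text, video
    dim = len(features[names[0]])
    if any(len(features[n]) != dim for n in names):
        raise ValueError("all modality vectors must share one dimension")

    fused = []
    # Pairwise interactions: one block per unordered pair of modalities.
    for a, b in combinations(names, 2):
        fused.extend(x * y for x, y in zip(features[a], features[b]))
    # Overall interaction: product across all modalities at each position.
    fused.extend(prod(vals) for vals in zip(*(features[n] for n in names)))
    return fused


# Toy, made-up feature vectors (assumption: already projected to one size).
example = {
    "text": [1.0, 2.0],
    "audio": [0.5, 0.0],
    "video": [2.0, 3.0],
}
fused = fuse(example)
print(fused)
```

With three modalities of dimension 2, the fused vector has (3 pairs + 1 overall) × 2 = 8 entries. In a trained network the interaction operator and any projection weights would be learned rather than fixed as they are here.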
Research Area(s)
- Computational Video Aesthetic, Design Science, Multimodal Analysis, Neural Network
Citation Format(s)
A Computational Aesthetic Design Science Study on Online Video Based on Triple-Dimensional Multimodal Analysis. / Kang, Zhangguang; Nah, Fiona Fui-Hoon; Siau, Keng Leng.
HCI International 2024 – Late Breaking Papers: 26th International Conference on Human-Computer Interaction, HCII 2024, Washington, DC, USA, June 29 – July 4, 2024, Proceedings. ed. / Aaron Marcus; Elizabeth Rosenzweig; Marcelo M. Soares; Pei-Luen Patrick Rau; Abbas Moallem. Vol. Part VII. Cham: Springer, 2025. p. 68-79 (Lecture Notes in Computer Science; Vol. 15380).