ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning

Weixiang Han, Chengjun Cai, Yu Guo, Jialiang Peng*

*Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Multi-modal learning leverages data from diverse perceptual media to obtain enriched representations, thereby empowering machine learning models to complete more complex tasks. However, recent research results indicate that multi-modal learning still suffers from "modality imbalance": certain modalities' contributions are suppressed by dominant ones, consequently constraining the overall performance enhancement of multi-modal learning. To tackle this issue, current approaches attempt to mitigate modality competition in various ways, but their effectiveness is still limited. To this end, we propose an Euler Representation Learning-based Modality Rebalance (ERL-MR) strategy, which reshapes the underlying competitive relationships between modalities into mutually reinforcing win-win situations while maintaining stable feature optimization directions. Specifically, ERL-MR employs Euler's formula to map original features to complex space, constructing cooperatively enhanced non-redundant features for each modality, which helps reverse the situation of modality competition. Moreover, to counteract the performance degradation resulting from optimization drift among modalities, we propose a Multi-Modal Constrained (MMC) loss based on cosine similarity of complex feature phase and cross-entropy loss of individual modalities, guiding the optimization direction of the fusion network. Extensive experiments conducted on four multi-modal multimedia datasets and two task-specific multi-modal multimedia datasets demonstrate the superiority of our ERL-MR strategy over state-of-the-art baselines, achieving modality rebalancing and further performance improvements. © 2024 ACM.
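
The abstract names two ingredients without giving formulas: an Euler's-formula mapping of features into complex space, and a loss that combines the cosine similarity of complex feature phases with per-modality cross-entropy. The following is a minimal illustrative sketch of those two ideas in Python with NumPy, not the authors' implementation. The function names (euler_map, phase_cosine_similarity, cross_entropy, mmc_like_loss), the choice of which vector supplies the magnitude and which the phase, the (1 - cosine) penalty, and the weight parameter are all assumptions introduced here for illustration.

```python
import numpy as np

def euler_map(magnitude, phase):
    """Map real features to complex space via Euler's formula:
    r * exp(i*theta) = r*cos(theta) + i*r*sin(theta).
    Which features act as r and which as theta is an assumption here."""
    return magnitude * np.exp(1j * phase)

def phase_cosine_similarity(z_a, z_b, eps=1e-12):
    """Cosine similarity between the phase vectors of two complex features."""
    theta_a, theta_b = np.angle(z_a), np.angle(z_b)
    denom = np.linalg.norm(theta_a) * np.linalg.norm(theta_b) + eps
    return float(np.dot(theta_a, theta_b) / denom)

def cross_entropy(logits, label):
    """Cross-entropy of one sample from raw logits (numerically stable)."""
    shifted = logits - np.max(logits)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return float(-log_probs[label])

def mmc_like_loss(z_a, z_b, logits_a, logits_b, label, weight=1.0):
    """Hypothetical combination: per-modality cross-entropy plus a penalty
    that grows as the two phase vectors diverge. The exact form used in
    the paper is not stated in the abstract."""
    ce = cross_entropy(logits_a, label) + cross_entropy(logits_b, label)
    return ce + weight * (1.0 - phase_cosine_similarity(z_a, z_b))

# Toy example with two-dimensional features for two modalities.
z_audio = euler_map(np.array([1.0, 2.0]), np.array([0.0, np.pi / 2]))
z_video = euler_map(np.array([0.5, 1.5]), np.array([0.0, np.pi / 2]))
loss = mmc_like_loss(z_audio, z_video, np.zeros(2), np.zeros(2), label=0)
```

In this toy example the two modalities share the same phases, so the phase penalty vanishes and the loss reduces to the two cross-entropy terms.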
Original language: English
Title of host publication: MM'24
Subtitle of host publication: Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery
Pages: 4591-4600
ISBN (Print): 979-8-4007-0686-8
DOIs
Publication status: Published - 2024
Event: 32nd ACM International Conference on Multimedia (MM 2024) - Melbourne, Australia
Duration: 28 Oct 2024 to 1 Nov 2024
https://2024.acmmm.org/

Publication series

Name: MM - Proceedings of the ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia (MM 2024)
Abbreviated title: ACM MM’24
Place: Australia
City: Melbourne
Period: 28/10/24 to 1/11/24
Internet address

Research Keywords

  • euler formula
  • modality imbalance
  • multi-modal constrained loss
  • multi-modal learning
