Abstract
Multi-modal learning leverages data from diverse perceptual media to obtain enriched representations, thereby empowering machine learning models to complete more complex tasks. However, recent research indicates that multi-modal learning still suffers from "modality imbalance": the contributions of certain modalities are suppressed by dominant ones, which constrains the overall performance gains of multi-modal learning. Current approaches attempt to mitigate modality competition in various ways, but their effectiveness remains limited. To address this issue, we propose an Euler Representation Learning-based Modality Rebalance (ERL-MR) strategy, which reshapes the underlying competitive relationships between modalities into mutually reinforcing ones while maintaining stable feature optimization directions. Specifically, ERL-MR employs Euler's formula to map original features to complex space, constructing cooperatively enhanced, non-redundant features for each modality, which helps reverse modality competition. Moreover, to counteract the performance degradation caused by optimization drift among modalities, we propose a Multi-Modal Constrained (MMC) loss based on the cosine similarity of complex feature phases and the cross-entropy loss of individual modalities, which guides the optimization direction of the fusion network. Extensive experiments on four multi-modal multimedia datasets and two task-specific multi-modal multimedia datasets demonstrate the superiority of our ERL-MR strategy over state-of-the-art baselines, achieving modality rebalancing and further performance improvements. © 2024 ACM.
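The abstract names two ingredients without giving their exact form: an Euler-formula mapping of real features into complex space, and a loss that combines the cosine similarity of complex feature phases with per-modality cross-entropy. The following is a minimal sketch of how such pieces could fit together, not the authors' implementation. The function names, the scaling `alpha * pi`, the weight `lam`, and the way the two terms are summed are all assumptions made for illustration.

```python
import numpy as np

def euler_map(x, alpha=1.0):
    """Map real features onto the complex unit circle via Euler's formula.

    z = exp(i * alpha * pi * x) = cos(alpha*pi*x) + i*sin(alpha*pi*x).
    The scaling alpha * pi is an assumption, not taken from the abstract.
    """
    return np.exp(1j * alpha * np.pi * x)

def phase_cosine_similarity(z_a, z_b):
    """Mean cosine of the element-wise phase difference, per sample."""
    return np.mean(np.cos(np.angle(z_a) - np.angle(z_b)), axis=-1)

def cross_entropy(logits, labels):
    """Numerically stable softmax cross-entropy, averaged over samples."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def mmc_loss_sketch(feat_a, feat_b, logits_a, logits_b, labels, lam=1.0):
    """Hypothetical combination: per-modality cross-entropy plus a phase term.

    The phase term is zero when both modalities have identical phases.
    """
    z_a, z_b = euler_map(feat_a), euler_map(feat_b)
    phase_term = 1.0 - phase_cosine_similarity(z_a, z_b).mean()
    ce_term = cross_entropy(logits_a, labels) + cross_entropy(logits_b, labels)
    return ce_term + lam * phase_term

# Usage with random stand-in data for two modalities (e.g. audio and video).
rng = np.random.default_rng(0)
audio_feat, video_feat = rng.random((4, 8)), rng.random((4, 8))
audio_logits, video_logits = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
labels = np.array([0, 2, 1, 0])
print(mmc_loss_sketch(audio_feat, video_feat, audio_logits, video_logits, labels))
```

Because every mapped feature has unit modulus, all information sits in the phase, which is why a phase-based similarity is a natural quantity to constrain in this sketch.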
| Original language | English |
|---|---|
| Title of host publication | MM'24 |
| Subtitle of host publication | Proceedings of the 32nd ACM International Conference on Multimedia |
| Publisher | Association for Computing Machinery |
| Pages | 4591-4600 |
| ISBN (Print) | 979-8-4007-0686-8 |
| DOIs | |
| Publication status | Published - 2024 |
| Event | 32nd ACM International Conference on Multimedia (MM 2024), Melbourne, Australia. Duration: 28 Oct 2024 → 1 Nov 2024. https://2024.acmmm.org/ |
Publication series
| Name | MM - Proceedings of the ACM International Conference on Multimedia |
|---|---|
Conference
| Conference | 32nd ACM International Conference on Multimedia (MM 2024) |
|---|---|
| Abbreviated title | ACM MM’24 |
| Place | Australia |
| City | Melbourne |
| Period | 28/10/24 → 1/11/24 |
| Internet address | |
Research Keywords
- euler formula
- modality imbalance
- multi-modal constrained loss
- multi-modal learning
Fingerprint
Dive into the research topics of 'ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning'. Together they form a unique fingerprint.