Abstract
Portfolio choice is one of the most important topics in finance, and many studies have applied deep learning methods to it. However, most of the proposed models suffer from two main challenges: (i) the scarcity of historical data in financial markets renders deep learning methods much less robust, and (ii) market-based time-series instances such as price data share high similarities most of the time, making it challenging to extract discriminative representations. In this paper, we tackle these challenges within a coherent deep-learning model: the Contrastively Aligned Cross-modal (Cross)-Attention Model (Caracal) for portfolio selection. By introducing multi-modal data, we contrastively extract and fuse cross-modal information and attend to both data-driven and real-world spatio-temporal relations with respect to an objective loss that considers the entire joint distribution of asset returns. Specifically, we propose an Inter-Modal Contrastive Fusion Module that builds connections among features of highly correlated multi-modal pairs, allowing the model to find and align temporally related text-numeric pairs and then exchange and fuse cross-modal features using the attention mechanism. We further design an Inner-Modal Contrastive Learning Module that guides feature enhancement by simultaneously learning coarse-grained and fine-grained temporal representations from the similarity information between instances in each modality. Extensive experiments on three real-world datasets show that our proposed model generates superior investment performance in comparison to other state-of-the-art models, and rich ablation experiments justify the effectiveness of each module. © 2025 Elsevier Ltd.
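For a concrete picture of the two modules the abstract describes, the following is a minimal PyTorch sketch and not the authors' implementation: the class and function names, dimensions, InfoNCE-style objectives, and mean-pooling choices are all illustrative assumptions based only on the abstract's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterModalContrastiveFusion(nn.Module):
    """Hypothetical sketch of the Inter-Modal Contrastive Fusion Module:
    align temporally matched text-numeric feature pairs with an InfoNCE-style
    loss, then exchange cross-modal information via cross-attention."""

    def __init__(self, dim: int, num_heads: int = 4, temperature: float = 0.07):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.temperature = temperature

    def align_loss(self, text: torch.Tensor, numeric: torch.Tensor) -> torch.Tensor:
        # text, numeric: (batch, dim) pooled per-instance embeddings.
        # Matched text-numeric pairs are positives; all other pairings in
        # the batch act as negatives (standard symmetric InfoNCE).
        text = F.normalize(text, dim=-1)
        numeric = F.normalize(numeric, dim=-1)
        logits = text @ numeric.t() / self.temperature        # (batch, batch)
        targets = torch.arange(text.size(0), device=text.device)
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    def forward(self, text_seq: torch.Tensor, numeric_seq: torch.Tensor):
        # text_seq, numeric_seq: (batch, seq_len, dim) per-modality features.
        # Numeric features query the aligned text features; a residual
        # connection keeps the original numeric signal in the fused output.
        attended, _ = self.cross_attn(query=numeric_seq, key=text_seq,
                                      value=text_seq)
        loss = self.align_loss(text_seq.mean(dim=1), numeric_seq.mean(dim=1))
        return numeric_seq + attended, loss


def inner_modal_contrastive_loss(fine_seq: torch.Tensor,
                                 coarse: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """Hypothetical sketch of the Inner-Modal Contrastive Learning Module's
    objective: pull the fine-grained (per-step, here mean-pooled) view of an
    instance toward its coarse-grained view, pushing away other instances."""
    fine = F.normalize(fine_seq.mean(dim=1), dim=-1)          # (batch, dim)
    coarse = F.normalize(coarse, dim=-1)                      # (batch, dim)
    logits = fine @ coarse.t() / temperature
    targets = torch.arange(fine.size(0), device=fine.device)
    return F.cross_entropy(logits, targets)
```

In a full pipeline, these two auxiliary losses would presumably be weighted and added to the portfolio objective over the joint return distribution; the weighting and the pooling scheme here are placeholders.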
| Field | Value |
|---|---|
| Original language | English |
| Article number | 112305 |
| Number of pages | 15 |
| Journal | Pattern Recognition |
| Volume | 171 |
| Issue number | Part B |
| Online published | 21 Aug 2025 |
| DOIs | |
| Publication status | Published - Mar 2026 |
| Externally published | Yes |
Funding
This work was supported by the National Key R&D Program of China (No. 2022YFB4500600) and the National Natural Science Foundation of China (NSFC) under Grant No. 62272172.
Research Keywords
- Contrastive learning
- Cross-modal attention
- Multi-modal learning
- Portfolio selection