Abstract
Background: Single-cell multi-omics (scMulti-omics) technologies have revolutionized our understanding of cellular functions and interactions by enabling the simultaneous measurement of diverse cellular modalities. Integrating these heterogeneous data types is challenging because the omics layers differ in scale, resolution, and biological variability. Traditional computational methods often fail to reconcile these differences, so critical biological variability and subtle intermolecular interactions are lost.
Methods: To address these challenges, we have developed a single-cell multi-omics deep learning model (scMDCF) based on contrastive learning, tailored for the efficient characterization and integration of scMulti-omics data. scMDCF features a cross-modality contrastive learning module that harmonizes data representations across different omics types, ensuring consistency and preserving data heterogeneity by accommodating information entropy. Furthermore, a cross-modality feature fusion module extracts common low-dimensional latent representations of scMulti-omics data, effectively balancing the diverse characteristics of these data types.
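The abstract does not specify scMDCF's loss function or fusion operator. As a rough illustration of what a cross-modality contrastive objective typically looks like, the sketch below uses a symmetric InfoNCE loss (a common choice, assumed here rather than taken from the paper) between per-cell embeddings from two modalities, such as RNA and ATAC. The two modalities of the same cell form a positive pair, and the other cells in the batch act as negatives. The `fuse` function is a hypothetical weighted average standing in for the paper's fusion module. The names `cross_modal_info_nce`, `fuse`, `temperature`, and `w` are illustrative only.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    # Scale each row (one cell's embedding) to unit length so dot products are cosine similarities.
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def cross_modal_info_nce(z_rna, z_atac, temperature=0.5):
    """Symmetric InfoNCE loss between two modality embeddings of the same cells.

    Assumption (not from the paper): row i of z_rna and row i of z_atac describe
    the same cell (positive pair); every other row in the batch is a negative.
    """
    a = l2_normalize(z_rna)
    b = l2_normalize(z_atac)
    logits = a @ b.T / temperature  # (n_cells, n_cells) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    # RNA -> ATAC direction: each row should put its mass on the diagonal.
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()
    # ATAC -> RNA direction: each column should put its mass on the diagonal.
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_ab + loss_ba)


def fuse(z_rna, z_atac, w=0.5):
    # Hypothetical fusion: weighted average of the normalized modality embeddings.
    return w * l2_normalize(z_rna) + (1.0 - w) * l2_normalize(z_atac)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(32, 16))
    aligned = cross_modal_info_nce(z, z + 0.01 * rng.normal(size=z.shape))
    unrelated = cross_modal_info_nce(z, rng.normal(size=z.shape))
    print(aligned < unrelated)
```

Embeddings that agree across modalities yield a lower loss than mismatched ones, which is the signal that pulls the two modality encoders toward a shared space. A lower `temperature` penalizes hard negatives more sharply. The fused embedding keeps the per-cell shape and would serve as input for downstream clustering.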
Results: Extensive empirical studies demonstrate that scMDCF outperforms existing state-of-the-art scMulti-omics models across various types of scMulti-omics data. In particular, scMDCF extracts cell-type-specific peak-gene associations and cis-regulatory elements from SNARE-seq data and elucidates immune regulation from CITE-seq data. In a dataset collected after BNT162b2 mRNA SARS-CoV-2 vaccination, scMDCF annotates specific vaccine-induced B cell subpopulations, uncovering dynamic interactions and regulatory mechanisms within the post-vaccination immune system. Most importantly, in Alzheimer's disease-specific data, scMDCF identifies computationally defined minority Microglia and Endothelial cell populations and reveals ELF1 as a putative transcription factor biomarker in Microglia that may influence GTPase activity and suppress Alzheimer's pathology.
Conclusions: We propose scMDCF, a contrastive-learning-based framework for single-cell multi-omics integration that harmonizes cross-modality representations while preserving biological heterogeneity. Applications across diverse scMulti-omics datasets show improved clustering performance and effective batch-effect mitigation, and yield mechanistic insights into the underlying biological processes. Code and reproducible workflows are openly available.
© The Author(s) 2025.
| Original language | English |
|---|---|
| Article number | 10 |
| Number of pages | 33 |
| Journal | Genome Medicine |
| Volume | 18 |
| Online published | 26 Jan 2026 |
| DOIs | |
| Publication status | Published - 2026 |
Funding
The work described in this paper was substantially supported by the National Natural Science Foundation of China under Grant No. 62472195 (X.L.).
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs):
- SDG 3 Good Health and Well-being
Research Keywords
- Contrastive learning
- scMulti-omics integration and clustering
- Single-cell multi-omics
Publisher's Copyright Statement
- This full text is made available under CC-BY-NC-ND 4.0. https://creativecommons.org/licenses/by-nc-nd/4.0/
Fingerprint
Dive into the research topics of 'Aligned cross-modal integration and regulatory heterogeneity characterization of single-cell multiomic data with deep contrastive learning'. Together they form a unique fingerprint.