
Aligned cross-modal integration and regulatory heterogeneity characterization of single-cell multiomic data with deep contrastive learning

Yue Cheng, Yanchi Su, Yi Fan, Yuning Yang, Xingjian Chen, Fuzhou Wang, Ka-Chun Wong, Xiangtao Li*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Background: Single-cell multi-omics (scMulti-omics) technologies have revolutionized our understanding of cellular functions and interactions by enabling the simultaneous measurement of diverse cellular modalities. Integrating these heterogeneous data types presents significant challenges due to differences in scale, resolution, and biological variability across the omics layers. Traditional computational methods often fail to reconcile these differences, leading to a loss of critical biological variability and subtle intermolecular interactions.
Methods: To address these challenges, we have developed a single-cell multi-omics deep learning model (scMDCF) based on contrastive learning, tailored for the efficient characterization and integration of scMulti-omics data. scMDCF features a cross-modality contrastive learning module that harmonizes data representations across different omics types, ensuring consistency and preserving data heterogeneity by accommodating information entropy. Furthermore, a cross-modality feature fusion module extracts common low-dimensional latent representations of scMulti-omics data, effectively balancing the diverse characteristics of these data types.
Results: Extensive empirical studies demonstrate that scMDCF outperforms existing state-of-the-art scMulti-omics models across various types of scMulti-omics data. In particular, scMDCF exhibits advanced analytical capabilities in extracting cell-type-specific peak-gene associations and cis-regulatory elements from SNARE-seq data, and in elucidating immune regulation from CITE-seq data. In a post-BNT162b2 mRNA SARS-CoV-2 vaccination dataset, scMDCF successfully annotates specific vaccine-induced B cell subpopulations, uncovering dynamic interactions and regulatory mechanisms within the immune system post-vaccination. Most importantly, using Alzheimer’s disease-specific data, scMDCF identifies computational minority Microglia and Endothelial cell populations, revealing ELF1 as a putative candidate transcription factor biomarker in Microglia, which potentially influences GTPase activity and may suppress Alzheimer’s pathology.
Conclusions: We propose scMDCF, a contrastive-learning-based framework for single-cell multi-omics integration that harmonizes cross-modality representations while preserving biological heterogeneity. Applications across diverse scMulti-omics datasets demonstrate improved clustering performance, effective batch-effect mitigation, and mechanistic insights into underlying biological processes. Code and reproducible workflows are openly available.
© The Author(s) 2025.
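
Illustrative note: the abstract does not specify the loss used by the cross-modality contrastive learning module. As a rough sketch of the general idea only, the block below implements a symmetric InfoNCE objective, a commonly used cross-modal contrastive loss, in plain Python. It is not the scMDCF implementation. The function names (`l2_normalize`, `log_softmax_diag`, `cross_modal_info_nce`), the `temperature` parameter, and the toy embeddings are all hypothetical and chosen for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def log_softmax_diag(sim):
    """For each row i, return the log-softmax value of the diagonal entry sim[i][i]."""
    out = []
    for i, row in enumerate(sim):
        m = max(row)  # subtract the row maximum for numerical stability
        lse = m + math.log(sum(math.exp(s - m) for s in row))
        out.append(row[i] - lse)
    return out


def cross_modal_info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two modality embeddings of the same cells.

    Assumption (not from the source): z_a[i] and z_b[i] embed the same cell i
    in two modalities and form the positive pair; every other pairing in the
    batch is treated as a negative.
    """
    if len(z_a) != len(z_b) or not z_a:
        raise ValueError("both modalities must embed the same non-empty set of cells")
    a = [l2_normalize(v) for v in z_a]
    b = [l2_normalize(v) for v in z_b]
    # Temperature-scaled cosine similarity between every cross-modal pair.
    sim_ab = [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]
    sim_ba = [list(col) for col in zip(*sim_ab)]  # transpose: modality b as the query
    loss_ab = -sum(log_softmax_diag(sim_ab)) / len(a)
    loss_ba = -sum(log_softmax_diag(sim_ba)) / len(a)
    return 0.5 * (loss_ab + loss_ba)


# Toy embeddings for three cells in two hypothetical modalities.
rna = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[2.0, 0.0], [0.0, 3.0], [-0.5, 0.0]]    # same directions, same cell order
shuffled = [[0.0, 3.0], [-0.5, 0.0], [2.0, 0.0]]   # same vectors, wrong cell order

loss_aligned = cross_modal_info_nce(rna, aligned)
loss_shuffled = cross_modal_info_nce(rna, shuffled)
```

In this toy setup the loss is small when each cell's two embeddings point in the same direction and much larger when cells are mismatched across modalities, which is the behavior a cross-modal alignment objective of this kind is meant to reward.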
Original language: English
Article number: 10
Number of pages: 33
Journal: Genome Medicine
Volume: 18
Online published: 26 Jan 2026
DOIs
Publication status: Published - 2026

Funding

The work described in this paper was substantially supported by the National Natural Science Foundation of China under Grant No. 62472195 (X.L.).

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Research Keywords

  • Contrastive learning
  • scMulti-omics integration and clustering
  • Single-cell multi-omics

Publisher's Copyright Statement

  • This full text is made available under CC-BY-NC-ND 4.0. https://creativecommons.org/licenses/by-nc-nd/4.0/
