Resolving single-cell copy number profiling for large datasets

Ruohan Wang, Yuwei Zhang, Mengbo Wang, Xikang Feng, Jianping Wang*, Shuai Cheng Li*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Advances in single-cell DNA sequencing (scDNA-seq) enable us to characterize the genetic heterogeneity of cancer cells. However, the high noise and low coverage of scDNA-seq impede the estimation of copy number variations (CNVs). In addition, existing tools suffer from long execution times and often fail on large datasets. Here, we propose SeCNV, an efficient method that leverages structural entropy to profile copy numbers. SeCNV adopts a local Gaussian kernel to construct a matrix, the depth congruent map (DCM), that captures the similarities between any two bins along the genome. Then, SeCNV partitions the genome into segments by minimizing the structural entropy from the DCM. With the partition, SeCNV estimates the copy numbers within each segment for cells. We simulate nine datasets with various breakpoint distributions and amplitudes of noise to benchmark SeCNV. SeCNV achieves robust performance, with F1-scores above 0.95 for breakpoint detection, significantly outperforming state-of-the-art methods. SeCNV successfully processes large datasets (>50 000 cells) within 4 min, while other tools fail to finish within the 120 h time limit. We apply SeCNV to single-nucleus sequencing datasets from two breast cancer patients and acoustic cell tagmentation sequencing datasets from eight breast cancer patients. SeCNV successfully reproduces the distinct subclones and infers tumor heterogeneity. SeCNV is available at https://github.com/deepomicslab/SeCNV.
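
The two core steps described in the abstract (a banded Gaussian-kernel similarity matrix over bins, then scoring a segmentation by structural entropy) can be illustrated with a minimal sketch. This is not SeCNV's actual implementation: the function names, the `sigma` and `bandwidth` parameters, the synthetic depth matrix, and the use of the standard two-dimensional structural entropy formula from structural information theory are all assumptions made for illustration.

```python
import numpy as np


def depth_congruent_map(depth, sigma=0.5, bandwidth=3):
    """Banded Gaussian-kernel similarity between genomic bins (hypothetical helper).

    depth: array of shape (n_bins, n_cells) holding normalized read depth.
    Only bins at most `bandwidth` positions apart are compared ("local" kernel).
    """
    n_bins = depth.shape[0]
    dcm = np.zeros((n_bins, n_bins))
    for i in range(n_bins):
        hi = min(n_bins, i + bandwidth + 1)
        # Squared Euclidean distance between bin i and the next bins in the band.
        d2 = np.sum((depth[i + 1:hi] - depth[i]) ** 2, axis=1)
        dcm[i, i + 1:hi] = np.exp(-d2 / (2 * sigma ** 2))
    return dcm + dcm.T  # symmetric, zero diagonal


def structural_entropy(w, breakpoints):
    """Two-dimensional structural entropy of the weighted graph `w`
    under a partition of the bins into contiguous segments (assumed formula)."""
    n = w.shape[0]
    degree = w.sum(axis=1)
    total = degree.sum()  # twice the total edge weight
    bounds = [0, *sorted(breakpoints), n]
    h = 0.0
    for start, end in zip(bounds[:-1], bounds[1:]):
        vol = degree[start:end].sum()
        if vol == 0:
            continue  # an isolated segment contributes nothing
        # The block sum counts each internal edge twice, so the remainder is the cut.
        cut = vol - w[start:end, start:end].sum()
        d = degree[start:end]
        d = d[d > 0]
        h -= np.sum(d / total * np.log2(d / vol))
        h -= cut / total * np.log2(vol / total)
    return h


# Synthetic example (assumed data): 20 bins, 5 cells, copy number 2 then 3.
rng = np.random.default_rng(0)
profile = np.repeat([[2.0], [3.0]], 10, axis=0)
depth = profile + rng.normal(0.0, 0.05, size=(20, 5))

dcm = depth_congruent_map(depth)
h_true = structural_entropy(dcm, [10])  # breakpoint at the real change
h_off = structural_entropy(dcm, [5])    # breakpoint inside a uniform region
print(h_true, h_off)
```

In this toy setting the segmentation that places the breakpoint at the true copy number change yields the lower structural entropy, which is the quantity a segmentation search would minimize. A real pipeline would also need read-depth normalization and a search over candidate breakpoints, neither of which is shown here.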
Original language: English
Article number: bbac264
Journal: Briefings in Bioinformatics
Volume: 23
Issue number: 4
Online published: 8 Jul 2022
Publication status: Published - Jul 2022

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 3 - Good Health and Well-being

Research Keywords

  • copy number variation
  • cross-sample breakpoint detection
  • single-cell sequencing
  • structural information theory
