scEFSC: Accurate single-cell RNA-seq data analysis via ensemble consensus clustering based on multiple feature selections

Chuang Bian, Xubin Wang, Yanchi Su, Yunhe Wang*, Ka-chun Wong, Xiangtao Li*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

With the development of next-generation sequencing technologies, single-cell RNA sequencing (scRNA-seq) has become an indispensable tool for revealing the wide heterogeneity between cells. Clustering is a fundamental task in this analysis: it discloses the transcriptomic profiles of single cells, and it is one of the key computational problems that has received widespread attention. Recently, many clustering algorithms have been developed for scRNA-seq data. Nevertheless, these computational models often suffer from practical limitations such as numerical instability, high dimensionality and limited computational scalability. Moreover, growing cell numbers and high dropout rates pose a substantial computational challenge to the analysis. To address these limitations, we first provide a systematic and extensive performance evaluation of four feature selection methods and nine scRNA-seq clustering algorithms on fourteen real single-cell RNA-seq datasets. Based on this evaluation, we then propose an accurate single-cell data analysis method via Ensemble Feature Selection based Clustering, called scEFSC. The algorithm employs several unsupervised feature selection methods to remove genes that do not contribute significantly to the scRNA-seq data. After that, different single-cell RNA-seq clustering algorithms are applied to cluster the data filtered by the multiple unsupervised feature selections, and the clustering results are then combined using weight-based meta-clustering. We applied scEFSC to the fourteen real single-cell RNA-seq datasets, and the experimental results demonstrated that scEFSC outperformed the other scRNA-seq clustering algorithms on several evaluation metrics. In addition, we established the biological interpretability of scEFSC by carrying out differential gene expression analysis, gene ontology enrichment and KEGG analysis. scEFSC is available at https://github.com/Conan-Bian/scEFSC.
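
The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how scEFSC weights or merges the base clusterings, so everything below is an assumption: the sketch uses a generic weighted co-association matrix and connected components over a threshold, and the function names, the weights, and the toy label vectors are hypothetical. It is not the scEFSC implementation, which is available at the repository linked above.

```python
from itertools import combinations


def weighted_coassociation(labelings, weights):
    """Return an n x n matrix whose (i, j) entry is the weighted fraction
    of base clusterings that place cells i and j in the same cluster."""
    n = len(labelings[0])
    total = sum(weights)
    matrix = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(labelings, weights):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += weight / total
    return matrix


def consensus_labels(matrix, threshold=0.5):
    """Link cells whose co-association exceeds the threshold and label
    each connected component as one consensus cluster (union-find)."""
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical toy input: six cells, three base clusterings, each standing in
# for one (feature selection, clustering algorithm) pair.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition under different label names
    [0, 0, 1, 1, 2, 2],  # a noisier base clustering
]
weights = [0.4, 0.4, 0.2]  # assumed quality weights, not taken from the paper

matrix = weighted_coassociation(labelings, weights)
consensus = consensus_labels(matrix)
```

Because the co-association matrix only records whether two cells share a cluster, it is insensitive to how each base clustering names its clusters, which is why the first two label vectors reinforce each other despite swapped labels.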
Original language: English
Pages (from-to): 2181-2197
Journal: Computational and Structural Biotechnology Journal
Volume: 20
Online published: 27 Apr 2022
DOIs
Publication status: Published - 2022

Funding

The work described in this paper was substantially supported by the National Natural Science Foundation of China under Grant No. 62076109 and was also funded by the Fundamental Research Funds for the Central Universities. It was also substantially supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region [CityU 11200218], a grant from the Health and Medical Research Fund of the Food and Health Bureau, The Government of the Hong Kong Special Administrative Region [07181426], and funding from the Hong Kong Institute for Data Science (HKIDS) at the City University of Hong Kong. The work was partially supported by two grants from the City University of Hong Kong (CityU 11202219, CityU 11203520). This research was also substantially sponsored by a research project (Grant No. 32000464) from the National Natural Science Foundation of China and substantially supported by the Shenzhen Research Institute, City University of Hong Kong.

Research Keywords

  • Consensus clustering
  • Feature selection
  • scEFSC
  • scRNA-seq

Publisher's Copyright Statement

  • This full text is made available under CC-BY-NC-ND 4.0. https://creativecommons.org/licenses/by-nc-nd/4.0/

RGC Funding Information

  • RGC-funded
