A statistical normalization method and differential expression analysis for RNA-seq data between different species
Research output: Journal Publications and Reviews (RGC: 21, 22, 62) › 21_Publication in refereed journal › peer-review
Detail(s)
Original language | English |
---|---|
Article number | 163 |
Journal / Publication | BMC Bioinformatics |
Volume | 20 |
Online published | 29 Mar 2019 |
Publication status | Published - 2019 |
Link(s)
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85063790058&origin=recordpage |
---|---|
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(60977c44-31b7-4f79-87b9-38a9922c9822).html |
Abstract
Background: High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation between different species for a fair comparison, normalization serves as a crucial pre-processing step that adjusts for the varying sample sequencing depths and other confounding technical effects.
Results: In this paper, we propose a scale-based normalization (SCBN) method that uses the available knowledge of conserved orthologous genes within a hypothesis testing framework. To account for the different gene lengths and unmapped genes between different species, we formulate normalization as a hypothesis testing problem and search for the optimal scaling factor that minimizes the deviation between the empirical and nominal type I errors.
Conclusions: Simulation studies show that the proposed method performs significantly better than the existing competitor in a wide range of settings. An RNA-seq dataset from different species is also analyzed, and the results are consistent with the conclusion that the proposed method outperforms the existing method. For practical applications, we have also developed an R package named "SCBN", which is freely available at http://www.bioconductor.org/packages/devel/bioc/html/SCBN.html.
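The scaling-factor search described in the Results can be illustrated with a small simulation. The following is a minimal sketch in Python, not the SCBN package's implementation: it assumes Poisson counts for conserved orthologous genes, uses a simple conditional binomial z-test as a stand-in for the paper's test statistic, and scans a fixed grid of candidate factors. All function names, the simulated data, and the grid are hypothetical choices made for illustration.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate(rng, n_genes, c_true):
    """Simulate conserved genes as (count1, count2, length1, length2).

    Assumption: both species share the same per-length expression mu,
    and species 2 is sequenced c_true times deeper than species 1.
    """
    genes = []
    for _ in range(n_genes):
        mu = rng.uniform(20, 100)
        len1 = rng.uniform(0.5, 2.0)  # relative gene lengths
        len2 = rng.uniform(0.5, 2.0)
        x = poisson(rng, mu * len1)
        y = poisson(rng, c_true * mu * len2)
        genes.append((x, y, len1, len2))
    return genes


def empirical_type1(c, genes, alpha):
    """Share of conserved genes rejected at level alpha when scaling by c.

    Assumed test: given n = x + y, x is Binomial(n, p) under the null with
    p = len1 / (len1 + c * len2); a two-sided normal approximation is used.
    """
    rejected = 0
    tested = 0
    for x, y, len1, len2 in genes:
        n = x + y
        if n == 0:
            continue  # no reads in either species, nothing to test
        p = len1 / (len1 + c * len2)
        z = (x - n * p) / math.sqrt(n * p * (1 - p))
        p_value = math.erfc(abs(z) / math.sqrt(2))
        tested += 1
        rejected += p_value < alpha
    return rejected / max(tested, 1)


def estimate_scale(genes, alpha=0.05, grid=None):
    """Grid search for the factor whose empirical type I error is closest to alpha."""
    if grid is None:
        grid = [0.5 + 0.01 * i for i in range(251)]  # candidates 0.50 .. 3.00
    return min(grid, key=lambda c: abs(empirical_type1(c, genes, alpha) - alpha))


rng = random.Random(0)
C_TRUE = 1.5
genes = simulate(rng, 1000, C_TRUE)
c_hat = estimate_scale(genes)
print(f"true scale {C_TRUE}, estimated scale {c_hat:.2f}")
```

Since every simulated gene here is conserved, any rejection is a false positive, so the rejection rate at a given factor is the empirical type I error. A badly chosen factor inflates that rate well above the nominal level, which is what the search exploits. The rate changes slowly near the best factor, so an estimate from a single simulated dataset can sit a few percent away from the true value.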
Research Area(s)
- Differential expression, Hypothesis test, Normalization, Orthologous genes, RNA-seq
Citation Format(s)
A statistical normalization method and differential expression analysis for RNA-seq data between different species. / Zhou, Yan; Zhu, Jiadi; Tong, Tiejun et al.
In: BMC Bioinformatics, Vol. 20, 163, 2019.