Synergizing CRISPR/Cas9 off-target predictions for ensemble insights and practical applications

Research output: Journal Publications and Reviews (RGC: 21, 22, 62); 21_Publication in refereed journal; peer-review

13 Scopus Citations

Detail(s)

Original language: English
Pages (from-to): 1108-1115
Journal / Publication: Bioinformatics
Volume: 35
Issue number: 7
Online published: 31 Aug 2018
Publication status: Published - 1 Apr 2019

Abstract

Motivation: The RNA-guided CRISPR/Cas9 system has been widely applied to genome editing and can effectively edit on-target genes. However, it has recently been demonstrated that many homologous off-target genomic sequences can also be mutated, leading to unexpected gene-editing outcomes. Consequently, a plethora of tools have been proposed for predicting the off-target activities of CRISPR/Cas9. Each computational tool has its own advantages and drawbacks under diverse conditions, and it is unlikely that a single tool is optimal for all of them. Hence, we explore the potential of ensemble learning to synergize multiple tools with genomic annotations and thereby enhance their predictive abilities.
Results: We propose an ensemble learning framework that synergizes multiple tools, in different combinations, to predict the off-target activities of CRISPR/Cas9. The ensemble learned with AdaBoost outperformed the individual off-target prediction tools. We also investigated the effect of evolutionary conservation (PhyloP and PhastCons) and chromatin annotations (ChromHMM and Segway) and found that only PhyloP further enhances the predictive capability. Case studies were conducted to reveal ensemble insights into the off-target predictions, demonstrating how the current study can be applied in different genomic contexts. The best AdaBoost model reaches up to 0.9383 (AUC) and 0.2998 (PRC), outperforming the other classifiers. This is attributable to the fact that AdaBoost introduces a new weak classifier (i.e. a decision stump) in each iteration to learn the DNA sequences that were misclassified as off-targets, iterating until a small error rate is reached.
Availability and implementation: The source codes are freely available on GitHub at https://github.com/Alexzsx/CRISPR.
Supplementary information: Supplementary data are available at Bioinformatics online.
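
The boosting mechanism described in the Results (a new decision stump per iteration, focused on previously misclassified sequences) can be illustrated with a minimal sketch. It is not the authors' implementation, which is in the GitHub repository above. The two "tool scores" per candidate site, the 4x4 grid of values, the labeling rule (a site counts as an off-target only when both hypothetical tools score it at 0.6 or higher), and all function names are assumptions made for illustration; the data are synthetic.

```python
import math


def stump_predict(stump, row):
    """Return +1 or -1 from a one-feature threshold rule (a decision stump)."""
    _, feature, threshold, sign = stump
    return sign if row[feature] >= threshold else -sign


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                candidate = (0.0, feature, threshold, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(candidate, row) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def adaboost_fit(X, y, n_rounds):
    """Discrete AdaBoost: one stump per round, then reweight the samples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)
        # Clip the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)
        if err >= 0.5:  # no stump is better than chance; stop early
            break
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified samples (y * prediction == -1) gain weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_score(ensemble, row):
    """Weighted vote of all stumps; a positive score means 'off-target'."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)


# Synthetic toy data (assumption): scores from two hypothetical tools.
levels = [0.2, 0.4, 0.6, 0.8]
X = [(a, b) for a in levels for b in levels]
# Hypothetical labeling rule: +1 (off-target) only if both scores are high.
y = [1 if a >= 0.6 and b >= 0.6 else -1 for a, b in X]

uniform = [1.0 / len(X)] * len(X)
best_single_error = fit_stump(X, y, uniform)[0]

ensemble = adaboost_fit(X, y, n_rounds=3)
predictions = [1 if adaboost_score(ensemble, row) > 0 else -1 for row in X]
train_accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)

print("best single-stump error:", best_single_error)
print("ensemble training accuracy:", train_accuracy)
```

On this toy grid, no single threshold on one tool's score does better than misclassifying a quarter of the sites, whereas three boosting rounds combine both tools and classify all 16 sites correctly. Real evaluations such as those reported in the abstract would use held-out data and metrics like AUC rather than training accuracy.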