Accurate sequence‐based prediction of deleterious nsSNPs with multiple sequence profiles and putative binding residues

  • Ruiyang Song
  • Baixin Cao
  • Zhenling Peng
  • Christopher J. Oldfield
  • Lukasz Kurgan*
  • Ka-Chun Wong
  • Jianyi Yang*

  * Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Non‐synonymous single nucleotide polymorphisms (nsSNPs) may result in pathogenic changes that are associated with human diseases. Accurate prediction of these deleterious nsSNPs is in high demand. Existing predictors of deleterious nsSNPs achieve only modest predictive performance, leaving room for improvement. We propose a new sequence‐based predictor, DMBS, which addresses the need to improve predictive quality. The design of DMBS relies on the observation that deleterious mutations are likely to occur at highly conserved and functionally important positions in the protein sequence. Correspondingly, we introduce two innovative components. First, we improve the estimates of conservation computed from multiple sequence profiles, which are based on two complementary databases and two complementary alignment algorithms. Second, we utilize putative annotations of functional/binding residues produced by two state‐of‐the‐art sequence‐based methods. These inputs are processed by a random forest model that provides favorable predictive performance when empirically compared against five other machine-learning algorithms. Empirical results on four benchmark datasets reveal that DMBS achieves AUC > 0.94, outperforming current methods, including protein structure‐based approaches. In particular, DMBS secures AUC = 0.97 for the SNPdbe and ExoVar datasets, compared to AUC = 0.70 and 0.88, respectively, obtained by the best available methods. Further tests on the independent HumVar dataset show that our method significantly outperforms the state‐of‐the‐art method SNPdryad. We conclude that DMBS provides accurate predictions that can effectively guide wet‐lab experiments in a high‐throughput manner.
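
The abstract does not specify how conservation is scored or how the estimates from different profiles are combined. As an illustration only, the sketch below scores each alignment column by one minus its normalized Shannon entropy and averages that score across several alignments of the same protein. The entropy-based score, the simple averaging, the toy alignments, and all function names are assumptions made for this example; they are not the DMBS feature definitions.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other character (e.g. a gap) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_conservation(column):
    """Return 1 - normalized Shannon entropy of one alignment column.

    1.0 means a fully conserved column; lower values mean more variation.
    Illustrative score only, not the measure used by DMBS.
    """
    residues = [c for c in column if c in AMINO_ACIDS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return 1.0 - entropy / math.log(len(AMINO_ACIDS))


def profile_conservation(alignment):
    """Score every column of one alignment (a list of equal-length strings)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [column_conservation([seq[i] for seq in alignment])
            for i in range(length)]


def combined_conservation(alignments):
    """Average the per-position scores over several alignments.

    Assumes every alignment covers the same query positions in the same order.
    """
    per_profile = [profile_conservation(a) for a in alignments]
    return [sum(vals) / len(vals) for vals in zip(*per_profile)]


# Two hypothetical alignments of the same three-residue query.
alignment_a = ["MKV", "MKL", "MRI"]
alignment_b = ["MKV", "MKV", "MEA"]
scores = combined_conservation([alignment_a, alignment_b])
```

In this toy input the first position is identical in every sequence, so it receives the highest combined score, while the third position varies in both alignments and scores lower. In a pipeline like the one described, per-position values of this kind would be among the inputs passed to the classifier.
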
Original language: English
Article number: 1337
Journal: Biomolecules
Volume: 11
Issue number: 9
Online published: 9 Sept 2021
Publication status: Published - Sept 2021

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 15 - Life on Land

Research Keywords

  • Binding site
  • Mutation
  • Sequence profile

Publisher's Copyright Statement

  • This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
