Algorithms for Identification and Quantification of Proteoforms Using Multiplexed Tandem Mass Spectra and De Novo Sequencing of Mixture Spectra

Proteomics has developed rapidly in recent years, and identifying complex proteoforms remains one of its challenging problems. A proteoform is a protein product containing primary structure alterations caused by gene mutations, alternative splicing, post-translational modifications (PTMs), and other biological activities. Proteoform identification is the first step towards understanding the functions of proteoforms, discovering molecular signatures for disease diagnosis [11-15], and identifying possible drug targets [16, 17]. The top-down mass spectrometry approach has unique advantages in analyzing proteoforms with multiple alterations, including proteolytic digestions, sequence variations, and PTMs, since it directly analyzes intact proteoforms instead of short peptides [16, 18-21]. Recent developments in MS instrumentation and protein separation make proteome-wide analysis of complex proteoforms possible [4-9, 22-24]; a survey is given in [10]. In this project, we will study computational problems that arise in the analysis of top-down multiplexed tandem mass (MTM) spectra. These problems include proteoform database search for MTM spectra and proteoform quantification using top-down MTM spectra. Constructing multiple peptides from mixture spectra (the bottom-up approach) is a challenging problem from a computational point of view. It is the first step towards solving the combined top-down and bottom-up De Novo sequencing problem for multiple proteoforms, in which both top-down and bottom-up spectra from multiple proteoforms are used to identify multiple proteoform sequences. In this project, we will also design algorithms for De Novo sequencing of mixture spectra (the bottom-up approach).
Solutions to the De Novo sequencing problem for mixture spectra will provide a basis for attacking the more complicated problem of combined top-down and bottom-up De Novo sequencing for multiple proteoforms.


Project number: 9043176
Grant type: GRF
Status: Not started
Effective start/end date: 1/01/22 → …