Algorithms for Searching MS Spectra against Protein Databases and Protein Sequencing Using Combined Top-down and Bottom-up Approach for Monoclonal Antibodies

Project: Research

Description

In the last few years, great progress has been made in proteomics. Protein sequence information is essential for understanding protein structures and functions. One of the basic tasks in mass spectrometry (MS)-based computational proteomics is protein identification by searching tandem mass spectrometry (MS/MS) spectra against protein databases generated from gene sequences in genomes. Recently, the problem of identifying complex proteoforms has attracted considerable attention. A proteoform is a protein product containing various kinds of primary structure alterations caused by alternative splicing, post-translational modifications, gene mutations, and other biological events. However, a more challenging computational problem is protein sequencing for novel proteins such as monoclonal antibodies, which have been used to treat different types of cancer and graft-versus-host disease. The sequences of these proteins cannot be obtained from genomes, since the genome information is often limited or unavailable. In this project, we will design fast algorithms for searching MS spectra against protein databases and for sequencing monoclonal antibodies using a combined top-down and bottom-up approach. The top-down method has demonstrated unique advantages in understanding proteoform functions, discovering disease molecule signatures, and identifying possible drug targets. A major challenge in proteoform identification by database search is the combinatorial explosion of possible proteoforms resulting from combinations of sequence variations, post-translational modifications, and other molecular events. To address this challenge, we will design fast algorithms for searching MS spectra against proteoform databases. Monoclonal antibodies play important roles in therapeutic strategies due to their mechanisms of variation.
However, this variation forces us to perform de novo sequencing, because the databases contain no similar proteins for the variable regions. In the traditional ‘bottom-up’ approach, proteins are broken down into peptides for MS analysis. This approach has some advantages, e.g., ease of analysis and few limitations on detection. However, the overlaps between different peptides may be too short, which makes it difficult to reconstruct the whole protein sequence. ‘Top-down’ MS, which analyzes intact proteins, provides an alternative approach for identifying complex proteoforms generated from post-translational modifications and sequence variations. Top-down tandem mass spectra cover whole proteins. Here we will design efficient and effective algorithms for sequencing monoclonal antibodies using a combined top-down and bottom-up approach.
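The difficulty with short peptide overlaps in the bottom-up approach can be illustrated with a minimal Python sketch. It assumes the peptides have already been sequenced without errors and are given as plain strings, and it merges two peptides by their longest suffix-prefix overlap. The function names, the `min_overlap` threshold, and the example peptides are hypothetical and serve only as an illustration; they are not the algorithms this project will develop.

```python
from typing import Optional


def longest_overlap(left: str, right: str) -> int:
    """Return the length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_peptides(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two peptides when their overlap has at least `min_overlap` residues.

    Returns the merged sequence, or None when the overlap is too short
    to join the peptides with confidence (threshold is an assumption).
    """
    size = longest_overlap(left, right)
    if size < min_overlap:
        return None
    return left + right[size:]


# Hypothetical peptides sharing the four residues "ESGG".
print(merge_peptides("EVQLVESGG", "ESGGGLVQP"))  # EVQLVESGGGLVQP

# Only two residues ("GG") overlap, so the merge is rejected.
print(merge_peptides("LVESGG", "GGSLRL"))  # None
```

In the second call the two-residue overlap could arise by chance at many positions, which is why a bottom-up assembly alone may be unable to order the peptides; top-down spectra of the intact protein give complementary evidence about the whole sequence.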

Detail(s)

Project number: 9042965
Grant type: GRF
Status: Not started
Effective start/end date: 1/01/21 → …