Algorithms for Proteoform Detection and Multiplexed Protein Sequencing

Project: Research

Identifying complex proteoforms is a challenging problem in proteomics. A proteoform is a protein product carrying primary-structure alterations that arise from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is the first step towards understanding the functions of proteoforms, discovering molecular signatures for disease diagnosis, and identifying possible drug targets. Recent developments in mass spectrometry (MS) instrumentation and protein separation make proteome-wide analysis of complex proteoforms possible. In this project, we will design efficient algorithms for proteoform detection that take a top-down MS spectrum as input and search a database of proteins and their proteoforms, with an emphasis on peak error correction.

Multiplexed proteins such as polyclonal antibodies play important roles in therapeutic strategies. Because the databases contain no proteins resembling their variable regions, these proteins must be sequenced de novo, and de novo sequencing of multiplexed proteins is a challenging problem. In this project, we propose to use both top-down and bottom-up mass spectra for de novo sequencing of multiplexed proteins, e.g., polyclonal antibodies. The difficulty with multiplexed proteins is that the top-down spectra come from mixtures and may contain multiple proteins. Peptides must therefore be classified into groups, where each group of peptides forms a path on the top-down spectrum that corresponds to one protein. This calls for a new model of the top-down spectrum: we propose to represent it as a polyclonal antibodies graph. We then design algorithms to align peptides with the polyclonal antibodies graph and, after that, identify several paths on the graph.
Finally, we report the sequences corresponding to the identified paths. An antibody contains a heavy chain and a light chain, and our target is to output the sequences of both. Determining which heavy chains pair with which light chains in polyclonal antibodies is well known to be challenging and requires additional experimental methods to provide more information; we do not address that problem here. In this project, we will propose a new mathematical model to represent the top-down mass spectrum and design efficient algorithms to align peptides obtained from bottom-up spectra against the top-down spectrum. After aligning all peptides to the top-down spectrum, we can find long, heavy paths on the graph and report the sequences of candidate antibodies. We emphasize peak error correction for mass spectra during alignment. The algorithms developed will provide efficient methods for spectral alignment and can also be used for other related tasks.
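
The idea of finding a heavy path on a graph built from a spectrum can be illustrated with a small Python sketch. It is not the polyclonal antibodies graph proposed in this project, whose definition is not given here. It is the classic spectrum-graph construction used in de novo sequencing, shown under stated assumptions: peaks are (mass, intensity) pairs, two peaks are linked when their mass difference matches an amino acid residue mass within a tolerance `tol`, and a path is scored by the summed intensity of its peaks. The function names, the four-residue mass table, and the toy peak list are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Monoisotopic residue masses in Da for four amino acids only.
# Assumption: a real tool would cover all residues and modifications.
RESIDUE_MASS: Dict[str, float] = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "V": 99.06841,
}

Peak = Tuple[float, float]  # (mass, intensity)


def build_spectrum_graph(peaks: List[Peak], tol: float) -> Dict[int, List[Tuple[int, str]]]:
    """Link peak i to peak j (i < j) when their mass gap matches a residue within tol.

    `peaks` must be sorted by increasing mass so that every edge points forward.
    """
    edges: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(peaks))}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            gap = peaks[j][0] - peaks[i][0]
            for residue, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:  # tolerate small peak mass errors
                    edges[i].append((j, residue))
    return edges


def heaviest_path(peaks: List[Peak], edges: Dict[int, List[Tuple[int, str]]]) -> Tuple[str, float]:
    """Dynamic programming over peaks in mass order; score is summed intensity."""
    n = len(peaks)
    best = [intensity for _, intensity in peaks]  # best score of a path ending at each peak
    prev: List[Optional[Tuple[int, str]]] = [None] * n
    for i in range(n):  # edges only go forward, so best[i] is final here
        for j, residue in edges[i]:
            candidate = best[i] + peaks[j][1]
            if candidate > best[j]:
                best[j] = candidate
                prev[j] = (i, residue)
    end = max(range(n), key=best.__getitem__)
    residues: List[str] = []
    node = end
    while prev[node] is not None:  # walk back to the start of the path
        node, residue = prev[node]
        residues.append(residue)
    return "".join(reversed(residues)), best[end]


# Toy peak list (made-up values), sorted by mass.
toy_peaks: List[Peak] = [(0.0, 1.0), (57.02, 2.0), (128.06, 3.0), (227.13, 1.5), (300.0, 0.5)]
sequence, score = heaviest_path(toy_peaks, build_spectrum_graph(toy_peaks, tol=0.02))
print(sequence, score)
```

On the toy peak list the sketch recovers a single path. Reporting several paths from a mixture spectrum, as the project intends, would need an additional step such as grouping peptides before path extraction, which this sketch does not attempt.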


Project number: 9043557
Grant type: GRF
Status: Not started
Effective start/end date: 1/01/24 → …