Project Details
Description
Algorithms for Proteoform Detection and Multiplexed Protein Sequencing

Identifying complex proteoforms is a challenging problem in proteomics. A proteoform is a protein product that carries primary structure alterations arising from gene mutations, alternative splicing, posttranslational modifications, and other biological activities. Proteoform identification is the first step towards understanding the functions of proteoforms, discovering molecular signatures for disease diagnosis, and identifying possible drug targets. Recent developments in MS instrumentation and protein separation make proteome-wide analysis of complex proteoforms possible. In this project, we will design efficient algorithms for proteoform detection that take a top-down MS spectrum as input and search a database of proteins and their proteoforms. We emphasize peak error correction.

Multiplexed proteins such as polyclonal antibodies play important roles in therapeutic strategies. Because the databases contain no proteins resembling their variable regions, they must be sequenced de novo, and de novo sequencing of multiplexed proteins is a challenging problem. In this project, we propose to use both top-down and bottom-up mass spectra for de novo sequencing of multiplexed proteins, e.g., polyclonal antibodies. The difficulty is that the top-down spectra come from mixtures and may contain multiple proteins. Peptides should therefore be classified into groups, where each group of peptides forms a path on the top-down spectrum that corresponds to one protein. This requires a new model to represent the top-down spectrum. Here we propose to use a polyclonal antibodies graph to represent a top-down spectrum. We then design algorithms to align peptides with the polyclonal antibodies graph. After that, we will identify several paths on the polyclonal antibodies graph. Finally, we report the sequences corresponding to the identified paths.

Note that an antibody contains a heavy chain and a light chain. Our target is to output the sequences of both the heavy and light chains. It is well known that identifying which heavy chains pair with which light chains in polyclonal antibodies is challenging and requires other experimental methods to provide more information. We do not address this pairing problem.

In this project, we will propose a new mathematical model to represent the top-down mass spectrum and design efficient algorithms to align peptides obtained from bottom-up spectra against the top-down spectrum. After aligning all peptides to the top-down spectrum, we can find long and heavy paths on the graph and report the sequences of candidate antibodies. Here we emphasize peak error correction for mass spectra during alignment. The algorithms developed provide efficient methods for spectra alignment and can also be used for other related tasks.
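The idea of finding a heavy path on a graph built from a spectrum can be illustrated with a small sketch. This is not the project's polyclonal antibodies graph or its alignment algorithm; it is a generic spectrum-graph example under stated assumptions: peaks are given as hypothetical (prefix mass, intensity) pairs, two peaks are connected when their mass gap matches an amino acid residue mass within a tolerance `tol` (a crude stand-in for peak error correction), and the score of a path is the sum of its peak intensities. The function name `heaviest_path`, the tolerance value, and the four-residue mass table are illustrative choices only.

```python
from typing import List, Optional, Tuple

# Monoisotopic residue masses (Da) for a small illustrative subset of amino acids.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841}


def heaviest_path(peaks: List[Tuple[float, float]], tol: float = 0.02) -> Tuple[str, float]:
    """Return (sequence, score) for the highest-scoring path in a toy spectrum graph.

    peaks: hypothetical (mass, intensity) pairs.
    tol:   allowed mass error in Da when matching a gap to a residue mass.
    """
    if not peaks:
        return "", 0.0
    peaks = sorted(peaks)                      # edges only go from lower to higher mass
    n = len(peaks)
    best = [intensity for _, intensity in peaks]   # best score of a path ending at peak j
    back: List[Optional[Tuple[int, str]]] = [None] * n  # (previous peak index, residue)

    for j in range(n):
        for i in range(j):
            gap = peaks[j][0] - peaks[i][0]
            for aa, mass in RESIDUE_MASS.items():
                # Connect i -> j if the gap matches a residue mass within tolerance.
                if abs(gap - mass) <= tol and best[i] + peaks[j][1] > best[j]:
                    best[j] = best[i] + peaks[j][1]
                    back[j] = (i, aa)

    # Trace back from the peak where the best-scoring path ends.
    end = max(range(n), key=lambda k: best[k])
    score = best[end]
    residues = []
    k = end
    while back[k] is not None:
        k, aa = back[k]
        residues.append(aa)
    return "".join(reversed(residues)), score


# Usage with made-up peaks: three consecutive gaps match G, A, and S.
toy_peaks = [(0.0, 1.0), (57.02, 2.0), (128.06, 3.0), (215.09, 1.0), (300.0, 5.0)]
sequence, score = heaviest_path(toy_peaks)
print(sequence, score)
```

Because edges always point from a lower mass to a higher mass, the graph is acyclic and a single dynamic-programming pass over the sorted peaks suffices. A mixture spectrum, as described above, would need several paths to be extracted rather than one, which this sketch does not attempt.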
| Project number | 9043557 |
|---|---|
| Grant type | GRF |
| Status | Active |
| Effective start/end date | 1/01/24 → … |
Research output
- Approximately covering vertices by order-5 or longer paths
  Gong, M., Chen, Z.-Z., Lin, G. & Wang, L., Mar 2026, In: Journal of Computer and System Sciences. 156, 103704. Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review. Open Access.
- Approximation algorithms for the maximum path cover problem using long paths
  Gong, M., Chen, Y., Chen, Z.-Z., Lin, G., Su, B. & Wang, L., Nov 2025, In: Information and Computation. 307, 19 p., 105378. Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review. Open Access.
- Finding coexisting combinations of posttranslational modifications with HomMTM spectra
  Li, K. & Wang, L., Nov 2025, In: Briefings in Bioinformatics. 26, 6, 12 p., bbaf653. Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review. Open Access.