Abstract
Motivation: Proteoforms are the different forms of a protein generated from the genome through sequence variations, splice isoforms, and post-translational modifications. Proteoforms regulate protein structures and functions. A single protein can have multiple proteoforms because modifications can occur at different sites. Proteoform identification aims to find the proteoforms of a given protein that best fit the input spectrum. Proteoform quantification aims to find the corresponding abundances of the different proteoforms of a specific protein.
Results: We propose algorithms for proteoform identification and quantification based on top-down tandem mass spectra. In the combination alignment of the HomMTM spectrum with the reference protein, the mass of each matched peak must be corrected within the pre-defined error range. After the correction, we require that the mass between any two (not necessarily consecutive) matched nodes in the protein be identical to the mass between the corresponding two matched peaks in the HomMTM spectrum. We design a back-tracking graph to store this information and find a combinatorial path (k paths) in this graph with the minimum sum of peak intensity errors. The resulting alignment also gives the relative abundances of these proteoforms (paths). Our experimental results show that the algorithm can identify and quantify proteoform combinations that cover a greater number of peaks. This advance promises to improve the accuracy and comprehensiveness of proteoform quantification, addressing a crucial need in top-down MS-based proteomics.
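The pairwise mass constraint described above can be illustrated with a minimal sketch. It is not taken from the TopMGQuant code, and the function name `common_offset` and its parameters are hypothetical. It assumes each matched protein node has a known theoretical mass (with any modification masses already included) and that each peak may be corrected by at most `tol`. Under those assumptions, requiring every pair of corrected peaks to have the same mass difference as the corresponding pair of nodes is equivalent to requiring a single offset shared by all matches, which reduces to intersecting intervals.

```python
from typing import Optional, Sequence, Tuple


def common_offset(
    node_masses: Sequence[float],
    peak_masses: Sequence[float],
    tol: float,
) -> Optional[Tuple[float, float]]:
    """Return the range of shared offsets d for which every peak can be
    corrected by at most `tol` to equal node_mass + d, or None if no such
    offset exists.

    Illustrative assumption (not the authors' implementation): if all
    corrected peaks equal node_mass + d for one d, then the mass between
    any two corrected peaks equals the mass between the two matched nodes.
    """
    if len(node_masses) != len(peak_masses):
        raise ValueError("need exactly one peak per matched node")

    lo, hi = float("-inf"), float("inf")
    for node, peak in zip(node_masses, peak_masses):
        diff = peak - node
        # This peak allows any offset d with |peak - (node + d)| <= tol.
        lo = max(lo, diff - tol)
        hi = min(hi, diff + tol)

    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    nodes = [100.0, 250.0, 400.0]    # hypothetical theoretical masses
    peaks = [100.01, 249.99, 400.02]  # hypothetical observed masses
    print(common_offset(nodes, peaks, tol=0.02))
    print(common_offset(nodes, peaks, tol=0.005))
```

Any offset in the returned range yields corrected peak masses that satisfy the constraint for all pairs at once. A result of `None` means the chosen set of matches cannot be made consistent within the error range.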
Availability and implementation: The software package is available at https://github.com/Zeirdo/TopMGQuant.
© The Author(s) 2025. Published by Oxford University Press.
Original language | English |
---|---|
Article number | btaf007 |
Journal | Bioinformatics |
Volume | 41 |
Issue number | 1 |
Online published | 9 Jan 2025 |
Publication status | Published - Jan 2025 |
Funding
This work was fully supported by funds from the National Science Foundation (NSF: 61972329), GRF grants from the Hong Kong Special Administrative Region, P. R. China (CityU 11210119 and CityU 11206120), and a Strategic Research Grant from City University of Hong Kong (Project Number 7005851).
Publisher's Copyright Statement
- This full text is made available under CC-BY 4.0. https://creativecommons.org/licenses/by/4.0/
Fingerprint
Dive into the research topics of 'Proteoform identification and quantification based on alignment graphs'. Together they form a unique fingerprint.
Projects
- 2 Finished
- GRF: Algorithms for Searching MS Spectra against Protein Databases and Protein Sequencing Using Combined Top-down and Bottom-up Approach for Monoclonal Antibodies
  WANG, L. (Principal Investigator / Project Coordinator)
  1/01/21 → 13/06/25
  Project: Research
- GRF: Efficient Algorithms for Identification of Modified Proteoforms Using Top-down Mass Spectra
  WANG, L. (Principal Investigator / Project Coordinator) & Liu, X. (Co-Investigator)
  1/01/20 → 5/06/24
  Project: Research