Abstract
Nontrivial protein topology has the potential to revolutionize protein engineering by enabling the manipulation of proteins' stability and dynamics. However, the rarity of topological proteins in nature poses a challenge for their design, synthesis, and application, primarily because few entangling motifs are available as synthetic templates. Discovering these motifs is particularly difficult, as entanglement is a subtle structural feature that is not readily discernible from protein sequences. In this study, we developed a streamlined workflow enabling efficient and accurate identification of structurally reliable and applicable entangling motifs from protein sequences. Through this workflow, we automatically curated a database of 1115 entangling protein motifs from more than 100,000 sequences in the UniProt Knowledgebase. In our database, 73.3% of C2 entangling motifs and 80.1% of C3 entangling motifs exhibited low structural similarity to known protein structures. The entangled structures in the database were categorized into different groups, and their functional and biological significance was analyzed. The results were summarized in an online database accessible through a user-friendly web platform, providing researchers with an expanded toolbox of entangling motifs. This resource is poised to significantly advance the field of protein topology engineering and inspire new research directions in protein design and application.
© 2025 The Author(s). Published by the Royal Society of Chemistry
Original language | English |
---|---|
Journal | Chemical Science |
Online published | 31 Mar 2025 |
Publication status | Online published - 31 Mar 2025 |
Funding
This work was supported by the HKUST Start-up Fund, the Hong Kong RGC Early Career Scheme [Project Number: 26214522], the National Key R&D Program of China [Nos. 2020YFA0908100 and 2023YFF1204401], the Shenzhen Medical Research Fund [No. B2302037], the National Natural Science Foundation of China [Nos. 22331003, 21991132, 21925102, 92056118, 22101010, 22201017, and 22201016], and the Beijing National Laboratory for Molecular Sciences [BNLMS-CXXM-202006].