Towards large-scale chemical reaction image parsing via a multimodal large language model

Yufan Chen, Ching Ting Leung, Jianwei Sun, Yong Huang, Linyan Li, Hao Chen, Hanyu Gao*

*Corresponding author for this work

Research output: Journal Publications and Reviews; RGC 21 - Publication in refereed journal; peer-review

Abstract

Artificial intelligence (AI) has demonstrated significant promise in advancing organic chemistry research; however, its effectiveness depends on the availability of high-quality chemical reaction data. Currently, most published chemical reactions are not available in machine-readable form, limiting the broader application of AI in this field. The extraction of published chemical reactions into structured databases still relies heavily on manual curation, and robust automatic parsing of chemical reaction images into machine-readable data remains a significant challenge. To address this, we introduce the Reaction Image Multimodal large language model (RxnIM), the first multimodal large language model specifically designed to parse chemical reaction images into machine-readable reaction data. RxnIM not only extracts key chemical components from reaction images but also interprets the textual content that describes reaction conditions. Together with a specially designed large-scale dataset generation method to support model training, our approach achieves excellent performance, with an average F1 score of 88% on various benchmarks, surpassing state-of-the-art methods by an average of 5%. This represents a crucial step toward the automatic construction of large databases of machine-readable reaction data parsed from images in the chemistry literature, providing essential data resources for AI research in chemistry. The source code, model checkpoints, and datasets developed in this work are released under permissive licenses.
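
The F1 score quoted in the abstract is the harmonic mean of precision and recall. The sketch below shows how such a score could be computed for one parsed image. It assumes exact set matching between predicted and ground-truth reactions, which may differ from the matching criteria used in the paper; the function name `f1_score` and the reaction strings are hypothetical and chosen only for illustration.

```python
def f1_score(predicted: set, gold: set) -> float:
    """Harmonic mean of precision and recall over exactly matching items."""
    if not predicted or not gold:
        return 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)          # share of gold items that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical reactions written as "reactants>>products" strings (assumption).
gold = {"A.B>>C", "C>>D", "E.F>>G"}
predicted = {"A.B>>C", "C>>D", "E>>H"}

print(f1_score(predicted, gold))
```

Here two of the three predictions match, so precision and recall are both 2/3 and the F1 score equals 2/3 as well.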

© 2025 The Author(s). Published by the Royal Society of Chemistry
Original language: English
Pages (from-to): 21464-21474
Number of pages: 11
Journal: Chemical Science
Volume: 16
Issue number: 45
Online published: 7 Oct 2025
Publication status: Published - 7 Dec 2025

Funding

We thank the Information Technology Services Center (ITSC) in HKUST for providing the HPC3 and SuperPod Cluster as our computational resources. This work is financially supported by the Hong Kong University of Science and Technology, and the Hong Kong Research Grants Council Early Career Scheme (26214522).

Publisher's Copyright Statement

  • This full text is made available under CC-BY 3.0. https://creativecommons.org/licenses/by/3.0/

RGC Funding Information

  • RGC-funded
