Abstract
Scientific publications, especially biomedical publications, contain a large number of compound figures, which are composed of multiple graphs, plots, and drawings. With the growing interest in data mining, scientific image understanding, and retrieval, compound figure separation and label recognition have become vital steps for various downstream tasks. However, existing methods are difficult to apply to increasingly complex scenarios, and they usually treat these two tasks separately. In this work, we propose a new model called YOLO-OCR that performs compound figure separation and label recognition simultaneously. YOLO-OCR performs object detection, text detection, and text recognition together in a unified end-to-end trainable network. Because the tasks share convolutional features, the model has a lower computational cost and higher performance. To reduce annotation costs, we train the model on a synthesized compound figure dataset and then fine-tune it on real compound figure datasets using an active learning strategy. The results show that the proposed method achieves new state-of-the-art performance on the ImageCLEF 2016 dataset and our dataset. In addition, we developed an online system based on the proposed model to help researchers conveniently separate compound figures. The project is publicly available at https://www.chatfigures.com/figure-separation.

Keywords: Compound figure separation, Label recognition, Information retrieval, Object detection, Text recognition.

Copyright © 2024 by SIAM.
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 SIAM International Conference on Data Mining (SDM) |
Editors | Shashi Shekhar, Vagelis Papalexakis, Jing Gao, Zhe Jiang, Matteo Riondato |
Publisher | Society for Industrial and Applied Mathematics |
Pages | 118-126 |
ISBN (Electronic) | 9781611978032 |
Publication status | Published - Apr 2024 |
Event | 2024 SIAM International Conference on Data Mining (SDM24), Houston, United States. Duration: 18 Apr 2024 → 20 Apr 2024. https://www.siam.org/conferences/cm/conference/sdm24, https://www.siam.org/conferences-events/past-event-archive/sdm24/ |
Publication series
Name | Proceedings of the SIAM International Conference on Data Mining, SDM |
---|---|
Conference
Conference | 2024 SIAM International Conference on Data Mining (SDM24) |
---|---|
Country/Territory | United States |
City | Houston |
Period | 18/04/24 → 20/04/24 |
Research Keywords
- Compound figure separation
- Label recognition
- Information retrieval
- Object detection
- Text recognition