Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review
Author(s)
Chen, Wenting; Shen, Linlin; Lin, Jingyang et al.
Related Research Unit(s)
Detail(s)
Original language | English |
---|---|
Title of host publication | Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) |
Publication status | Published - Aug 2024 |
Conference
Title | 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 |
---|---|
Location | Centara Grand and Bangkok Convention Centre |
Country/Territory | Thailand |
City | Bangkok |
Period | 11 - 16 August 2024 |
Link(s)
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(5e9b9c04-6c73-4a42-a60e-e2a15ec9cfe6).html |
---|---|
Abstract
Fine-grained vision-language models (VLMs) have been widely used for inter-modality local alignment between predefined fixed image patches and textual words. However, in medical image analysis, lesions vary in size and position, so fixed patches may represent them incompletely. Moreover, these methods provide explainability through heatmaps that highlight general image areas potentially associated with the text rather than specific regions, so their explanations are neither explicit nor specific enough. To address these issues, we propose a novel Adaptive patch-word Matching (AdaMatch) model to correlate chest X-ray (CXR) image regions with words in medical reports, and apply it to CXR-report generation to provide explainability for the generation process. AdaMatch exploits the fine-grained relation between adaptive patches and words to explain specific image regions with their corresponding words. To capture abnormal regions of varying sizes and positions, we introduce an Adaptive Patch extraction (AdaPatch) module that acquires adaptive patches for these regions. To provide explicit explainability for the CXR-report generation task, we propose an AdaMatch-based bidirectional LLM for Cyclic CXR-report generation (AdaMatch-Cyclic). It employs AdaMatch to obtain keywords for CXR images and 'keypatches' for medical reports as hints to guide CXR-report generation. Extensive experiments on two publicly available CXR datasets validate the effectiveness of our method and its superior performance over existing methods.
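The abstract describes the method only at a high level; as an illustrative aid, the sketch below shows one generic way fine-grained patch-word matching could be computed between adaptive patch embeddings and report word embeddings (cosine similarity with bidirectional max aggregation). The function name, temperature value, and aggregation scheme are assumptions for illustration and are not taken from the paper's implementation.

```python
import torch
import torch.nn.functional as F


def patch_word_similarity(patch_emb: torch.Tensor,
                          word_emb: torch.Tensor,
                          temperature: float = 0.07):
    """Hypothetical fine-grained matching between adaptive patch embeddings
    and report word embeddings; not the paper's actual implementation.

    patch_emb: (num_patches, dim) embeddings of adaptively extracted regions
    word_emb:  (num_words, dim)   embeddings of report tokens
    Returns the patch-word similarity matrix and a symmetric matching score.
    """
    # Cosine similarity between every adaptive patch and every word.
    patch_emb = F.normalize(patch_emb, dim=-1)
    word_emb = F.normalize(word_emb, dim=-1)
    sim = patch_emb @ word_emb.t() / temperature  # (num_patches, num_words)

    # For each word, its best-matching patch; for each patch, its
    # best-matching word; average the two directions into one score.
    word_side = sim.max(dim=0).values.mean()
    patch_side = sim.max(dim=1).values.mean()
    score = 0.5 * (word_side + patch_side)
    return sim, score


if __name__ == "__main__":
    torch.manual_seed(0)
    patches = torch.randn(7, 256)   # e.g. 7 adaptive patches around candidate lesions
    words = torch.randn(20, 256)    # e.g. 20 tokens from one report sentence
    sim, score = patch_word_similarity(patches, words)
    # For each word, the index of the image region it aligns with most strongly:
    # the kind of explicit region-word pairing the abstract describes.
    print(sim.argmax(dim=0).tolist(), round(score.item(), 3))
```

In such a scheme, the per-word argmax over patches would pick out a best-matching region for each word, and the per-patch argmax over words a best-matching word for each region, loosely mirroring the keyword/'keypatch' hints the abstract says AdaMatch-Cyclic uses to guide cyclic CXR-report generation.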
Bibliographic Note
Research Unit(s) information for this publication is provided by the author(s) concerned.
Citation Format(s)
Fine-Grained Image-Text Alignment in Medical Imaging Enables Explainable Cyclic Image-Report Generation. / Chen, Wenting; Shen, Linlin; Lin, Jingyang et al.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024). 2024.
Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review