Bi-VLGM : Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Author(s)
Related Research Unit(s)
Detail(s)
Original language | English |
---|---|
Pages (from-to) | 1375–1391 |
Number of pages | 17 |
Journal / Publication | International Journal of Computer Vision |
Volume | 133 |
Issue number | 3 |
Online published | 6 Oct 2024 |
Publication status | Published - Mar 2025 |
Link(s)
DOI | DOI |
---|---|
Attachment(s) | Documents
Publisher's Copyright Statement
|
Link to Scopus | https://www.scopus.com/record/display.uri?eid=2-s2.0-85205695428&origin=recordpage |
Permanent Link | https://scholars.cityu.edu.hk/en/publications/publication(ae7acd65-1be3-493d-af5e-f261c3574698).html |
Abstract
Medical reports containing specific diagnostic results and additional information not present in medical images can be effectively employed to assist image understanding tasks, and the modality gap between vision and language can be bridged by vision-language matching (VLM). However, current vision-language models distort the intra-model relation and only include class information in reports that is insufficient for segmentation task. In this paper, we introduce a novel Bi-level class-severity-aware Vision-Language Graph Matching (Bi-VLGM) for text guided medical image segmentation, composed of a word-level VLGM module and a sentence-level VLGM module, to exploit the class-severity-aware relation among visual-textual features. In word-level VLGM, to mitigate the distorted intra-modal relation during VLM, we reformulate VLM as graph matching problem and introduce a vision-language graph matching (VLGM) to exploit the high-order relation among visual-textual features. Then, we perform VLGM between the local features for each class region and class-aware prompts to bridge their gap. In sentence-level VLGM, to provide disease severity information for segmentation task, we introduce a severity-aware prompting to quantify the severity level of disease lesion, and perform VLGM between the global features and the severity-aware prompts. By exploiting the relation between the local (global) and class (severity) features, the segmentation model can include the class-aware and severity-aware information to promote segmentation performance. Extensive experiments proved the effectiveness of our method and its superiority to existing methods. The source code will be released. © The Author(s) 2024.
Research Area(s)
- Graph matching, Medical image segmentation, Text guided segmentation, Vision-language model
Citation Format(s)
Bi-VLGM: Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation. / Chen, Wenting; Liu, Jie; Liu, Tianming et al.
In: International Journal of Computer Vision, Vol. 133, No. 3, 03.2025, p. 1375–1391.
In: International Journal of Computer Vision, Vol. 133, No. 3, 03.2025, p. 1375–1391.
Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review
Download Statistics
No data available