Abstract
Cardiovascular disease diagnosis via ultrasound screening relies on manually measured metrics and the experience level of human experts, which is time-consuming and may overlook subtle cross-anatomical pathological patterns. Recent vision-language models offer end-to-end diagnostic potential but lack mechanisms to handle heterogeneous multi-modal, multi-view ultrasound data while preserving modality-specific semantics. To fill this gap, we propose an end-to-end framework called MMVL that directly fuses raw ultrasound sequences from diverse anatomical regions, bypassing intermediate measurements and enabling direct diagnosis. We design lightweight adapters for domain-specific multi-modal feature fusion and refinement, a gating mechanism that dynamically reweights modality importance based on global context, and disease-aware prompt-guided classification. MMVL ensures robust performance across both common and rare conditions. The proposed multi-view, multi-modal vision-language framework enables end-to-end cardiovascular disease diagnosis with a 10.9% accuracy gain, and opens a new avenue for automated and generalizable diagnostic solutions. © 2025 IEEE.
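The abstract names a gating mechanism that reweights modality importance based on global context but gives no formulation. The following is a minimal illustrative sketch of one common way such a gate can be built, not the authors' implementation. It assumes the global context is the mean of the per-modality feature vectors and that each modality has a scoring vector, with a softmax turning scores into weights. The function names, the modality names `b_mode` and `doppler`, and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(features, gate_weights):
    """Fuse per-modality feature vectors with context-dependent gates.

    features:     dict of modality name -> feature vector (equal lengths)
    gate_weights: dict of modality name -> scoring vector (same length);
                  these stand in for learned parameters (assumption).
    Returns (gates, fused) where gates sum to 1.
    """
    names = list(features)
    dim = len(features[names[0]])

    # Assumed global context: element-wise mean over all modalities.
    context = [sum(features[n][i] for n in names) / len(names) for i in range(dim)]

    # One scalar relevance score per modality, computed from the context.
    scores = [sum(w * c for w, c in zip(gate_weights[n], context)) for n in names]
    gate_values = softmax(scores)

    # Fused representation: gate-weighted sum of the modality features.
    fused = [
        sum(g * features[n][i] for g, n in zip(gate_values, names))
        for i in range(dim)
    ]
    return dict(zip(names, gate_values)), fused


# Hypothetical two-modality example with 2-dimensional features.
features = {"b_mode": [1.0, 0.0], "doppler": [0.0, 1.0]}
gate_weights = {"b_mode": [0.0, 0.0], "doppler": [2.0, 2.0]}
gates, fused = gated_fusion(features, gate_weights)
```

In this toy setup the `doppler` scoring vector responds more strongly to the shared context, so that modality receives the larger gate and dominates the fused vector. A trained model would learn the scoring parameters rather than fix them by hand.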
| Original language | English |
|---|---|
| Title of host publication | Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025 |
| Publisher | IEEE |
| Pages | 6287-6294 |
| ISBN (Electronic) | 9798331515577 |
| ISBN (Print) | 979-8-3315-1558-4 |
| Publication status | Published - Dec 2025 |
| Event | 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2025) - Wuhan, China. Duration: 15 Dec 2025 → 18 Dec 2025 |
Publication series
| Name | Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM |
|---|---|
| ISSN (Print) | 2156-1125 |
| ISSN (Electronic) | 2156-1133 |
Conference
| Conference | 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2025) |
|---|---|
| Place | China |
| City | Wuhan |
| Period | 15/12/25 → 18/12/25 |
Bibliographical note
Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).
Funding
This work was supported in part by the National Key R&D Program of China under Grant 2025YFB3003705, in part by the NSFC under Grants 62227808 and 62506124, and in part by the Natural Science Foundation of Hunan Province under Grant 2025JJ60408.
Research Keywords
- Disease Diagnosis
- Multi-Modal Ultrasound
- Vision-Language Model
Publication title
Direct Cardiovascular Disease Diagnosis From Multi-Modal Multi-View Ultrasound Via Unified Vision-Language Modeling