
Direct Cardiovascular Disease Diagnosis From Multi-Modal Multi-View Ultrasound Via Unified Vision-Language Modeling

  • Bin Pu
  • Jiewen Yang
  • Hangcheng Cao
  • Xingguo Lv
  • Lei Zhao
  • Qika Lin
  • Zhan Gao
  • Yifan Zhu
  • Haiyan Chen
  • Kenli Li*
  • *Corresponding author for this work

Research output: Chapters, Conference Papers, Creative and Literary Works › RGC 32 - Refereed conference paper (with host publication) › peer-review

Abstract

Cardiovascular disease diagnosis via ultrasound screening relies on manually measured metrics and on the experience of human experts; this process is time-consuming and may overlook subtle pathological patterns that span multiple anatomical regions. Recent vision-language models offer end-to-end diagnostic potential but lack mechanisms to handle heterogeneous multi-modal, multi-view ultrasound data while preserving modality-specific semantics. To fill this gap, we propose MMVL, an end-to-end framework that directly fuses raw ultrasound sequences from diverse anatomical regions, bypassing intermediate measurements and enabling direct diagnosis. We design lightweight adapters for domain-specific multi-modal feature fusion and refinement, a gating mechanism that dynamically reweights modality importance based on global context, and disease-aware prompt-guided classification. MMVL maintains robust performance across both common and rare conditions. The proposed multi-view, multi-modal vision-language framework enables end-to-end cardiovascular disease diagnosis with a 10.9% gain in accuracy and opens a new avenue for automated and generalizable diagnostic solutions. © 2025 IEEE.
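The abstract names three components: lightweight adapters, a gate that reweights modalities from global context, and prompt-guided classification. The following is not the authors' MMVL code; it is a minimal NumPy sketch under stated assumptions. It assumes a residual bottleneck adapter (a common adapter design), a softmax gate that scores each view's feature concatenated with the mean feature as the "global context", and CLIP-style cosine-similarity scoring against one text embedding per disease prompt. All function names, dimensions, the 0.07 temperature, and the random stand-in features are hypothetical; in a real system the per-view features and prompt embeddings would come from pretrained vision and text encoders.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bottleneck_adapter(x, w_down, w_up):
    """Residual bottleneck adapter (assumed design): x + ReLU(x W_down) W_up.
    With W_up = 0 it reduces to the identity, so frozen features pass through."""
    return x + np.maximum(x @ w_down, 0.0) @ w_up

def gated_fusion(feats, w_gate):
    """Hypothetical context-conditioned gate over M modality/view features.

    feats:  (M, d) adapted features, one row per modality or view.
    w_gate: (2d,) gate parameters that score [feature ; global context].
    Returns the fused (d,) feature and the (M,) gate weights, which sum to 1.
    """
    context = feats.mean(axis=0)                          # global context: mean over views
    paired = np.concatenate(
        [feats, np.broadcast_to(context, feats.shape)], axis=1)  # (M, 2d)
    weights = softmax(paired @ w_gate)                    # per-view importance
    return weights @ feats, weights                       # weighted sum of features

def prompt_logits(fused, text_embeds, temperature=0.07):
    """CLIP-style scoring: cosine similarity between the fused feature and one
    text embedding per disease prompt, divided by a temperature."""
    f = fused / np.linalg.norm(fused)
    t = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    return (t @ f) / temperature

# Demo with random stand-ins for encoder outputs (all sizes hypothetical).
rng = np.random.default_rng(0)
d, r, M, C = 16, 4, 3, 5          # feature dim, bottleneck dim, views, diseases
feats = rng.normal(size=(M, d))   # per-view vision features
w_down = rng.normal(scale=0.1, size=(d, r))
w_up = rng.normal(scale=0.1, size=(r, d))
adapted = bottleneck_adapter(feats, w_down, w_up)   # one shared adapter for brevity
fused, weights = gated_fusion(adapted, rng.normal(size=2 * d))
text_embeds = rng.normal(size=(C, d))  # stand-in for encoded disease prompts
probs = softmax(prompt_logits(fused, text_embeds))
print("gate weights:", weights.round(3), "predicted class:", int(probs.argmax()))
```

A softmax gate makes the per-view weights directly readable as relative importance; per-modality adapters (separate `w_down`/`w_up` per view) or an independent sigmoid gate per view are equally plausible variants that the abstract does not rule out.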
Original language: English
Title of host publication: Proceedings - 2025 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2025
Publisher: IEEE
Pages: 6287-6294
ISBN (Electronic): 979-8-3315-1557-7
ISBN (Print): 979-8-3315-1558-4
DOIs
Publication status: Published - Dec 2025
Event: 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2025) - Wuhan, China
Duration: 15 Dec 2025 to 18 Dec 2025

Publication series

Name: Proceedings - IEEE International Conference on Bioinformatics and Biomedicine, BIBM
ISSN (Print): 2156-1125
ISSN (Electronic): 2156-1133

Conference

Conference: 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2025)
Place: China
City: Wuhan
Period: 15/12/25 to 18/12/25

Bibliographical note

Full text of this publication does not contain sufficient affiliation information. With consent from the author(s) concerned, the Research Unit(s) information for this record is based on the existing academic department affiliation of the author(s).

Funding

This work was supported in part by the National Key R&D Program of China under Grant 2025YFB3003705, in part by the NSFC under Grants 62227808 and 62506124, and in part by the Natural Science Foundation of Hunan Province under Grant 2025JJ60408.

Research Keywords

  • Disease Diagnosis
  • Multi-Modal Ultrasound
  • Vision-Language Model
