Semantically-driven No-reference Visual Quality Assessment Using Large Vision-Language Models
Project: Research
Researcher(s)
Description
Visual quality assessment has been a long-standing problem, playing an important role in monitoring, assessing, and optimizing multimedia systems. Traditional quality assessment models, which rely on structural-level or feature-level modeling, face challenges such as a lack of meticulously labeled data, limited capability to capture high-level semantics, and unsatisfactory correlation with human opinions. Furthermore, these models are typically trained using scalar mean opinion scores (MOSs), lacking interpretability in quality prediction and optimization. The evaluation of visual quality is a complicated process shaped by both perception and comprehension, yet cognitive understanding is typically lacking in existing quality assessment models, leading to biased predictions. This project aims to address this grand challenge by extracting, modeling, and utilizing semantics for quality assessment. The proposed solution paves the way toward incorporating the knowledge encoded in large vision-language models (LVLMs) into quality assessment, greatly benefiting accuracy, adaptability, and interpretability.

First, an LVLM-aided quality assessment dataset is created. Instead of a single MOS per image, the dataset features multi-modality quality labels based on both human opinions and semantic interpretations by LVLMs. In particular, heuristic dialog rules inspired by the human quality rating process are explored to mimic human interaction with LVLMs. Second, a semantics-driven quality assessment model grounded in LVLMs is developed, which can be customized according to individual preferences. Instead of providing only a single score, the model generates quality information as a scalar value, a distortion map, and textual descriptions, enriching interpretability and robustness. Third, adaptable quality-driven optimization schemes are developed, by which semantically prioritized and quality-optimized images can be selected or generated.

The proposed research could significantly advance the frontier of visual signal processing and provide a paradigm shift in image quality assessment from distortion identification to semantic cognition. The potential applications extend beyond the scenarios supported by traditional assessment models. We demonstrate how the proposed assessment model can optimize image restoration, recommendation, and generation. We also show that individual preferences can be incorporated into model learning, offering high personalization potential. The proposed research can be feasibly extended beyond images to more realistic visual data (e.g., omni-directional representations and point clouds), creating a comprehensive and systematic research ecosystem for quality assessment of both real-world and AI-generated visual data.
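To make the heuristic dialog rules mentioned in the description concrete, the sketch below is a minimal illustration, not the project's actual pipeline: it walks an LVLM through a human-like rating process (describe the content, inspect distortions, then rate) and packages the answers together with the human MOS into a multi-modality label. The helper `query_lvlm`, the prompt wording, and the 1-to-5 rating scale are assumptions introduced purely for illustration.

```python
"""Sketch of a heuristic multi-turn quality dialog with an LVLM.

Assumptions (not from the project text): `query_lvlm`, the prompts, and the
1-5 scale are placeholders; a real system would call an actual LVLM backend.
"""

from dataclasses import dataclass, field


@dataclass
class QualityLabel:
    """Multi-modality quality label: human MOS plus LVLM-derived semantics."""
    mos: float                        # human mean opinion score
    content: str = ""                 # LVLM description of the image content
    distortions: str = ""             # LVLM description of visible distortions
    lvlm_score: float | None = None   # scalar rating parsed from the dialog
    dialog: list[tuple[str, str]] = field(default_factory=list)  # (question, answer) turns


def query_lvlm(image_path: str, prompt: str, history: list[tuple[str, str]]) -> str:
    """Hypothetical stand-in for an LVLM call; returns canned text so the sketch runs.

    `history` is kept in the signature to suggest a multi-turn interface,
    but this stub ignores it.
    """
    canned = {
        "content": "An outdoor portrait of a person in front of a building.",
        "distortion": "Mild motion blur on the face and blocking artifacts in the sky.",
        "rate": "3",
    }
    for keyword, answer in canned.items():
        if keyword in prompt.lower():
            return answer
    return "No answer."


def heuristic_quality_dialog(image_path: str, mos: float) -> QualityLabel:
    """Mimic a human rating process: perceive content, inspect distortions, then rate."""
    label = QualityLabel(mos=mos)

    # Step 1: comprehension -- ask what the image depicts.
    q1 = "Describe the main content of this image."
    label.content = query_lvlm(image_path, q1, label.dialog)
    label.dialog.append((q1, label.content))

    # Step 2: perception -- ask for visible distortions, conditioned on the description.
    q2 = f"The image shows: {label.content} What distortions are visible?"
    label.distortions = query_lvlm(image_path, q2, label.dialog)
    label.dialog.append((q2, label.distortions))

    # Step 3: judgement -- ask for a scalar rating on a fixed scale.
    q3 = "Overall, rate the quality of this image from 1 (bad) to 5 (excellent) with a single number."
    answer = query_lvlm(image_path, q3, label.dialog)
    label.dialog.append((q3, answer))
    try:
        label.lvlm_score = float(answer.strip())
    except ValueError:
        label.lvlm_score = None  # the textual answer is still kept in the dialog

    return label


if __name__ == "__main__":
    label = heuristic_quality_dialog("example.jpg", mos=3.2)
    print(label.mos, label.lvlm_score, label.distortions)
```

In a full system, the canned stub would be replaced by calls to an actual vision-language model, and the dialog could be extended with preference-specific questions to support the personalized assessment described above.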
Detail(s)

| Project number | 9043678 |
| --- | --- |
| Grant type | GRF |
| Status | Not started |
| Effective start/end date | 1/01/25 → … |