
MVHDiff: Leveraging 3D priors for consistent multi-view human image generation with diffusion models

Yan Huang, Hongxin Fu, Zhonghang Li, Yongcan Luo, Tianyi Chen*, Si Wu

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 – Publication in refereed journal › peer-reviewed

Abstract

Text-to-image models are increasingly applied to human image generation, leveraging multimodal information under multiple conditions to produce high-quality human images. Despite their ability to generate detailed images, these models often struggle to maintain perceptual consistency across multiple viewpoints. To address this limitation, we propose Multi-View Human Diffusion (MVHDiff), a novel framework that integrates 3D human model priors and text prompts to generate high-quality, multi-view-consistent human images. MVHDiff separately acquires textual descriptions of human appearance and pose, as well as spatial information regarding the subject's orientation relative to the camera. Subsequently, a perceptual fusion module is employed to align these text features with the visual features extracted from the human image, thereby enabling the fused learning of prior information and image features. Further, MVHDiff fine-tunes both appearance descriptions and spatial viewpoint-related textual inputs, enabling precise text-based control over human attributes while ensuring semantic consistency across different spatial viewpoints. Experimental results demonstrate that MVHDiff significantly outperforms existing methods in generating text-guided human attributes with consistent multi-view representations, offering a robust solution for high-quality, text-driven human image generation. © 2026 Elsevier B.V.
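The abstract does not specify how the perceptual fusion module aligns text and visual features. As an illustration only, a common way to fuse a text-prompt embedding into image features in diffusion models is cross-attention, where image tokens query text tokens and the attended context is added back residually. The sketch below (plain NumPy, single-head, no learned projections; all function and variable names are hypothetical and not from the paper) shows that pattern:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(image_feats, text_feats):
    """Toy text-to-image feature fusion via cross-attention.

    image_feats: (N_img, d) image tokens (queries)
    text_feats:  (N_txt, d) text tokens, e.g. appearance and
                 viewpoint descriptions (keys/values)
    Returns image features with attended text context added residually.
    """
    d = image_feats.shape[-1]
    # Scaled dot-product attention: each image token attends over text tokens.
    attn = softmax(image_feats @ text_feats.T / np.sqrt(d), axis=-1)  # (N_img, N_txt)
    context = attn @ text_feats                                       # (N_img, d)
    return image_feats + context  # residual fusion

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 64))  # 16 image tokens, 64-dim
txt = rng.normal(size=(8, 64))   # 8 text tokens (appearance + viewpoint prompt)
fused = cross_attention_fuse(img, txt)
```

In a real diffusion backbone the queries, keys, and values would pass through learned linear projections and multiple heads; this sketch only conveys the fusion idea, not the paper's actual module.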
Original language: English
Article number: 133057
Journal: Neurocomputing
Volume: 676
Online published: 13 Feb 2026
Publication status: Online published – 13 Feb 2026

Funding

This work was supported in part by the Guangdong Basic and Applied Basic Research Foundation (Project No. 2024A1515011437), in part by the Guangdong Provincial Science and Technology Program (Project No. 2025A050508003), in part by the National Key Research and Development Program of China (Project No. 2024YFE0105400), in part by the National Natural Science Foundation of China (Project Nos. 62372186 and 62472179), and in part by the Science and Technology Planning Project of Guangdong Province (Project No. 2025A0505020016).

Research Keywords

  • 3D prior understanding
  • Diffusion models
  • Image generation
