TY - JOUR
T1 - Inverse Kinematics Embedded Network for Robust Patient Anatomy Avatar Reconstruction From Multimodal Data
AU - Zhou, Tongxi
AU - Chen, Mingcong
AU - Cao, Guanglin
AU - Hu, Jian
AU - Liu, Hongbin
PY - 2024/4
Y1 - 2024/4
N2 - Patient modelling has a wide range of applications in medicine and healthcare, such as clinical teaching, surgical navigation and automatic robotized scanning. Because patients are typically covered or occluded in medical scenes, directly regressing human meshes from single RGB images is challenging. To this end, we design a deep learning-based patient anatomy reconstruction network from RGB-D images with three key modules: 1) the attention-based multimodal fusion module, 2) the analytical inverse kinematics module and 3) the anatomical layer module. In our pipeline, the color and depth modalities are fully fused by the multimodal attention module to obtain a cover-insensitive feature map. The estimated 3D keypoints, learned from the fused features, are further converted to patient model parameters through the embedded analytical inverse kinematics module. To capture more detailed patient structures, we also present a parametric anatomy avatar by extending the Skinned Multi-Person Linear Model (SMPL) with internal bone and artery models. Final meshes are driven by the predicted parameters via the anatomical layer module, generating digital twins of patients. Experimental results on the Simultaneously-Collected Multimodal Lying Pose Dataset demonstrate that our approach surpasses state-of-the-art human mesh recovery methods and shows robustness to occlusions. © 2024 IEEE.
KW - deep learning for visual perception
KW - Gesture, posture and facial expressions
KW - modeling and simulating humans
KW - RGB-D perception
UR - http://www.scopus.com/inward/record.url?scp=85186072616&partnerID=8YFLogxK
U2 - 10.1109/LRA.2024.3366418
DO - 10.1109/LRA.2024.3366418
M3 - RGC 21 - Publication in refereed journal
SN - 2377-3766
VL - 9
SP - 3395
EP - 3402
JO - IEEE Robotics and Automation Letters
JF - IEEE Robotics and Automation Letters
IS - 4
ER -