Absolute Monocular Depth Estimation on Robotic Visual and Kinematics Data via Self-Supervised Learning

Ruofeng Wei, Bin Li, Fangxun Zhong, Hangjie Mo, Qi Dou, Yun-Hui Liu, Dong Sun*

*Corresponding author for this work

Research output: Journal Publications and Reviews › RGC 21 - Publication in refereed journal › peer-review

1 Citation (Scopus)

Abstract

Accurate estimation of absolute depth from a monocular endoscope is a fundamental task for automatic navigation systems in robotic surgery. Previous works rely solely on uni-modal data (i.e., monocular images) and can therefore only estimate depth values arbitrarily scaled relative to the real world. In this paper, we present a novel framework, SADER, which exploits vision and robot kinematics to estimate high-quality absolute depth for monocular surgical scenes. To jointly learn from the multi-modal data, we introduce a self-distillation-based two-stage training policy in the framework. In the first stage, a boosting depth module based on a vision transformer is proposed to improve the relative depth estimation network, which is trained in a self-supervised manner. We then develop an algorithm to automatically compute the scale from robot kinematics. By coupling the scale with the relative depth data, pseudo absolute depth labels are generated for all images. In the second stage, we re-train the network with a 3D loss supervised by the pseudo labels. To make our method generalize to different endoscopes, the learning of endoscopic intrinsics is integrated into the network. In addition, we conducted cadaver experiments to collect new surgical depth estimation data on robotic laparoscopy for evaluation. Experimental results on the public SCARED dataset and the cadaver data demonstrate that SADER outperforms previous state-of-the-art methods, including stereo-based ones, with an accuracy error under 1.90 mm, proving the feasibility of our approach for recovering absolute depth from monocular inputs. © 2024 IEEE.
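The abstract's core idea — coupling a kinematics-derived scale with scale-ambiguous relative depth to yield pseudo absolute depth labels — can be illustrated with a minimal sketch. The function names and the median-ratio scale estimator below are assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

def recover_scale(kinematic_translations, estimated_translations):
    """Estimate a global scale as the median ratio of camera-motion
    magnitudes from robot kinematics vs. the monocular network's
    arbitrarily scaled motion estimates (hypothetical estimator)."""
    kin = np.linalg.norm(kinematic_translations, axis=1)
    est = np.linalg.norm(estimated_translations, axis=1)
    return float(np.median(kin / est))

def pseudo_absolute_depth(relative_depth, scale):
    """Couple the recovered scale with relative depth predictions to
    yield pseudo absolute depth labels (e.g., in millimetres)."""
    return scale * relative_depth

# Toy usage: kinematics reports motions 10x larger than the network's
# arbitrary-scale estimates, so relative depths are scaled up by 10.
kin = np.array([[10.0, 0.0, 0.0], [0.0, 20.0, 0.0]])
est = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
s = recover_scale(kin, est)                              # 10.0
labels = pseudo_absolute_depth(np.array([0.5, 1.2]), s)  # [5.0, 12.0]
```

Such pseudo labels could then supervise re-training with a metric 3D loss, as the two-stage policy described above suggests.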
Original language: English
Pages (from-to): 4269-4282
Journal: IEEE Transactions on Automation Science and Engineering
Volume: 22
Online published: 7 Jun 2024
Publication status: Published - 2025

Funding

This article was recommended for publication by Editor P. Rocco upon evaluation of the reviewers’ comments. This work was supported in part by the Research Grants Council of Hong Kong Special Administrative Region, China, under Grant T42-409/18-R and Grant 11211421; and in part by the National Natural Science Foundation of China under Project 62322318.

Research Keywords

  • absolute depth estimation
  • Boosting
  • endoscope
  • Endoscopes
  • Estimation
  • Kinematics
  • monocular images
  • multi-modal learning
  • Robot kinematics
  • Surgical robotics
  • Training
  • Visualization

