Open Access
ARTICLE
Self-Supervised Monocular Depth Estimation with Scene Dynamic Pose
1 School of Information and Navigation, Air Force Engineering University, Xi’an, 710077, China
2 Unit 95655 of the People’s Liberation Army, Chengdu, 611500, China
3 College of Air and Missile Defense, Air Force Engineering University, Xi’an, 710051, China
* Corresponding Author: Minrui Zhao. Email:
Computers, Materials & Continua 2025, 83(3), 4551-4573. https://doi.org/10.32604/cmc.2025.062437
Received 18 December 2024; Accepted 26 February 2025; Issue published 19 May 2025
Abstract
Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily because it eliminates the need for ground-truth depth supervision. However, the prevailing architectures in this domain suffer from an inherent limitation: existing pose-network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios by dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities (“holes”), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; and (3) an efficient SE(3) transformation parameterization that reduces network complexity and simplifies temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
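To make the geometry behind this framing concrete, the sketch below shows how a pixel-wise SE(3) pose can drive the view-synthesis warping that self-supervised depth training relies on. This is a minimal PyTorch sketch written for this summary, not the authors' code: the helpers se3_exp and warp_with_pixelwise_pose, and the choice of a dense (B, 6, H, W) twist field, are illustrative assumptions about one way such a parameterization could be realized.

```python
import torch

def se3_exp(xi):
    """Exponential map from se(3) twists to SE(3) matrices.

    xi: (..., 6) tensor laid out as [v | w], with translational part v
    and rotational part w. Returns (..., 4, 4) homogeneous transforms.
    """
    v, w = xi[..., :3], xi[..., 3:]
    theta = w.norm(dim=-1, keepdim=True).clamp(min=1e-8)      # rotation angle
    k = w / theta                                             # unit rotation axis
    K = torch.zeros(*xi.shape[:-1], 3, 3, device=xi.device, dtype=xi.dtype)
    K[..., 0, 1], K[..., 0, 2] = -k[..., 2], k[..., 1]        # skew(k)
    K[..., 1, 0], K[..., 1, 2] = k[..., 2], -k[..., 0]
    K[..., 2, 0], K[..., 2, 1] = -k[..., 1], k[..., 0]
    I = torch.eye(3, device=xi.device, dtype=xi.dtype).expand_as(K)
    st, ct = torch.sin(theta)[..., None], torch.cos(theta)[..., None]
    R = I + st * K + (1 - ct) * (K @ K)                       # Rodrigues' formula
    th = theta[..., None]
    V = I + (1 - ct) / th * K + (th - st) / th * (K @ K)      # left Jacobian
    T = torch.zeros(*xi.shape[:-1], 4, 4, device=xi.device, dtype=xi.dtype)
    T[..., :3, :3] = R
    T[..., :3, 3] = (V @ v[..., None]).squeeze(-1)
    T[..., 3, 3] = 1.0
    return T

def warp_with_pixelwise_pose(depth, xi_map, intr, intr_inv):
    """Back-project pixels with predicted depth, move every 3D point by its
    own SE(3) transform, and reproject into the source view.

    depth:   (B, 1, H, W) predicted depth
    xi_map:  (B, 6, H, W) per-pixel se(3) twists (ego-motion plus residual
             object motion in a dynamic-pose setting)
    intr, intr_inv: (B, 3, 3) camera intrinsics and their inverse
    Returns a (B, H, W, 2) grid for torch.nn.functional.grid_sample.
    """
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()   # (3, H, W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1).to(depth.device)   # (B, 3, HW)
    cam = (intr_inv @ pix) * depth.view(B, 1, -1)                 # back-project
    T = se3_exp(xi_map.permute(0, 2, 3, 1).reshape(B, -1, 6))     # (B, HW, 4, 4)
    ones = torch.ones(B, 1, cam.shape[-1], device=cam.device)
    cam_h = torch.cat([cam, ones], dim=1).permute(0, 2, 1)        # (B, HW, 4)
    moved = (T @ cam_h[..., None]).squeeze(-1)[..., :3]           # per-point motion
    proj = intr @ moved.permute(0, 2, 1)                          # reproject
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    grid = uv.permute(0, 2, 1).reshape(B, H, W, 2)
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1                 # normalize to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return grid
```

In a full training loop, the returned grid would feed torch.nn.functional.grid_sample to synthesize the target frame from a source frame, and the photometric error between the synthesized and observed images supplies the self-supervised signal; a single shared twist recovers the conventional static-scene ego-motion model as a special case.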
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.