Open Access
ARTICLE
Self-Supervised Monocular Depth Estimation with Scene Dynamic Pose
1 School of Information and Navigation, Airforce Engineering University, Xi’an, 710077, China
2 Unit 95655 of the People’s Liberation Army, Chengdu, 611500, China
3 College of Air and Missile Defense, Airforce Engineering University, Xi’an, 710051, China
* Corresponding Author: Minrui Zhao. Email:
Computers, Materials & Continua 2025, 83(3), 4551-4573. https://doi.org/10.32604/cmc.2025.062437
Received 18 December 2024; Accepted 26 February 2025; Issue published 19 May 2025
Abstract
Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily due to the elimination of ground-truth depth dependence. However, the prevailing architectures in this domain suffer from inherent limitations: existing pose network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios due to dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities (“holes”), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically-informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; (3) an efficient SE(3) transformation parameterization that streamlines network complexity and temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
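For context, the abstract's pixel-wise SE(3) pose field operates within the standard view-synthesis (reprojection) pipeline common to self-supervised depth methods: each target pixel is back-projected with its predicted depth, moved by its own rigid transform, and reprojected into the source view to form a photometric supervision signal. The following PyTorch sketch is an illustrative, minimal version of that machinery only; it is not the authors' implementation, and the function names, tensor shapes, and axis-angle parameterization are assumptions made for the example.

```python
# Hypothetical sketch (not the paper's code): applying a per-pixel SE(3)
# field inside the usual self-supervised view-synthesis pipeline.
import torch

def axis_angle_to_rotation(aa):
    """Convert per-pixel axis-angle vectors (B, 3, H, W) to rotation
    matrices (B, H, W, 3, 3) via the Rodrigues formula."""
    B, _, H, W = aa.shape
    v = aa.permute(0, 2, 3, 1)                       # (B, H, W, 3)
    theta = v.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    k = v / theta                                    # unit rotation axis
    K = torch.zeros(B, H, W, 3, 3, device=aa.device)
    K[..., 0, 1], K[..., 0, 2] = -k[..., 2], k[..., 1]
    K[..., 1, 0], K[..., 1, 2] = k[..., 2], -k[..., 0]
    K[..., 2, 0], K[..., 2, 1] = -k[..., 1], k[..., 0]
    I = torch.eye(3, device=aa.device).expand(B, H, W, 3, 3)
    s = torch.sin(theta)[..., None]
    c = torch.cos(theta)[..., None]
    return I + s * K + (1 - c) * (K @ K)

def warp_with_pixelwise_pose(depth, aa, t, K, K_inv):
    """Back-project target pixels with predicted depth, move each 3D point by
    its own SE(3) transform (R_p, t_p), and reproject into the source view.
    depth: (B, 1, H, W); aa, t: (B, 3, H, W); K, K_inv: (B, 3, 3)."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()    # (3, H, W)
    pix = pix.to(depth.device).view(1, 3, -1).expand(B, 3, -1)         # (B, 3, HW)
    cam = depth.view(B, 1, -1) * (K_inv @ pix)                         # (B, 3, HW)
    R = axis_angle_to_rotation(aa).view(B, -1, 3, 3)                   # (B, HW, 3, 3)
    moved = (R @ cam.permute(0, 2, 1).unsqueeze(-1)).squeeze(-1)       # (B, HW, 3)
    moved = moved + t.view(B, 3, -1).permute(0, 2, 1)                  # add per-pixel translation
    proj = K @ moved.permute(0, 2, 1)                                  # (B, 3, HW)
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    # Pixel coordinates in the source frame; normalize to [-1, 1] before
    # passing to torch.nn.functional.grid_sample for photometric warping.
    return uv.view(B, 2, H, W)
```

Setting the per-pixel field to a single shared transform recovers the conventional ego-motion-only pose branch; the point of a pixel-wise formulation, as described in the abstract, is to also absorb the motion of dynamic objects and localized scene perturbations.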
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

