Open Access
ARTICLE
Self-Supervised Monocular Depth Estimation with Scene Dynamic Pose
1 School of Information and Navigation, Air Force Engineering University, Xi’an, 710077, China
2 Unit 95655 of the People’s Liberation Army, Chengdu, 611500, China
3 College of Air and Missile Defense, Air Force Engineering University, Xi’an, 710051, China
* Corresponding Author: Minrui Zhao. Email:
Computers, Materials & Continua 2025, 83(3), 4551-4573. https://doi.org/10.32604/cmc.2025.062437
Received 18 December 2024; Accepted 26 February 2025; Issue published 19 May 2025
Abstract
Self-supervised monocular depth estimation has emerged as a major research focus in recent years, primarily because it eliminates the need for ground-truth depth supervision. However, the prevailing architectures in this domain suffer from an inherent limitation: existing pose-network branches infer camera ego-motion exclusively under static-scene and Lambertian-surface assumptions. These assumptions are often violated in real-world scenarios by dynamic objects, non-Lambertian reflectance, and unstructured background elements, leading to pervasive artifacts such as depth discontinuities (“holes”), structural collapse, and ambiguous reconstruction. To address these challenges, we propose a novel framework that integrates scene dynamic pose estimation into the conventional self-supervised depth network, enhancing its ability to model complex scene dynamics. Our contributions are threefold: (1) a pixel-wise dynamic pose estimation module that jointly resolves the pose transformations of moving objects and localized scene perturbations; (2) a physically informed loss function that couples dynamic pose and depth predictions, designed to mitigate depth errors arising from high-speed distant objects and geometrically inconsistent motion profiles; and (3) an efficient SE(3) transformation parameterization that reduces network complexity and simplifies temporal pre-processing. Extensive experiments on the KITTI and NYU-V2 benchmarks show that our framework achieves state-of-the-art performance in both quantitative metrics and qualitative visual fidelity, significantly improving the robustness and generalization of monocular depth estimation under dynamic conditions.
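To make the geometry behind this framing concrete, the sketch below shows how a pixel-wise SE(3) pose can drive the view-synthesis warping that self-supervised depth training relies on. This is a minimal PyTorch sketch written for this summary, not the authors' code: the helpers se3_exp and warp_with_pixelwise_pose, and the choice of a dense (B, 6, H, W) twist field, are illustrative assumptions about one way such a parameterization could be realized.

```python
import torch

def se3_exp(xi):
    """Exponential map from se(3) twists to SE(3) matrices.

    xi: (..., 6) tensor laid out as [v | w], with translational part v
    and rotational part w. Returns (..., 4, 4) homogeneous transforms.
    """
    v, w = xi[..., :3], xi[..., 3:]
    theta = w.norm(dim=-1, keepdim=True).clamp(min=1e-8)      # rotation angle
    k = w / theta                                             # unit rotation axis
    K = torch.zeros(*xi.shape[:-1], 3, 3, device=xi.device, dtype=xi.dtype)
    K[..., 0, 1], K[..., 0, 2] = -k[..., 2], k[..., 1]        # skew(k)
    K[..., 1, 0], K[..., 1, 2] = k[..., 2], -k[..., 0]
    K[..., 2, 0], K[..., 2, 1] = -k[..., 1], k[..., 0]
    I = torch.eye(3, device=xi.device, dtype=xi.dtype).expand_as(K)
    st, ct = torch.sin(theta)[..., None], torch.cos(theta)[..., None]
    R = I + st * K + (1 - ct) * (K @ K)                       # Rodrigues' formula
    th = theta[..., None]
    V = I + (1 - ct) / th * K + (th - st) / th * (K @ K)      # left Jacobian
    T = torch.zeros(*xi.shape[:-1], 4, 4, device=xi.device, dtype=xi.dtype)
    T[..., :3, :3] = R
    T[..., :3, 3] = (V @ v[..., None]).squeeze(-1)
    T[..., 3, 3] = 1.0
    return T

def warp_with_pixelwise_pose(depth, xi_map, intr, intr_inv):
    """Back-project pixels with predicted depth, move every 3D point by its
    own SE(3) transform, and reproject into the source view.

    depth:   (B, 1, H, W) predicted depth
    xi_map:  (B, 6, H, W) per-pixel se(3) twists (ego-motion plus residual
             object motion in a dynamic-pose setting)
    intr, intr_inv: (B, 3, 3) camera intrinsics and their inverse
    Returns a (B, H, W, 2) grid for torch.nn.functional.grid_sample.
    """
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()   # (3, H, W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1).to(depth.device)   # (B, 3, HW)
    cam = (intr_inv @ pix) * depth.view(B, 1, -1)                 # back-project
    T = se3_exp(xi_map.permute(0, 2, 3, 1).reshape(B, -1, 6))     # (B, HW, 4, 4)
    ones = torch.ones(B, 1, cam.shape[-1], device=cam.device)
    cam_h = torch.cat([cam, ones], dim=1).permute(0, 2, 1)        # (B, HW, 4)
    moved = (T @ cam_h[..., None]).squeeze(-1)[..., :3]           # per-point motion
    proj = intr @ moved.permute(0, 2, 1)                          # reproject
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)
    grid = uv.permute(0, 2, 1).reshape(B, H, W, 2)
    grid[..., 0] = 2 * grid[..., 0] / (W - 1) - 1                 # normalize to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (H - 1) - 1
    return grid
```

In a full training loop, the returned grid would feed torch.nn.functional.grid_sample to synthesize the target frame from a source frame, and the photometric error between the synthesized and observed images supplies the self-supervised signal; a single shared twist recovers the conventional static-scene ego-motion model as a special case.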
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.