Open Access

ARTICLE

DeepEchoNet: A Lightweight Architecture for Low Resolution Monocular Depth Estimation

Giulio Caporro1, Paolo Russo2,*
1 Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
2 Department of Civil, Computer Science and Aeronautical Technologies Engineering, Roma Tre University, Rome, Italy
* Corresponding Author: Paolo Russo.
(This article belongs to the Special Issue: Advances in Efficient Vision Transformers: Architectures, Optimization, and Applications)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.079331

Received 20 January 2026; Accepted 25 March 2026; Published online 23 April 2026

Abstract

Monocular depth estimation (MDE) has become a practical alternative to active range sensing in many indoor scenarios, enabled by supervised deep learning models that predict dense depth maps from a single RGB image. However, most modern MDE systems assume mid-to-high-resolution inputs and non-trivial compute budgets, which limits their direct applicability in embedded and bandwidth-constrained settings. This paper studies low-resolution MDE, focusing on 96×96 inputs, where geometric cues are strongly degraded and naively downsizing high-resolution architectures often leads to unstable training and poor accuracy. We propose DeepEchoNet, a lightweight hybrid CNN-transformer model designed to operate natively at 96×96 resolution. Its MobileViT-inspired encoder combines MobileNetV2-style inverted residual blocks with lightweight transformer blocks, and its guided decoder selectively fuses multi-scale skip features through efficient recalibration modules and separable convolutions. We further adopt a low-resolution-aware training objective and a joint RGB–depth augmentation pipeline with a strong-to-weak schedule to improve robustness while preserving coarse geometric consistency.
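
As an illustration of the joint RGB–depth augmentation idea mentioned in the abstract, the following is a minimal sketch and not the authors' pipeline. It assumes a linear decay of augmentation strength over epochs, a horizontal flip as the only geometric transform, and a brightness gain as the only photometric transform; the function names `augmentation_strength` and `joint_augment` and all numeric ranges are hypothetical. The point it shows is that geometric transforms are applied identically to the image and its depth map, while photometric transforms touch the image only.

```python
import numpy as np


def augmentation_strength(epoch, total_epochs, start=1.0, end=0.1):
    """Assumed strong-to-weak schedule: linear decay from `start` to `end`."""
    t = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + t * (end - start)


def joint_augment(rgb, depth, strength, rng):
    """Augment an RGB image and its depth map together.

    rgb:   (H, W, 3) float array with values in [0, 1]
    depth: (H, W) float array
    strength: scalar in [0, 1]; 0 disables all augmentation
    rng:   numpy.random.Generator
    """
    # Geometric transform: the same flip is applied to both arrays so that
    # every pixel keeps its own depth value.
    if rng.random() < 0.5 * strength:
        rgb = rgb[:, ::-1, :]
        depth = depth[:, ::-1]

    # Photometric transform: brightness gain on the image only, since
    # changing pixel intensity must not change the depth target.
    gain = 1.0 + strength * rng.uniform(-0.2, 0.2)
    rgb = np.clip(rgb * gain, 0.0, 1.0)

    return np.ascontiguousarray(rgb), np.ascontiguousarray(depth)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.uniform(0.0, 1.0, size=(96, 96, 3))
    depth = rng.uniform(0.5, 10.0, size=(96, 96))
    for epoch in (0, 25, 49):
        s = augmentation_strength(epoch, total_epochs=50)
        aug_rgb, aug_depth = joint_augment(rgb, depth, s, rng)
        print(epoch, round(s, 3), aug_rgb.shape, aug_depth.shape)
```

In this sketch, early epochs flip and re-light samples more aggressively, and later epochs see inputs closer to the original data. Any real schedule, transform set, or range would need to be taken from the full paper.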

Keywords

Monocular depth estimation; lightweight neural networks; mobile vision transformers; encoder-decoder architectures; edge deployment; low resolution