
Open Access

ARTICLE

Real-Time 3D Scene Perception in Dynamic Urban Environments via Street Detection Gaussians

Yu Du1, Runwei Guan2, Ho-Pun Lam1, Jeremy Smith3, Yutao Yue4,5, Ka Lok Man1, Yan Li6,*
1 School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, 215123, China
2 Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511400, China
3 Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool, L69 7ZX, UK
4 The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511400, China
5 Institute of Deep Perception Technology, JITRI, Wuxi, 214000, China
6 Department of Electrical and Computer Engineering, Inha University, Incheon, 402751, Republic of Korea
* Corresponding Author: Yan Li

Computers, Materials & Continua. https://doi.org/10.32604/cmc.2025.072544

Received 29 August 2025; Accepted 02 December 2025; Published online 23 December 2025

Abstract

3D urban perception is a cornerstone of applications such as autonomous driving, and improving the performance and robustness of these perception systems is crucial to the safety of next-generation autonomous vehicles. In this work, we introduce Street Detection Gaussians (SDGs), a novel neural scene representation that redefines urban 3D perception through an integrated architecture unifying reconstruction and detection. At its core lies a dynamic Gaussian representation in which time-conditioned parameterization models static environments and dynamic objects simultaneously through physically constrained Gaussian evolution. The framework’s radar-enhanced perception module learns cross-modal correlations between sparse radar data and dense visual features, reducing occlusion errors by 22% compared with vision-only systems. A differentiable rendering pipeline back-propagates semantic detection losses through the entire 3D reconstruction process, so that geometric and semantic fidelity are optimized jointly. Evaluated on the Waymo Open Dataset and the KITTI dataset, the system achieves real-time performance (135 Frames Per Second (FPS)), photorealistic quality (Peak Signal-to-Noise Ratio (PSNR) of 34.9 dB), and state-of-the-art detection accuracy (78.1% Mean Average Precision (mAP)), a 3.8× end-to-end improvement over existing hybrid approaches, while integrating seamlessly with autonomous driving stacks.
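
As a concrete illustration of the abstract's time-conditioned Gaussian representation, the sketch below gives each Gaussian a learnable velocity so that its center evolves under a constant-velocity constraint, with zero velocity recovering a static scene Gaussian. This is a minimal PyTorch sketch inferred only from the abstract; the names (TimeConditionedGaussians, means_at, velocity) and the constant-velocity motion model are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed, not the paper's code) of a time-conditioned
# Gaussian parameterization: each Gaussian carries a learnable velocity,
# and its center evolves under a constant-velocity constraint; a zero
# velocity recovers a static scene Gaussian.
import torch
import torch.nn as nn


class TimeConditionedGaussians(nn.Module):
    def __init__(self, num_points: int, feat_dim: int = 16):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_points, 3))       # centers at t = 0
        self.log_scales = nn.Parameter(torch.zeros(num_points, 3))  # anisotropic extents
        self.velocity = nn.Parameter(torch.zeros(num_points, 3))    # zero => static Gaussian
        self.features = nn.Parameter(torch.randn(num_points, feat_dim))

    def means_at(self, t: float) -> torch.Tensor:
        # Constant-velocity evolution: one simple form of "physically
        # constrained Gaussian evolution"; the paper's actual constraint
        # may be richer (e.g., rigid per-object motion).
        return self.means + self.velocity * t


gaussians = TimeConditionedGaussians(num_points=10_000)
mu_half_second = gaussians.means_at(t=0.5)  # centers half a second into the sequence
```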
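
A second sketch illustrates the joint objective described in the abstract: a detection-style loss computed on the rendered output back-propagates through a differentiable rendering step into the same 3D parameters as the photometric loss. The orthographic splatting below and the single linear layer standing in for a detection head are deliberately crude stand-ins assumed for illustration; neither is the paper's rasterizer or detector, and the weighting lambda_det is an assumed hyperparameter.

```python
# Self-contained toy: a detection loss on the rendered image reaches the
# underlying geometry because the splatting step is differentiable. All
# components here are assumptions for illustration only.
import torch
import torch.nn as nn

N, H, W = 256, 32, 32
means = torch.randn(N, 2, requires_grad=True)        # 2D centers in [-1, 1]^2
log_scales = torch.zeros(N, requires_grad=True)      # isotropic extents
features = torch.randn(N, 3, requires_grad=True)     # per-Gaussian colour/features

# Pixel grid in the same normalized coordinates as the Gaussian centers.
ys, xs = torch.meshgrid(
    torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
)
pix = torch.stack([xs, ys], dim=-1).reshape(-1, 2)   # (H*W, 2)

# Differentiable splat: per-pixel Gaussian weights, then a weighted feature sum.
d2 = ((pix[:, None, :] - means[None, :, :]) ** 2).sum(-1)      # (H*W, N)
w = torch.exp(-d2 / (2 * torch.exp(log_scales)[None, :] ** 2))
img = (w @ features) / (w.sum(-1, keepdim=True) + 1e-8)        # (H*W, 3)
img = img.reshape(H, W, 3)

# Photometric reconstruction loss plus a (toy) detection-style loss on the
# rendered output; both gradients flow back into the 3D parameters.
gt_image = torch.rand(H, W, 3)
det_head = nn.Linear(3, 1)                            # stand-in detection head
photometric = nn.functional.mse_loss(img, gt_image)
det_loss = det_head(img.reshape(-1, 3)).mean()        # placeholder detection loss
lambda_det = 0.1                                      # assumed loss weighting
(photometric + lambda_det * det_loss).backward()
assert means.grad is not None                         # detection loss reached geometry
```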

Keywords

Radar-vision fusion; differentiable rendering; autonomous driving perception; 3D reconstruction; occlusion robustness