TY  - EJOU
AU  - Li, Xinyu
AU  - Wan, Gang
AU  - Chen, Xinyang
AU  - Qie, Liyue
AU  - Fan, Xinnan
AU  - Shi, Pengfei
AU  - Wan, Jin
TI  - FastSECOND: Real-Time 3D Detection via Swin-Transformer Enhanced SECOND with Geometry-Aware Learning
T2  - Computer Modeling in Engineering & Sciences
PY  - 2025
VL  - 144
IS  - 1
SN  - 1526-1506
AB  - The inherent limitations of 2D object detection, such as inadequate spatial reasoning and susceptibility to environmental occlusions, pose significant risks to the safety and reliability of autonomous driving systems. To address these challenges, this paper proposes an enhanced 3D object detection framework (FastSECOND) based on an optimized SECOND architecture, designed to achieve rapid and accurate perception in autonomous driving scenarios. Key innovations include: (1) Replacing the Rectified Linear Unit (ReLU) activation functions with the Gaussian Error Linear Unit (GELU) during voxel feature encoding and region proposal network stages, leveraging partial convolution to balance computational efficiency and detection accuracy; (2) Integrating a Swin-Transformer V2 module into the voxel backbone network to enhance feature extraction capabilities in sparse data; and (3) Introducing an optimized position regression loss combined with a geometry-aware Focal-EIoU loss function, which incorporates bounding box geometric correlations to accelerate network convergence. While this study currently focuses exclusively on the detection of the Car category, with experiments conducted on the Car class of the KITTI dataset, future work will extend to other categories such as Pedestrian and Cyclist to more comprehensively evaluate the generalization capability of the proposed framework. Extensive experimental results demonstrate that our framework achieves a more effective trade-off between detection accuracy and speed. Compared to the baseline SECOND model, it achieves a 21.9% relative improvement in 3D bounding box detection accuracy on the hard subset, while reducing inference time by 14 ms. These advancements underscore the framework's potential for enabling real-time, high-precision perception in autonomous driving applications.
KW  - 3D object detection
KW  - automatic driving
KW  - Deep Learning
KW  - SECOND
KW  - geometry-aware learning
DO  - 10.32604/cmes.2025.064775
ER  -