Open Access

ARTICLE


FastSECOND: Real-Time 3D Detection via Swin-Transformer Enhanced SECOND with Geometry-Aware Learning

Xinyu Li1,2, Gang Wan2, Xinyang Chen3, Liyue Qie3, Xinnan Fan3, Pengfei Shi3, Jin Wan3,*

1 College of Shipbuilding Engineering, Harbin Engineering University, Harbin, 150001, China
2 China Yangtze River Electric Power Co., Ltd., Yichang, 443002, China
3 College of Information Science and Engineering, Hohai University, Changzhou, 213000, China

* Corresponding Author: Jin Wan.

(This article belongs to the Special Issue: Data-Driven Artificial Intelligence and Machine Learning in Computational Modelling for Engineering and Applied Sciences)

Computer Modeling in Engineering & Sciences 2025, 144(1), 1071-1090. https://doi.org/10.32604/cmes.2025.064775

Abstract

The inherent limitations of 2D object detection, such as inadequate spatial reasoning and susceptibility to environmental occlusions, pose significant risks to the safety and reliability of autonomous driving systems. To address these challenges, this paper proposes an enhanced 3D object detection framework (FastSECOND) based on an optimized SECOND architecture, designed to achieve rapid and accurate perception in autonomous driving scenarios. Key innovations include: (1) Replacing the Rectified Linear Unit (ReLU) activation functions with the Gaussian Error Linear Unit (GELU) during voxel feature encoding and region proposal network stages, leveraging partial convolution to balance computational efficiency and detection accuracy; (2) Integrating a Swin-Transformer V2 module into the voxel backbone network to enhance feature extraction capabilities in sparse data; and (3) Introducing an optimized position regression loss combined with a geometry-aware Focal-EIoU loss function, which incorporates bounding box geometric correlations to accelerate network convergence. While this study currently focuses exclusively on the detection of the Car category, with experiments conducted on the Car class of the KITTI dataset, future work will extend to other categories such as Pedestrian and Cyclist to more comprehensively evaluate the generalization capability of the proposed framework. Extensive experimental results demonstrate that our framework achieves a more effective trade-off between detection accuracy and speed. Compared to the baseline SECOND model, it achieves a 21.9% relative improvement in 3D bounding box detection accuracy on the hard subset, while reducing inference time by 14 ms. These advancements underscore the framework’s potential for enabling real-time, high-precision perception in autonomous driving applications.
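The abstract's second and third innovations can be illustrated concretely. Below is a minimal, hedged sketch of (a) the exact GELU activation that replaces ReLU, and (b) a Focal-EIoU-style loss. Note that the paper applies these to 3D bounding boxes; this sketch uses axis-aligned 2D boxes for brevity, and the focal exponent `gamma` and the exact normalization of the penalty terms are assumptions, not the authors' published formulation.

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def focal_eiou_loss(pred, gt, gamma=0.5):
    """Focal-EIoU sketch for axis-aligned 2D boxes (x1, y1, x2, y2).

    L_EIoU = 1 - IoU + center-distance term + width term + height term,
    then focally re-weighted by IoU**gamma so well-aligned boxes
    contribute more gradient than badly misaligned ones.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Intersection and union areas
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = ((px2 - px1) * (py2 - py1)
             + (gx2 - gx1) * (gy2 - gy1) - inter)
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box (used to normalize the penalty terms)
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Center-distance penalty, normalized by the enclosing diagonal
    pcx, pcy = (px1 + px2) / 2.0, (py1 + py2) / 2.0
    gcx, gcy = (gx1 + gx2) / 2.0, (gy1 + gy2) / 2.0
    center = ((pcx - gcx) ** 2 + (pcy - gcy) ** 2) / (cw ** 2 + ch ** 2)

    # Width/height penalties, normalized by the enclosing box sides
    w_term = ((px2 - px1) - (gx2 - gx1)) ** 2 / cw ** 2
    h_term = ((py2 - py1) - (gy2 - gy1)) ** 2 / ch ** 2

    eiou = 1.0 - iou + center + w_term + h_term
    return (iou ** gamma) * eiou
```

Because the width and height penalties regress each side length directly against the enclosing box, the gradient stays informative even when the predicted and ground-truth boxes barely overlap, which is the geometric correlation the abstract credits for faster convergence.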

Keywords

3D object detection; autonomous driving; deep learning; SECOND; geometry-aware learning

Cite This Article

APA Style
Li, X., Wan, G., Chen, X., Qie, L., Fan, X. et al. (2025). FastSECOND: Real-Time 3D Detection via Swin-Transformer Enhanced SECOND with Geometry-Aware Learning. Computer Modeling in Engineering & Sciences, 144(1), 1071–1090. https://doi.org/10.32604/cmes.2025.064775
Vancouver Style
Li X, Wan G, Chen X, Qie L, Fan X, Shi P, et al. FastSECOND: Real-Time 3D Detection via Swin-Transformer Enhanced SECOND with Geometry-Aware Learning. Comput Model Eng Sci. 2025;144(1):1071–1090. https://doi.org/10.32604/cmes.2025.064775
IEEE Style
X. Li et al., “FastSECOND: Real-Time 3D Detection via Swin-Transformer Enhanced SECOND with Geometry-Aware Learning,” Comput. Model. Eng. Sci., vol. 144, no. 1, pp. 1071–1090, 2025. https://doi.org/10.32604/cmes.2025.064775



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.