Open Access

ARTICLE

DA-T3D: Distribution-Aware Cross-Modal Distillation Framework for Temporal 3D Object Detection

Tianzhe Jiao, Yuming Chen, Xiaoyue Feng, Chaopeng Guo, Jie Song*

Software College, Northeastern University, Shenyang, China

* Corresponding Author: Jie Song.

(This article belongs to the Special Issue: Advanced Image Segmentation and Object Detection: Innovations, Challenges, and Applications)

Computer Modeling in Engineering & Sciences 2026, 147(1), 1 https://doi.org/10.32604/cmes.2026.080595

Abstract

Knowledge distillation bridges the performance gap between camera-based and LiDAR-based 3D detectors by leveraging the precise geometric information from LiDAR. However, cross-modal knowledge transfer remains challenging due to the inherent modality heterogeneity between LiDAR and camera data, which often leads to instability during training. In this work, we find that these instabilities are closely related to distribution mismatch in the cross-modal feature space and to noisy teacher signals. To address these issues, we propose a novel distribution-aware cross-modal distillation framework, named DA-T3D. Specifically, we first explicitly model the LiDAR teacher’s Bird’s-Eye-View (BEV) feature distribution and use the learned distribution as a statistical prior that guides student features toward high-density, geometrically stable regions of the teacher’s BEV feature space. This enforces feature alignment in BEV space by constraining the student model’s feature distribution to match that of the LiDAR teacher within foreground regions. We then introduce response-level distillation, which transfers the teacher’s prediction behavior directly to the student detection head. This output-space supervision complements feature distillation, reduces modality-induced ambiguity, and yields more accurate and stable classification confidence and bounding-box regression. Furthermore, we perform temporal modeling on the distilled cross-modal features to produce fused BEV representations that capture more comprehensive scene context, and we use these fused features to generate the final 3D detection results. Experiments on the nuScenes dataset validate the effectiveness and superiority of DA-T3D, which achieves 46.7% mAP and 58.1% NDS.
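The two distillation losses described above can be illustrated with a rough sketch. This is not the authors' implementation: the distribution-level loss is approximated here with a single diagonal Gaussian prior fitted to teacher features (standing in for the Dirichlet process Gaussian mixture model named in the keywords), and the response-level loss with a temperature-softened KL divergence between teacher and student class distributions. All function names and the NumPy formulation are illustrative assumptions.

```python
import numpy as np

def fit_diagonal_gaussian(teacher_feats):
    """Fit a diagonal Gaussian to (N, C) foreground BEV feature vectors
    from the LiDAR teacher, serving as a statistical prior."""
    mu = teacher_feats.mean(axis=0)
    var = teacher_feats.var(axis=0) + 1e-6  # avoid division by zero
    return mu, var

def distribution_distill_loss(student_feats, mu, var):
    """Negative log-likelihood of student features under the teacher prior:
    pulls student features toward high-density regions of the teacher's
    BEV feature space."""
    nll = 0.5 * (np.log(2.0 * np.pi * var) + (student_feats - mu) ** 2 / var)
    return nll.sum(axis=1).mean()

def response_distill_loss(student_logits, teacher_logits, tau=2.0):
    """KL(teacher || student) over temperature-softened class probabilities:
    output-space supervision transferring the teacher's prediction behavior."""
    def softmax(x):
        e = np.exp((x - x.max(axis=1, keepdims=True)) / tau)
        return e / e.sum(axis=1, keepdims=True)
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    return (p_t * (np.log(p_t + 1e-9) - np.log(p_s + 1e-9))).sum(axis=1).mean()
```

Under this sketch, student features lying near the teacher's feature mean incur a lower distribution loss than features far from it, and the response loss vanishes when student and teacher predictions coincide.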

Keywords

3D object detection; Bird’s-Eye-View perception; cross-modal knowledge distillation; Dirichlet process Gaussian mixture model; temporal modeling

Cite This Article

APA Style
Jiao, T., Chen, Y., Feng, X., Guo, C., & Song, J. (2026). DA-T3D: Distribution-Aware Cross-Modal Distillation Framework for Temporal 3D Object Detection. Computer Modeling in Engineering & Sciences, 147(1), 1. https://doi.org/10.32604/cmes.2026.080595
Vancouver Style
Jiao T, Chen Y, Feng X, Guo C, Song J. DA-T3D: Distribution-Aware Cross-Modal Distillation Framework for Temporal 3D Object Detection. Comput Model Eng Sci. 2026;147(1):1. https://doi.org/10.32604/cmes.2026.080595
IEEE Style
T. Jiao, Y. Chen, X. Feng, C. Guo, and J. Song, “DA-T3D: Distribution-Aware Cross-Modal Distillation Framework for Temporal 3D Object Detection,” Comput. Model. Eng. Sci., vol. 147, no. 1, pp. 1, 2026. https://doi.org/10.32604/cmes.2026.080595



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.