
This article proposes a distribution-aware cross-modal distillation framework that addresses the distillation instability in camera-based 3D object detection caused by modality heterogeneity and noisy teacher supervision. Unlike conventional pointwise feature alignment, the proposed method explicitly models the BEV feature distribution of the LiDAR teacher with a Dirichlet Process Gaussian Mixture Model (DPGMM) and, through distribution-level consistency constraints, guides the student to focus on high-density, geometrically stable regions of the teacher's features, narrowing the cross-modal feature gap. In addition, response-level distillation and a lightweight two-frame temporal fusion module further improve the prediction accuracy of the detection head.
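The core idea of weighting feature imitation by the teacher's density can be illustrated with a small sketch. This is not the paper's implementation: the feature shapes, the truncated DP-GMM fit via scikit-learn's `BayesianGaussianMixture`, and the density-weighted MSE are all illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Hypothetical teacher/student BEV feature maps, flattened to (H*W, C).
H, W, C = 16, 16, 8
teacher = rng.normal(size=(H * W, C))
student = teacher + 0.5 * rng.normal(size=(H * W, C))  # noisy student features

# Model the teacher's BEV feature distribution with a (truncated) DP-GMM.
dpgmm = BayesianGaussianMixture(
    n_components=8,
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    max_iter=200,
    random_state=0,
).fit(teacher)

# Per-location log-density under the teacher's mixture; high-density
# regions are treated as stable supervision targets.
log_density = dpgmm.score_samples(teacher)
weights = np.exp(log_density - log_density.max())
weights /= weights.sum()

# Density-weighted feature-imitation loss: a weighted MSE that down-weights
# low-density (likely noisy) teacher locations.
per_loc_mse = ((student - teacher) ** 2).mean(axis=1)
distill_loss = float((weights * per_loc_mse).sum())
print(f"density-weighted distillation loss: {distill_loss:.4f}")
```

In this toy setup, locations where the teacher's features are atypical under its own fitted mixture receive less distillation weight, which is one way to realize the "focus on high-density, geometrically stable regions" constraint described above.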
The cover image was created by ChatGPT and contains no copyrighted elements or misleading representations.