Open Access
ARTICLE
Railway Track Defect Detection Based on Dynamic Multi-Modal Fusion and Challenging Object Enhanced Perception
1 Institute of Technological Innovation, Beijing Subway Operation Co. Ltd., Beijing, 100044, China
2 Technical Department, Beijing Subway Operation Co. Ltd., Beijing, 100044, China
3 State Key Laboratory of Advanced Rail Autonomous Operation, Beijing Jiaotong University, Beijing, 100044, China
* Corresponding Author: Yang Gao
(This article belongs to the Special Issue: AI-Enhanced Low-Altitude Technology Applications in Structural Integrity Evaluation and Safety Management of Transportation Infrastructure Systems)
Structural Durability & Health Monitoring 2026, 20(2), 10 https://doi.org/10.32604/sdhm.2025.072538
Received 29 August 2025; Accepted 27 October 2025; Issue published 31 March 2026
Abstract
Railway track fasteners are prone to defects because of their intricate structure, and foreign objects frequently appear on the track bed in open environments. Both types of defect threaten high-speed trains, so timely and accurate track inspection is necessary. Most existing automatic inspection methods rely on visible-light data alone, and their performance degrades in complex environments. Moreover, because the information comes from a single dimension, detection accuracy is low for similar, occluded, and small objects. To address these issues, this paper proposes a track defect detection method based on dynamic multi-modal fusion and challenging object enhanced perception. First, to account for differences in the representation dimensions of multi-modal information, a dynamic weighted multi-modal feature fusion module is proposed. The fused multi-modal features are assigned weights and then multiplied with the extracted single-modal features at multiple levels, which adaptively adjusts the response of the fused features. Second, a novel stepwise multi-scale convolution feature aggregation module is proposed for challenging objects. It uses depthwise separable convolution and cross-scale aggregation across different receptive fields to enhance feature extraction and reuse, thereby reducing the progressive loss of effective information. Experiments on the constructed RGBD dataset demonstrate the effectiveness of the proposed method in comparison with eight established methods, including both single-modal and multi-modal methods.
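The dynamic weighting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' module: the element-wise sum used as the fusion step, the global-average-pooling plus sigmoid weighting, the complementary `1 - weights` term, and the names `dynamic_weighted_fusion` and `fuse_levels` are all assumptions made for illustration. The abstract only states that weights derived from the fused features are multiplied with the single-modal features at multiple levels.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def dynamic_weighted_fusion(rgb_feat, depth_feat):
    """Fuse two single-modal feature maps of shape (C, H, W).

    Hypothetical sketch: the fusion and weighting choices below are
    assumptions, not the formulation used in the paper.
    """
    if rgb_feat.shape != depth_feat.shape:
        raise ValueError("RGB and depth features must have the same shape")

    # Assumption: the initial fusion is a plain element-wise sum.
    fused = rgb_feat + depth_feat

    # Assumption: one weight per channel, obtained by global average
    # pooling over the spatial axes followed by a sigmoid.
    weights = sigmoid(fused.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)

    # Multiply the weights with the single-modal features. The
    # complementary (1 - weights) term for depth is an assumption.
    return weights * rgb_feat + (1.0 - weights) * depth_feat


def fuse_levels(feature_pairs):
    """Apply the fusion independently at each backbone level.

    feature_pairs: list of (rgb_feat, depth_feat) tuples, one per level.
    """
    return [dynamic_weighted_fusion(rgb, depth) for rgb, depth in feature_pairs]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(4, 8, 8))
    depth = rng.normal(size=(4, 8, 8))
    fused_levels = fuse_levels([(rgb, depth), (rgb[:, ::2, ::2], depth[:, ::2, ::2])])
    print([f.shape for f in fused_levels])
```

Because the two weights in this sketch sum to one for each channel, the output is a convex combination of the two modalities, so a channel whose fused response is strong leans toward the RGB features and a weak one leans toward depth. A learned module would replace the fixed pooling and sigmoid with trainable layers.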
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

