Open Access
REVIEW
Review of Deep Learning-Based Intelligent Inspection Research for Transmission Lines
1 College of Mechanical and Electrical Engineering, Chizhou University, Chizhou, China
2 College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
* Corresponding Author: Chuanyang Liu
(This article belongs to the Special Issue: Advances in Deep Learning and Neural Networks: Architectures, Applications, and Challenges)
Computers, Materials & Continua 2026, 87(2), 5. https://doi.org/10.32604/cmc.2026.075348
Received 30 October 2025; Accepted 21 January 2026; Issue published 12 March 2026
Abstract
Intelligent inspection of transmission lines enables efficient automated fault detection by integrating artificial intelligence, robotics, and other related technologies. It plays a key role in ensuring power grid safety, reducing operation and maintenance costs, driving the digital transformation of the power industry, and facilitating the achievement of the dual-carbon goals. This review focuses on vision-based power line inspection, taking deep learning as the core perspective to systematically analyze the latest research advances in this field. First, at the technical foundation level, it elaborates on deep learning algorithms for image perception-based intelligent transmission line inspection, covering object detection algorithms, semantic segmentation algorithms, and other relevant methodologies. Second, in application practice, it summarizes deep learning-based intelligent inspection applications across six dimensions: detection of power insulators and their defects, transmission tower detection, power line feature extraction, metal fitting and defect detection, thermal fault diagnosis of power components, and safety hazard detection in power scenarios; it further lists relevant public datasets. Finally, in response to current challenges, it identifies five key future research directions: the deep integration of multiple learning paradigms, multi-modal data fusion, collaborative application of large and small models, cloud-edge-end collaborative integration, and multi-agent cluster control. This paper reviews and analyzes numerous deep learning-based intelligent detection methods for aerial images and comprehensively explores the application of deep learning in Unmanned Aerial Vehicle (UAV) inspection scenarios, thus providing valuable theoretical and practical references for scholars engaged in smart grid automated inspection research.
As the economic lifeline of a nation, the power industry shoulders increasingly heavy responsibilities and plays an irreplaceable core role in meeting the electricity demands of industrial production and daily life, driving national economic development, and safeguarding social security and stability. With the rapid growth of the social economy and the continuous improvement of people’s living standards, society’s demand for electricity has exhibited a rigid upward trend. This objective trend has directly accelerated the large-scale expansion of power grid construction. By deeply integrating cutting-edge information technologies such as artificial intelligence, the Internet of Things, big data, and cloud platforms, the power industry has established a robust smart grid infrastructure, which provides crucial technical support and hardware foundations for the large-scale application of deep learning and machine vision technologies in power systems [1,2].
Transmission lines, serving as the pivotal infrastructure for power transmission, are crucial to maintaining the stability of national livelihoods. Ensuring their operational reliability and safety is an essential prerequisite for the continuous supply of electrical energy [3]. However, owing to the inherent characteristics of power transmission and distribution systems, these lines are predominantly deployed in complex terrains such as dense forests and rugged mountains. Persistently exposed to environmental stressors including extreme weather events, topographical variations, and seasonal shifts, they also face challenges ranging from internal design flaws to external mechanical damage. Consequently, regular inspections are indispensable for ensuring the safe and stable operation of transmission lines, and the integration of deep learning and machine vision technologies is now providing an entirely new solution for efficient inspection in such complex environments [4].
The core objective of intelligent inspection for transmission lines is to monitor the status and diagnose faults of key components such as transmission towers, insulators, anti-vibration hammers, wire clamps, grading rings, bolts, and connection plates, while identifying potential hazards including bird nests on towers, suspended objects on lines (e.g., kites, balloons, garbage bags, and ice accretions), and vegetation or engineering vehicles within the transmission corridor [5]. Traditional manual inspection not only struggles to cover complex terrains but also has limited accuracy in detecting minor defects. In contrast, machine vision-based intelligent detection technology can achieve automated recognition of multiple component types and hazards through image feature extraction and pattern recognition [6]. For instance, it can identify cracking defects by analyzing texture changes on insulator surfaces and distinguish between iced and normal lines using color features, thereby significantly enhancing the comprehensiveness and accuracy of inspections.
With the rapid development of image processing and UAV control technologies, the model of “UAV inspection as the primary approach and manual inspection as a supplement” has become the mainstream mode for power transmission line inspection [7]. Major power grid enterprises have widely deployed UAVs equipped with visible light or infrared imaging devices, generating massive volumes of image and video data that lay a solid data foundation for intelligent analysis. Compared with infrared images, visible light images—rich in shape, color, and texture features—have emerged as the preferred data source for machine vision algorithms. However, UAV aerial images present challenges for component recognition and defect detection, such as complex backgrounds (including forests, fields, rivers, and buildings), variable target scales (e.g., small insulators captured from a long distance vs. large towers photographed at close range), and target occlusion (e.g., wires blocked by tree branches). Traditional manual identification is not only time-consuming and labor-intensive but also prone to missed or false detections due to visual fatigue. Thus, deep learning-based automated detection methods have become key to breaking this bottleneck: through the multi-layer feature extraction capability of Convolutional Neural Networks (CNNs), interference from complex backgrounds can be effectively suppressed, enabling accurate recognition of multi-scale and multi-morphological targets [8].
Over the past decade, with the maturation of UAV control technologies, power transmission line inspection has gradually evolved from mere image acquisition and transmission to the deep integration of UAVs and visual detection technologies. Its core lies in achieving intelligent recognition of power components and automated detection of defects [9–11]. Deep learning has demonstrated revolutionary advantages in this process: leveraging large-scale annotated datasets and high-performance computing hardware, target detection algorithms such as You Only Look Once (YOLO), Faster Region-Based Convolutional Neural Network (Faster R-CNN), and Single Shot MultiBox Detector (SSD) can automatically extract deep semantic features from images, enabling end-to-end detection that maps raw images to target coordinates and categories [12]. Instance segmentation algorithms like Mask Region-Based Convolutional Neural Network (Mask R-CNN) can further output pixel-level contours of targets, providing more granular information for defect localization (e.g., the precise range of insulator cracks). Compared with traditional image processing methods based on edge detection and threshold segmentation, deep learning methods exhibit stronger robustness to illumination variations and background interference, with significantly improved detection accuracy and speed [13,14].
Against the strategic backdrop of new digital infrastructure development, the ubiquitous power Internet of Things, and other power big data initiatives, the integrated advantages of UAV inspection and machine vision technologies continue to gain prominence. Through the efficient processing of massive inspection data by deep learning models, the workload of manual inspection can be significantly reduced, while the efficiency of equipment inspection and the accuracy of defect recognition are improved, thus driving power operation and maintenance toward automation and intelligent transformation [15,16]. For instance, visual models based on the Transformer architecture leverage a global attention mechanism to better capture correlations between transmission line components (e.g., spatial positional constraints between conductors and insulators), further enhancing detection stability in complex scenarios. The deployment of lightweight models (e.g., MobileNet, ShuffleNet) enables real-time on-edge analysis on UAVs, facilitating the “inspect-while-analyze” immediate response paradigm. Deep learning-based UAV autonomous inspection technology not only significantly reduces the operation and maintenance costs of power grid enterprises but also enhances the ability to identify tiny defects in aerial images (e.g., loose bolts and broken conductor strands), thereby providing more reliable technical safeguards for the safe operation of transmission lines [17,18]. Therefore, in-depth research on this technology holds great practical significance for comprehensively improving the efficiency and quality of power grid equipment inspection, as well as reducing the workload intensity and safety risks faced by inspectors.
To date, significant progress has been made in intelligent detection technologies for transmission line inspection images based on image perception. However, due to the wide variety of power components and their significant differences in morphological structures, a universal algorithm capable of effectively detecting all power components in images has not yet been developed. Therefore, in order to select suitable deep learning algorithms for the detection of specific power components, it is urgent to conduct a systematic analysis and summary of the detection methods for different power components. Against this backdrop, this paper reviews and analyzes a large number of intelligent detection methods for aerial images based on deep learning, comprehensively explores the application practices of deep learning in UAV inspection scenarios, and provides researchers engaged in smart grid automated inspection research with valuable theoretical and practical references to support their related work.
In terms of application practice, the paper highlights deep learning-based intelligent inspection applications, comprehensively summarizing them across six dimensions: power insulator and defect detection, transmission tower detection, power line feature extraction, metal fitting and defect detection, thermal fault diagnosis of power components, and safety hazard detection in power scenarios. Meanwhile, it conducts a quantitative evaluation of all selected research papers, establishing a comparative framework spanning model performance, dataset applicability, and engineering practicability to provide an objective reference for technology selection. Additionally, the paper systematically lists public datasets available for research on intelligent transmission line inspection.
In response to current challenges, the paper identifies five future research directions: promoting the deep integration of deep learning with multiple learning paradigms to enhance model adaptability; strengthening the deep integration of multimodal data (visible light, infrared, laser, etc.) to enrich feature dimensions; exploring the collaborative application of object detection algorithms and large language models to improve scene understanding capabilities; constructing a cloud-edge-end collaborative integration architecture to optimize data processing efficiency; and developing multi-agent collaborative inspection and cluster optimization technologies to enhance the coverage and reliability of large-scale inspections. These directions provide a clear development path for research in the field.
With the continuous deepening of smart grid digital transformation, image acquisition devices based on high-definition cameras have undergone continuous iteration and upgrading. Meanwhile, supporting technologies such as image processing, machine vision, and deep learning have become increasingly mature, laying a solid foundation for the intelligent advancement of transmission line inspection. Currently, UAVs equipped with visible-light cameras, infrared cameras, and Light Detection and Ranging (LiDAR) have been widely applied in transmission line inspection scenarios. Intelligent transmission line inspection integrates the advantages of computer vision and deep learning technologies, effectively enhancing the safety, reliability, and operational efficiency of transmission systems [19,20]. Fig. 1 illustrates the workflow of image perception-based intelligent inspection for transmission lines, which primarily comprises three core stages: data acquisition; deep learning network model training and optimization; and transmission line target detection and fault diagnosis.

Figure 1: Intelligent inspection flowchart for transmission lines.
Nguyen et al. [21] systematically constructed a technical framework for the intelligent inspection of transmission lines. They not only comprehensively sorted out the applicable scenarios and limitations of traditional inspection methods (manual inspection and helicopter inspection) and emerging technologies (UAV inspection and robot inspection) but also conducted in-depth analyses of the characteristics of various data sources—including visible light images, infrared thermal imaging data, and Light Detection and Ranging (LiDAR) point clouds—and their application value in inspection. Crucially, this study prospectively summarized the research status of deep learning in power inspection, clarified the potential of models such as CNNs and Recurrent Neural Networks (RNNs) in tasks like image classification, target detection, and defect recognition, and laid a theoretical foundation for the subsequent in-depth integration of deep learning technology with inspection scenarios.
Yang et al. [22] focused on the core issues of image detection technology for transmission lines inspection. They first categorized typical detection tasks in inspection aerial images, such as power component positioning, defect recognition, and obstacle detection. Then, by comparing the technical parameters and actual performance of different inspection platforms (fixed-wing UAVs, multi-rotor UAVs, tethered balloons, etc.), they analyzed the differences in adaptability of various platforms in complex terrains (mountainous areas, jungles, cross-river regions) and harsh weather conditions (heavy rain, fog, strong electromagnetic interference). In addition, this study systematically summarized the functional modules and detection processes of multiple existing implemented automatic inspection systems, pointed out the shortcomings of current systems in image blur processing, small target missed detection, and algorithm real-time performance, and proposed future research directions such as optimizing the robustness of machine vision algorithms and building a multi-sensor fusion system—thus providing practical technical paths for engineering applications.
Liu et al. [23] focused on the data level and comprehensively reviewed deep learning-based data analysis technologies for transmission lines inspection. They divided visual detection into two core tasks: power component recognition and fault diagnosis. They compared in detail the accuracy and efficiency of different deep learning models (such as YOLO, SSD, Faster R-CNN, and Mask R-CNN) in detecting components like towers, insulators, conductors, and metal fittings. In parallel, they analyzed the advantages and disadvantages of two approaches to defect recognition tasks: direct detection and indirect recognition based on component detection. Notably, this study insightfully identified the core challenges faced by current technologies: in data quality, issues include sample imbalance, high labeling costs, and severe data noise in harsh environments; in small target detection, tiny components such as bolts and pins occupy a small pixel proportion in images and have blurred features, which makes it difficult to improve detection accuracy; in embedded applications, there is a conflict between the high computing power required by deep learning models and the hardware limitations of terminal devices such as UAVs and robots; in evaluation benchmarks, the lack of unified datasets and evaluation indicators hinders horizontal comparison of different research results. These insights have clarified the key directions for subsequent research.
Liu et al. [24] further focused their research on insulators—a critical component—and traced the evolutionary trajectory of defect detection methods from a technical standpoint. They compared the fundamental methodological differences between traditional image processing methods (which rely on manually designed features such as color, shape, and texture) and deep learning methods (end-to-end learning that automatically extracts high-level features), noting that traditional methods have limitations such as poor adaptability and weak generalization in complex scenarios. In contrast, deep learning methods, by leveraging their strong feature learning capabilities, demonstrate higher accuracy and robustness in tasks such as insulator fragment dropping, self-explosion, and contamination level evaluation. Additionally, this study also addressed unique challenges in insulator defect detection, such as variations in defect characteristics among insulators of different materials (porcelain, glass, composite) and image quality degradation under harsh weather conditions (e.g., icing, thunderstorms), and proposed future development trends including model improvement integrated with domain knowledge and lightweight network design.
Luo et al. [25] proposed a “full-process intelligent” technical framework centered on the mainstream scenario of UAV inspection. Their research encompasses the entire workflow of UAV inspection: in the path planning phase, they analyzed the autonomous obstacle avoidance algorithm based on environmental perception and the strategy for generating globally optimal paths; in the trajectory tracking phase, they discussed high-precision positioning and attitude control technologies which are designed to ensure the stability of inspection routes; in the fault detection and diagnosis phase, they summarized the application of deep learning models in real-time defect recognition and the method of multi-modal data fusion (visible light and infrared images) to improve diagnostic accuracy. Furthermore, this study identified the practical challenges faced by UAV inspection, such as limited inspection range caused by battery capacity constraints, interference from strong electromagnetic environments on communication signals, and difficulties in autonomous takeoff and landing in complex terrains. It also proposed potential solutions including lightweight energy systems, anti-interference communication protocols, and intelligent auxiliary takeoff and landing systems, thereby providing a systematic approach for the full-process optimization of UAV inspection.
Although the aforementioned review studies all center on intelligent transmission line inspection, each has a distinct research focus: Nguyen et al. [21] concentrated on constructing technical frameworks; Yang et al. [22] focused on core issues of image detection technology for inspection; Liu et al. [23] emphasized data-level analysis; Liu et al. [24] traced the evolutionary trajectory of insulator defect detection methods; and Luo et al. [25] centered on the full-process framework of UAV inspection. By comparison, this review takes deep learning algorithms as the core backbone, closely integrating the technical principles, application scenarios, and future directions of intelligent inspection to form a closed-loop analytical logic of algorithm-application-evaluation-outlook, and thus highlights the core driving role of deep learning in intelligent inspection.
This review systematically categorizes six application directions (e.g., power insulator defect detection, transmission tower detection), covering the full spectrum of demands in transmission line inspection from component-level to scenario-level, and from single-target to multi-target tasks, thus forming a more complete application system. Such an integration not only reflects comprehensive coverage of domain applications but also responds to the comprehensive detection needs of complex engineering scenarios through directions like multi-target recognition and defect detection, filling gaps left by single-scenario research. Existing reviews mostly focus on technical surveying, lacking systematic evaluation of research results and integration of data resources. This paper conducts a comprehensive evaluation of retrieved and screened papers on deep learning-based intelligent inspection applications, quantifying the performance of different studies through an evaluation system and providing an objective reference standard for researchers in the field. Although the aforementioned reviews mention technical challenges, their discussions of future directions are mostly limited to a single technical dimension. This review proposes five future directions from the perspective of interdisciplinary integration: combining deep learning with multiple learning paradigms, multimodal data, large language models, cloud-edge-end architectures, and multi-agent technologies. It thereby breaks through the traditional idea of single-point technical upgrading and emphasizes the innovative path of collaborative integration.
3 Intelligent Inspection Based on Deep Learning Algorithms
In the field of computer vision, the emergence of deep convolutional neural networks has equipped deep learning with robust feature extraction capabilities, allowing it to extract highly complex and abstract feature representations from massive image datasets. This capability provides robust support for key tasks such as image classification, object detection, and semantic segmentation. Moreover, leveraging the availability of large-scale public datasets and the computing power of high-performance hardware, researchers have developed a series of advanced backbone networks and object detection networks, driving breakthroughs in deep learning-based image classification, object detection, and segmentation technologies within computer vision. Notably, numerous researchers have focused on developing deep learning frameworks, fostering a technical ecosystem typified by Darknet, MXNet, Convolutional Architecture for Fast Feature Embedding (Caffe), TensorFlow, Keras, PaddlePaddle, and PyTorch. These frameworks, with their comprehensive functionalities and user-friendly design, offer strong technical guarantees for the large-scale deployment and cross-domain application of deep learning technologies.
This section provides a systematic overview of deep learning technologies for image-based intelligent inspection of transmission lines, classified according to object detection networks, semantic segmentation networks, and other methods.
3.1 Object Detection Algorithms
In recent years, deep learning-based object detection has emerged as a research hotspot in machine vision. Since the advent of AlexNet in 2012, researchers have driven innovations in feature extraction networks by reconstructing architectures and designing novel modules, achieving a series of landmark technological breakthroughs. Specifically, early networks such as Zeiler-Fergus Network (ZF-Net) and Visual Geometry Group Network (VGG-Net) effectively increased the depth of deep CNNs while controlling computational complexity by reducing filter size. GoogLeNet innovatively introduced the Inception module, which significantly lowered model computational costs through multi-scale feature fusion. Residual Neural Network (ResNet), meanwhile, overcame the training bottleneck of deep networks via cross-layer connections (residual structures), greatly extending network depth and thus becoming the most widely adopted backbone network to date. Building on this foundation, researchers have further optimized and upgraded these frameworks, successively developing derivative networks such as Residual Neural Network with Next-generation architecture (ResNeXt), Inception-ResNet, Res2Net, and Residual Neural Network with Nest architecture (ResNeSt), which continuously enhance feature extraction performance. Additionally, to address the computational constraints of mobile devices, some researchers have focused on lightweight solutions, developing backbone networks like MobileNet, Xception, and ShuffleNet. These networks leverage technologies such as depth-wise separable convolution and channel shuffling to drastically reduce model size while preserving core performance, specifically addressing the requirements of portable applications.
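To make the savings from depth-wise separable convolution concrete, the parameter counts of a standard convolution and its depth-wise separable replacement can be compared with simple arithmetic. The following back-of-envelope sketch is illustrative only (function names are ours), ignoring biases and batch-norm parameters:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depth-wise k x k convolution (one spatial filter per input channel)
    followed by a 1 x 1 point-wise convolution for channel mixing."""
    depthwise = k * k * c_in   # per-channel spatial filtering
    pointwise = c_in * c_out   # cross-channel combination
    return depthwise + pointwise

# Example: a 3x3 layer with 256 input and 256 output channels.
standard = conv_params(256, 256, 3)                   # 589,824 parameters
separable = depthwise_separable_params(256, 256, 3)   # 67,840 parameters
print(f"reduction factor: {standard / separable:.1f}x")  # → 8.7x
```

This roughly 8- to 9-fold reduction for a typical 3×3 layer is what lets MobileNet-style backbones fit the compute budgets of UAV-mounted edge devices.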
As deep learning theories continue to advance, a wealth of high-performance object detection algorithms have been developed in academia and industry. Based on distinct detection mechanisms in the model inference process, these algorithms can be clearly categorized into three types: two-stage detectors, one-stage detectors, and anchor-free detectors. These three categories differ significantly in detection accuracy, processing speed, and applicable scenarios.
3.1.1 Two-Stage Detection Algorithms
Two-stage object detection algorithms achieve object localization and classification through two sequential steps: first generating candidate regions, then classifying these regions. Typical algorithms include Faster R-CNN and Spatial Pyramid Pooling Network (SPPNet), the latter of which enhances feature extraction efficiency via spatial pyramid pooling; Cascade R-CNN, which improves detection accuracy by iteratively refining candidate boxes; and Region-based Fully Convolutional Network (R-FCN), which integrates the advantages of fully convolutional networks and region-based methods. Additionally, as a representative extension of Faster R-CNN, Mask R-CNN not only performs object detection but also generates instance segmentation masks, thereby providing more detailed object information.
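A detail shared by these two-stage pipelines is that the many overlapping candidate boxes must be pruned by Intersection-over-Union (IoU) based non-maximum suppression (NMS) before final output. A minimal, illustrative Python sketch (function names and the greedy formulation are ours):

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it beyond iou_thresh, and repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# Two near-duplicate detections and one distant one: the duplicate is suppressed.
print(nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
          [0.9, 0.8, 0.7]))  # → [0, 2]
```

Cascade R-CNN's multi-stage refinement can be read as repeating the propose-filter-regress cycle with progressively stricter IoU thresholds.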
Faster R-CNN, a mainstream high-accuracy object detection algorithm, is among the most widely adopted two-stage frameworks. Its architecture, as shown in Fig. 2, comprises three core components: a feature extraction network, a Region Proposal Network (RPN), and Region of Interest (RoI) pooling. The feature extraction network generates a shared feature map via convolutional operations on the input image; the RPN slides a small window over this map to produce candidate regions, with its classification branch identifying object-containing anchor boxes and its regression branch refining their positions and sizes; RoI pooling then converts these variable-sized regions into fixed-size features, which are processed by fully connected layers for object classification and fine-grained bounding box regression. This design allows Faster R-CNN to achieve both accurate object classification and precise bounding box localization, with detection accuracy far exceeding that of many contemporary one-stage algorithms. As a result, researchers widely apply the algorithm and its improved variants to identify power components and detect defects in aerial images.

Figure 2: Faster R-CNN architecture.
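To illustrate how densely the RPN tiles the image with anchor boxes, the sketch below generates the classic 9 anchors per feature-map cell (3 scales × 3 aspect ratios at stride 16, as in the original Faster R-CNN configuration); the function name and exact coordinate conventions are ours:

```python
def make_anchors(fmap_h, fmap_w, stride,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (x1, y1, x2, y2) in input-image coordinates,
    one per scale/ratio combination, centred on every feature-map cell.
    Here ratio = width / height, and each anchor keeps area scale**2."""
    anchors = []
    for y in range(fmap_h):
        for x in range(fmap_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s * (r ** 0.5)
                    h = s / (r ** 0.5)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

# A tiny 2x3 feature map at stride 16 already yields 2 * 3 * 9 = 54 anchors;
# a realistic 38x50 map produces 17,100, which is why the RPN's
# classification branch must filter object-containing anchors efficiently.
print(len(make_anchors(2, 3, 16)))  # → 54
```

The RPN's regression branch then predicts offsets from each surviving anchor to the nearest ground-truth box, which is the refinement step described above.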
Reference [26] employed Faster R-CNN for insulator detection and defect localization in aerial images. The trained model performed well on the test set, achieving a detection accuracy of 94%, a recall rate of 88%, and a detection speed of 10 Frames Per Second (FPS). To address the missed detection of small defects in complex aerial backgrounds, reference [27] proposed a cascaded insulator defect recognition framework integrating global detection and local segmentation. By introducing ResNeXt-101, Feature Pyramid Network (FPN), and the Online Hard Example Mining (OHEM) strategy into Faster R-CNN, this method improved insulator defect detection accuracy to 97.3%. Reference [28] presented an insulator defect detection method based on ResNeSt and a multi-scale RPN. Through optimizing the ResNeSt architecture, implementing adaptive feature fusion, and integrating a multi-scale RPN design, the method effectively enhanced small-target detection capability and resolved the missed detection of small defects in complex scenes. For high-resolution UAV scenarios facing complex background interference and small-target detection challenges, reference [29] proposed two improved Faster R-CNN variants (Exact-RCNN and CME-CNN); experiments demonstrated that both variants can effectively detect insulator defects in high-resolution UAV imagery. To further improve multi-defect detection accuracy, reference [30] proposed a method based on the Multi-Geometric Reasoning Network (MGRN). By incorporating Spatial Geometric Reasoning (SGR), Appearance Geometric Reasoning (AGR), and Parallel Feature Transformation (PFT) submodules into the Faster R-CNN framework, the approach significantly enhanced detection accuracy when defect samples are limited. Reference [31] proposed a Faster R-CNN-based pointer meter recognition method, representing the first attempt to apply deep learning to meter positioning and reading inference; by calculating the geometric angle between the pointer and the scales, it effectively overcame recognition challenges under complex lighting and viewing angles. Reference [32] put forward a small-sized insulator defect detection method based on I2D-Net, which integrates a Three-Path Feature Fusion Network (TFFN), an Enhanced Receptive Field Attention (RFA+) module, and a Context Perception Module (CPM) to strengthen feature extraction and localization for small-scale defects. Reference [33] proposed the GFRF-RCNN algorithm, an improved Faster R-CNN variant, for detecting small-scale power components. By replacing the backbone network with AdvResNet50 and adopting the Guided Feature Refinement (GFR) method to enhance small targets, it effectively enabled the detection of small insulators, shock hammers, bolts, and other components.
In addition, two-stage detection algorithms such as Cascade R-CNN, R-FCN, and Mask R-CNN have also been widely applied to intelligent inspection tasks for transmission lines. Among them, Cascade R-CNN significantly enhances the detection accuracy of subtle defects through multi-stage iterative refinement of candidate boxes, rendering it particularly suitable for inspection scenarios in complex terrains [34]. R-FCN integrates the advantages of fully convolutional networks and regional features, ensuring stable recognition of small-scale power components while maintaining high detection speed. Mask R-CNN extends object detection with an instance segmentation function, which can accurately delineate the contours of power components like insulator strings and conductors, thereby providing more detailed spatial information for defect localization [35]. These algorithms effectively complement Faster R-CNN and its variants, thus collectively driving the evolution of intelligent transmission line inspection technology from single defect detection to full-component, multi-dimensional state assessment.
3.1.2 One-Stage Detection Algorithms
One-stage object detection methods occupy a prominent position in the field of computer vision due to their concise and efficient architecture. Unlike two-stage methods, they eliminate the region proposal generation process and subsequent feature processing operations, thereby greatly simplifying the detection workflow. Leveraging the powerful feature extraction capabilities of CNNs, these methods directly and simultaneously predict object classes and positions from predefined spatial grids across the entire input image, markedly enhancing detection speed and making them particularly suitable for scenarios with high real-time requirements. Typical representatives among one-stage algorithms include SSD [36], YOLO [37], and RetinaNet [38]. Of these, YOLO is particularly favored for its outstanding performance in engineering applications: it achieves efficient end-to-end inference by partitioning the input image into spatial grids, with the grid cell containing an object’s center tasked with detecting that object. By comparison, SSD detects objects of varying sizes via multi-scale feature maps, while RetinaNet introduces focal loss to address class imbalance.
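RetinaNet's focal loss, mentioned above as the remedy for class imbalance, can be written out for the binary case. The following is a minimal pure-Python sketch using the commonly cited defaults α = 0.25 and γ = 2; it is illustrative only and not the training code of any model reviewed here.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class (0 < p < 1)
    y: ground-truth label, 1 (object) or 0 (background)
    Well-classified examples are down-weighted by (1 - p_t) ** gamma,
    so the abundant easy negatives no longer dominate the total loss.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct background prediction contributes almost nothing,
# while a poorly detected object keeps a large loss.
easy_neg = focal_loss(0.01, 0)  # background predicted with p = 0.01
hard_pos = focal_loss(0.10, 1)  # object predicted with only p = 0.10
```

With γ = 0 and α = 0.5 the expression reduces to (half of) the ordinary cross-entropy, which makes the down-weighting effect of the modulating factor easy to isolate.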
As a typical one-stage object detection framework, YOLO directly and simultaneously outputs the spatial location information and corresponding category labels of target objects from input images through an end-to-end inference process. Since YOLOv1 pioneered the one-stage detection paradigm in 2016, the series has undergone multiple rounds of iterative optimization and evolved to YOLOv11, forming an algorithmic system that addresses diverse scenario requirements [39]. A pivotal milestone in this evolution is YOLOv3, as shown in Fig. 3, which introduced several innovative advancements: it adopted Darknet-53 as its backbone network, boosting deep feature extraction capabilities via residual connections; simultaneously, it implemented a multi-scale prediction mechanism based on FPN principles. This mechanism operates on three feature maps of varying resolutions, each tailored to detect large-, medium-, and small-sized objects respectively, thereby significantly improving the insufficient small-object detection accuracy that plagued previous versions. With continuous algorithmic advancements, the YOLO series has developed multi-scale models spanning lightweight to high-performance variants, enabling flexible adaptation to diverse scenarios including mobile device deployment, real-time field monitoring, and high-precision industrial inspection. In engineering practice, numerous studies have verified that applying the YOLO algorithm to transmission line inspection image analysis enables efficient identification of key power components and various types of defects, providing robust technical support for the intelligent operation and maintenance of power systems.

Figure 3: The architecture of YOLOv3.
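The grid-responsibility rule and center decoding used by the YOLO family, as described above, can be sketched in a few lines. The 416-pixel input and 13 × 13 grid below correspond to YOLOv3's coarsest (stride-32) scale; the coordinate and logit values are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def assign_cell(cx, cy, img_size, grid_size):
    """Return (col, row) of the grid cell responsible for an object whose
    center lies at pixel (cx, cy): the YOLO assignment rule."""
    stride = img_size / grid_size
    return int(cx // stride), int(cy // stride)

def decode_center(tx, ty, col, row, img_size, grid_size):
    """YOLOv3-style decoding: raw offsets (tx, ty) are squashed into (0, 1)
    and added to the cell's top-left corner, then scaled by the stride."""
    stride = img_size / grid_size
    return (col + sigmoid(tx)) * stride, (row + sigmoid(ty)) * stride

# 416x416 input on the 13x13 (stride-32) scale of YOLOv3
col, row = assign_cell(265.0, 140.0, img_size=416, grid_size=13)
cx, cy = decode_center(0.0, 0.0, col, row, img_size=416, grid_size=13)
```

Because the sigmoid confines the decoded center to its own cell, each cell can only ever claim objects whose centers fall inside it, which is what makes the grid partition an implicit assignment mechanism.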
To enhance the accuracy and speed of insulator defect detection, reference [40] proposed a YOLOv4-based model improved via data augmentation and K-means anchor clustering. Experimental results indicate that the detection accuracy of the improved model is 37.2% higher than that of the original YOLOv4 algorithm; compared with benchmark algorithms such as SSD and Faster R-CNN, it exhibits significantly better robustness under varying lighting conditions and complex backgrounds. Reference [41] developed an enhanced YOLOv5 algorithm integrating the C2f module and SimAM attention mechanism. This model effectively tackles the low accuracy issue caused by small target proportions and complex backgrounds in insulator defect detection, thus making it more suitable for low-altitude UAV-based insulator defect detection scenarios. Reference [42] proposed MFI-YOLO, a multi-fault detection algorithm for insulators based on YOLOv8. By constructing the MSA-GhostBlock module to enhance feature extraction in complex backgrounds and designing the ResPANet residual pyramid structure to optimize multi-scale feature fusion, the algorithm successfully addresses the challenge of multi-scale target detection in complex scenarios. Reference [43] proposed IF-YOLO based on YOLOv10, which enhances small-target feature representation and extraction via the Group Collaborative Attention (GCA) module. This effectively alleviates issues such as low accuracy and frequent missed detections in UAV inspections of insulator defects. Reference [44] proposed the ID-YOLO algorithm by integrating the GConv, C3-GPF, MSIF, and WFIF modules into the YOLOv5 framework. This algorithm effectively resolves the problem of detecting multiple targets and small targets in complex backgrounds, satisfying the requirements of real-time power system inspections.
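The K-means anchor clustering used in studies such as reference [40] is conventionally run over ground-truth (w, h) pairs with 1 − IoU as the distance metric. The toy sketch below seeds deterministically from the first k boxes so its result is reproducible; real pipelines seed randomly (or with k-means++) over a full label set.

```python
def iou_wh(a, b):
    """IoU of two boxes given only as (w, h) pairs, assuming aligned
    top-left corners, as in YOLO-style anchor clustering."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=10):
    """Toy K-means over (w, h) pairs with 1 - IoU as the distance.
    Deterministic seeding (the first k boxes) keeps the sketch reproducible."""
    centers = list(boxes[:k])
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # assign each box to the nearest center (highest IoU)
            j = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[j].append(b)
        for i, c in enumerate(clusters):
            if c:  # move each center to its cluster's mean width/height
                centers[i] = (sum(b[0] for b in c) / len(c),
                              sum(b[1] for b in c) / len(c))
    return centers

# small boxes (e.g., defects) and large boxes (e.g., whole insulator strings)
boxes = [(10, 12), (12, 10), (11, 11), (100, 40), (90, 45), (110, 38)]
anchors = kmeans_anchors(boxes, k=2)
small, large = sorted(anchors, key=lambda a: a[0] * a[1])
```

Using 1 − IoU instead of Euclidean distance prevents large boxes from dominating the clustering, which is why the learned anchors track the dataset's actual aspect-ratio statistics.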
For foreign object intrusion detection in transmission lines, reference [45] proposed the KM-YOLO model based on an improved YOLOv5. By integrating the C3GC attention module into the backbone network, adopting a dynamic decoupled detection head, and using the SIoU loss function, this model enhances the detection capability for foreign objects such as bird nests and kites, thereby providing a practical solution for the safety monitoring of transmission lines. Reference [46] improved YOLOv5 by fusing the coordinate attention mechanism with FasterNet, and proposed the FusionNet model. This model can still achieve accurate target detection under severe weather conditions while balancing detection accuracy and the model’s lightweight requirements. Reference [47] developed a foreign object detection model based on an improved YOLOv5. By adopting the C3CG module to enhance feature extraction, utilizing Spatial Pyramid Dilated Convolution (SPD-Conv) to mitigate information loss during downsampling, and introducing the Simple Attention Module (SimAM) to improve the detection performance of small targets in complex backgrounds, the model achieves effective detection of small-scale foreign objects in complex transmission line scenarios. Another study by Wang et al. [48] proposed a foreign object detection model based on an improved YOLOv8. This model introduces the Efficient Channel Attention (ECA) attention mechanism to enhance inter-channel feature dependency and adds a small-target detection module in the detection head, improving detection accuracy while ensuring high speed, thus making it suitable for UAV intelligent inspection scenarios.
To address the challenge of rapidly and accurately identifying hardware defects in UAV inspection images, Zou et al. [49] proposed an improved YOLOv5-based method. By fusing the Convolutional Block Attention Module (CBAM) attention module with Omni-Dimensional Dynamic Convolution (ODConv), this method effectively enhances the detection accuracy of hardware defects in transmission lines. Shi et al. [50] developed SONet, a small-target detection network based on YOLOv8. This network employs the Multi-Branch Dilated Convolution Module (MDCM) to capture features across diverse receptive fields and replaces PANet with the Adaptive Attention Feature Fusion (AAFF) structure, thereby significantly boosting the detection accuracy of small-scale hardware defects in UAV inspection imagery. Liu et al. [51] proposed a two-stage cascaded RepYOLO model (an optimized variant of YOLOv5 integrated with RepVGG, Diverse Branch Block, and ECA modules) for the specific task of small-target pin loss detection. With an inference speed 4 times that of YOLOv5 and a 1.2% accuracy improvement, the model enables low-latency real-time deployment on resource-constrained edge devices.
To address the challenge of slow detection speed caused by the limited computing capacity of edge devices, Hou et al. [52] proposed the YOLO-GSS algorithm. This algorithm replaces the backbone network with G-GhostNet and optimizes the neck module with the S-FPN structure, thereby enhancing feature fusion and reducing overall model computational complexity. Han et al. [53] developed TD-YOLO based on YOLOv7-Tiny, which achieves a frame rate of 23.5 FPS on Jetson Xavier NX, thus providing a valuable reference for low-power edge device deployment in power inspection scenarios. Wu et al. [54] proposed the lightweight GMPPD-YOLO model. By implementing channel pruning and knowledge distillation to reduce the model size by 66.4%, it enables real-time high-precision detection on edge devices. Wang et al. [55] put forward a lightweight multi-type defect detection method based on YOLOv8, which effectively balances high detection accuracy and low-latency real-time inference performance. Xiang et al. [56] introduced the BN-YOLO algorithm for bird’s nest detection; by designing a lightweight C2f structure, it increases the inference speed from 61 to 83 FPS, solving the challenge of real-time detection in complex environments.
In summary, YOLO-based object detection methods for power transmission line inspection image perception have demonstrated significant advantages in scenarios such as insulator defect detection, transmission line foreign object intrusion detection, metal fittings defect identification, and edge device adaptation. Existing studies have effectively enhanced detection accuracy, speed, and adaptability to complex environments by optimizing backbone networks, upgrading feature fusion architectures, integrating attention mechanisms, and implementing lightweight optimization technologies. For instance, to address the long-standing challenge of small-target defect detection, detection performance is enhanced via modules like multi-branch convolution and adaptive feature fusion; to adapt to edge devices, real-time deployment is achieved via model compression and lightweight design. These methods provide diverse technical support for the intelligent operation and maintenance of power systems, with their specific performance metrics and application scenarios summarized in Table 1, serving as a valuable technical reference for subsequent academic research and field engineering practice.

Since the introduction of the anchor mechanism in Faster R-CNN, anchor-based two-stage and one-stage object detection algorithms have become mainstream paradigms in the field of object detection. However, such algorithms require the predefined generation of a large number of anchors, whose sizes must be tailored to the specific characteristics of the dataset—this inherently restricts the model’s generalization capability. In addition, the vast majority of anchors correspond to background regions, with only a small fraction overlapping with actual targets, which readily induces a severe positive-negative sample imbalance and thus leads to classification outcomes dominated by negative samples. To mitigate these drawbacks of anchor-based detectors, anchor-free object detection algorithms have gradually gained traction. Unlike their anchor-based counterparts, these methods eliminate the need for per-pixel anchor predefinition and instead perform direct object detection on input image pixels, which not only effectively reduces model complexity but also further enhances the model’s universality for multi-category object detection tasks. Based on different sample assignment strategies, anchor-free object detection approaches are primarily categorized into two types: keypoint grouping and center point regression.
For the object detection pipeline based on center-point regression, a feature extraction network is first utilized to extract feature maps. Subsequently, the detection head module predicts three core target properties: the center-point location, the scale of the bounding box, and the center-point offset. Finally, the post-processing module selects the optimal bounding boxes, thus accomplishing the object detection task. For the keypoint grouping-based pipeline, by contrast, keypoint prediction modules identify and localize keypoints (i.e., pixels that characterize the typical features of targets, such as corner points, center points, or extreme points) on the extracted feature maps, and candidate bounding boxes are then delineated by regressing and grouping these keypoints. A representative example is CornerNet, proposed by Law and Deng [57], which converts bounding box prediction into the detection and pairing of corner points; however, matching valid corner point pairs incurs high computational complexity and is prone to mismatches, which degrades the accuracy of the resulting bounding boxes. Building upon CornerNet, Duan et al. [58] proposed the CenterNet algorithm, which introduces three sets of keypoints (i.e., top-left corners, bottom-right corners, and center points) to constrain the generation of bounding boxes; specifically, the algorithm refines the predicted bounding boxes through the cascade of corner pooling and center pooling modules.
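The three predicted properties in the center-point regression pipeline above (center location, box scale, center offset) can be decoded with a short sketch. The 3 × 3 local-maximum test below stands in for the max-pooling step real CenterNet implementations use, and all array values are illustrative.

```python
def decode_centers(heatmap, wh, offset, stride=4, thresh=0.5):
    """Minimal center-point decoding for one object class.

    heatmap[r][c] holds the center confidence, wh[r][c] the predicted
    (w, h), and offset[r][c] the sub-cell (dx, dy) refinement. A cell is
    kept when it is a 3x3 local maximum above `thresh`; this simple test
    replaces anchor-based non-maximum suppression.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    boxes = []
    for r in range(rows):
        for c in range(cols):
            s = heatmap[r][c]
            if s < thresh:
                continue
            neigh = [heatmap[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))]
            if s < max(neigh):
                continue
            cx = (c + offset[r][c][0]) * stride  # refined center x
            cy = (r + offset[r][c][1]) * stride  # refined center y
            w, h = wh[r][c]
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, s))
    return boxes

# one clear peak at row 1, col 2 on a 4x4 map (all numbers illustrative)
heatmap = [[0.1] * 4 for _ in range(4)]
heatmap[1][2] = 0.9
wh = [[(8.0, 8.0)] * 4 for _ in range(4)]
offset = [[(0.5, 0.5)] * 4 for _ in range(4)]
boxes = decode_centers(heatmap, wh, offset)
```
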
Compared with anchor-free algorithms based on keypoint grouping, those based on center-point regression eliminate the keypoint matching step, thereby enhancing the model’s overall performance. CenterNet and its improved variants exhibit high target detection accuracy and excellent real-time performance, having been widely deployed in the field of fault detection for power inspection equipment. To address the challenges of detecting small targets and low-contrast internal defects in X-ray images, Wang et al. [59] proposed a CenterNet-based improved architecture. By incorporating a cross-scale feature fusion module and a defect-region attention mechanism, this framework enhances the feature extraction capability for minute defects (e.g., core rod microcracks and fiber delamination). Additionally, the loss function is refined to mitigate the problem of defect sample imbalance. When benchmarked against conventional approaches—including threshold segmentation, Faster R-CNN, and the vanilla CenterNet—the improved model delivers state-of-the-art performance across key evaluation metrics: mean Average Precision (mAP), defect miss-detection rate, and detection speed. Building upon the CenterNet framework, Meng [60] proposed an improved model dubbed CenterNet plus by introducing a context clustering feature enhancement module, incorporating a multi-scale defect detection branch, and refining the loss function. Comparative experiments against the original CenterNet were performed, and the results demonstrate that the improved model elevates the mAP of insulator defect recognition by 8%–12%, cuts the missed detection rate of small defects by over 15%, and retains a detection speed that satisfies the real-time demands of UAV-based inspection.
Two-stage detection algorithms, with their unique step-by-step processing mechanism, exhibit significant advantages such as accurate localization, shared computational overhead, and independently optimizable modules. This design enables them to effectively capture the detailed features of targets through the dual processes of region proposal and fine-grained classification, thereby achieving high detection accuracy in complex scenarios. However, because candidate region generation and target classification must be completed in stages, the model training process requires collaborative optimization across multiple modules, significantly prolonging the training cycle. Meanwhile, the dual computational flow in the inference stage also drastically reduces detection speed, making it difficult to satisfy the requirements of highly real-time scenarios.
One-stage detection algorithms, leveraging an end-to-end detection process, exhibit significant advantages such as fast inference speed, strong resistance to background interference, and a robust capability to learn generalized target feature representations, and thus find extensive application in scenarios with high real-time requirements. However, these algorithms often struggle with small-target detection: owing to their low pixel occupancy and limited feature information, small targets are easily obscured or misclassified in complex backgrounds. Additionally, fine-grained target details tend to be lost during deep feature extraction, leading to low detection accuracy and high missed detection rates. This limitation is particularly prominent in scenarios with dense small targets, such as UAV-based power line inspection and aerial remote sensing image analysis, and has become a key bottleneck limiting their further application [61,62].
Anchor-based algorithms generate candidate regions using predefined anchor boxes to achieve direct target classification and bounding box coordinate regression. By incorporating prior knowledge of typical target scales and aspect ratios into the detection pipeline, such algorithms can achieve higher detection recall rates, especially for small-target detection tasks. However, manual hyperparameter design (e.g., predefined anchor scales and aspect ratios) tends to induce severe positive-negative sample class imbalance. In contrast, anchor-free algorithms eliminate the process of anchor box generation, thus providing a more flexible algorithmic solution space while reducing computational overhead. Nevertheless, they often face issues such as severe foreground-background category imbalance, semantic ambiguity, and unstable training convergence. Despite these inherent limitations, anchor-based algorithms such as Faster R-CNN, SSD, and YOLO remain widely deployed in practical computer vision applications. The advantages and disadvantages of these algorithms are summarized in Table 2.
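The IoU-based positive/negative anchor assignment behind the imbalance discussion above can be sketched as follows. The 0.5/0.4 thresholds are illustrative defaults in the style of RetinaNet, not a fixed prescription of any method reviewed here.

```python
def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) corner format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gts, pos_thr=0.5, neg_thr=0.4):
    """Label an anchor by its best IoU against all ground-truth boxes;
    anchors falling between the two thresholds are ignored in training."""
    best = max((iou(anchor, g) for g in gts), default=0.0)
    if best >= pos_thr:
        return "positive"
    if best < neg_thr:
        return "negative"
    return "ignore"

gt = [(10.0, 10.0, 50.0, 50.0)]  # one ground-truth box (illustrative)
```

Since most predefined anchors overlap no ground truth at all, almost every anchor comes back "negative", which is exactly the sample imbalance the paragraph above describes.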

3.2 Semantic Segmentation Algorithms
The core of semantic segmentation lies in assigning corresponding category labels to each pixel in an image. State-of-the-art deep learning-based semantic segmentation networks must simultaneously fulfill two core tasks: first, extracting hierarchically structured multi-scale target features in a top-down manner, and second, reconstructing image spatial dimensions in a bottom-up fashion to enable pixel-wise classification. Such networks typically adopt an encoder-decoder architecture: the encoder module performs hierarchical feature extraction via backbone networks (e.g., VGG, ResNet), while the decoder module achieves precise image size restoration through upsampling operations and skip connections (e.g., cross-layer feature fusion mechanisms). Classic algorithms including Fully Convolutional Networks (FCN), U-Net, SegNet, and DeepLab v1-v3+ leverage the robust feature learning capability of deep convolutional neural networks to model global and local image context information, thereby enabling high-accuracy processing of diverse pixel-level segmentation tasks (e.g., semantic and instance segmentation).
Since Chen et al. [63] proposed DeepLab v1, dilated convolution modules have become standard components in advanced semantic segmentation networks; these networks effectively fuse multi-level deep and shallow feature maps by configuring varying dilation rates. Focusing on deep learning-based pixel-level image segmentation technologies, this section summarizes mainstream algorithms across different architecture categories and conducts an in-depth analysis of their practical application value in power component recognition and defect detection for intelligent transmission line inspection. For instance, the power line segmentation algorithm for multi-spectral UAV images proposed by Hota et al. [64] effectively resolves the confusion between power lines and background in complex terrains by fusing visible light and infrared features, providing precise power line contours for hidden danger investigation and early warning in transmission line corridors. The insulator instance segmentation and defect detection scheme by Antwi-Bekoe et al. [65] precisely identifies damaged areas of umbrella skirts through pixel-level classification, offering a quantitative basis for insulation performance evaluation. Additionally, relevant studies have achieved notable results: Ye et al. [66] realized substation insulator defect detection based on CenterMask, while Hu et al. [67] completed ice-cover segmentation on transmission lines using an improved U-Net combined with Generative Adversarial Networks (GANs).
These algorithms have been validated in scenarios such as UAV inspection and robotic autonomous operation and maintenance, with their algorithmic characteristics, applicable scenarios, and segmentation performance comprehensively verified. This practical validation further provides key technical support for shifting transmission line inspection from traditional manual patrols to intelligent, unmanned monitoring.
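The dilated (atrous) convolutions that underpin the DeepLab family discussed above can be illustrated in one dimension. This is a pure-Python sketch: dilation enlarges the receptive field without adding parameters or reducing resolution, which is why varying dilation rates let a network aggregate context at several scales.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D correlation with a dilation rate.

    Dilation d inserts d - 1 gaps between kernel taps, so a k-tap kernel
    covers (k - 1) * d + 1 input samples while keeping only k parameters,
    which is the receptive-field trick behind atrous convolution.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field in samples
    return [sum(kernel[j] * signal[start + j * dilation] for j in range(k))
            for start in range(len(signal) - span + 1)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
avg3 = [1 / 3] * 3
dense = dilated_conv1d(x, avg3, dilation=1)   # each output sees 3 samples
sparse = dilated_conv1d(x, avg3, dilation=2)  # same 3 weights, sees 5 samples
```

Stacking several such layers with rates 1, 2, 4, ... grows the receptive field geometrically, the same principle the 2-D atrous spatial pyramid pooling in DeepLab exploits.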
Beyond the traditional intelligent inspection of transmission lines based on object detection and semantic segmentation techniques, advanced technical approaches leveraging Transformer architectures, 3D point cloud processing, lightweight network models, and GANs are gradually improving the environmental adaptability and operational efficiency of power line inspection scenarios. From the technical dimensions of global feature modeling, 3D spatial perception, edge deployment optimization, and data augmentation, these four technologies specifically address key technical challenges in transmission line inspection, such as complex scenario adaptation, high-precision geometric measurement, edge-side real-time deployment, and defect sample scarcity, thus providing multiple technical pathways for intelligent power grid operation and maintenance.
(1) Transformer, through its ability to efficiently aggregate global spatial feature information via multi-head self-attention mechanisms, has overcome the limitations of local receptive field-based feature modeling in traditional CNNs within transmission line inspection. Its core advantage lies in modeling long-distance inter-component correlations, which enables more comprehensive scene understanding—such as topological spatial relationships between conductors and transmission towers, or between metal fittings and insulator strings—effectively addressing complex background interference. For instance, the end-to-end insulator string defect detection method in complex backgrounds proposed by Xu et al. [68] employs Vision Transformer (ViT) to capture global features of insulators amid cluttered backgrounds, boosting defect recognition accuracy to 91.7% and significantly reducing false detection rates caused by vegetation occlusion. Cheng and Liu [69] realized high-precision power line insulator defect detection by improving the Detection Transformer (DETR), precisely locating damaged umbrella skirt areas via bidirectional attention mechanisms, which reduced the missed detection rate by 15% relative to traditional methods. Additionally, Shi et al. [70] integrated the strengths of CNN and Transformer to resolve power line detection challenges under occlusion, improving accuracy by 20% in cross-line scenarios—fully demonstrating the strong environmental adaptability of Transformer-based models in complex inspection scenarios.
(2) 3D point cloud technology breaks through the planar spatial constraints of 2D images via depth information reconstruction and fusion, providing accurate three-dimensional spatial cognition for fine-grained transmission line inspection tasks. Using Structure-from-Motion (SfM) techniques with monocular or binocular cameras or high-precision ranging via LiDAR, a high-precision 3D model of the transmission line corridor (incorporating spatial coordinates and key physical attributes) can be constructed. Chen et al. [71] proposed an insulator extraction method from UAV LiDAR point clouds based on multi-type, multi-scale feature histograms, achieving 96.3% insulator extraction accuracy in high-density point clouds, which outperforms traditional 2D image-based methods. Ni et al. [72] utilized UAV LiDAR to detect and predict tree-related risks to transmission lines, calculating the minimum distance between trees and conductors via point cloud modeling, which enabled 7-day advance warnings of potential hazards, representing a 5-fold efficiency improvement compared with traditional manual patrols.
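The tree-to-conductor clearance computation in studies such as [72] reduces, at its core, to point-to-segment distances over the classified point cloud. The sketch below treats a conductor span as a straight segment between two attachment points, which is a simplification of the real catenary sag; all coordinates are illustrative.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest 3-D distance from point p to segment a-b (all (x, y, z))."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    t = 0.0
    if denom > 0:
        # project p onto the line through a-b, clamped to the segment
        t = max(0.0, min(1.0, sum(ap[i] * ab[i] for i in range(3)) / denom))
    closest = tuple(a[i] + t * ab[i] for i in range(3))
    return math.dist(p, closest)

def min_clearance(tree_points, a, b):
    """Minimum distance from any classified tree point to the conductor."""
    return min(point_segment_distance(p, a, b) for p in tree_points)

# a 100 m span hung 20 m high, with two tree points beneath it (illustrative)
span_a, span_b = (0.0, 0.0, 20.0), (100.0, 0.0, 20.0)
tree = [(50.0, 3.0, 16.0), (52.0, 4.0, 12.0)]
clearance = min_clearance(tree, span_a, span_b)
```

In practice this check runs per span over millions of LiDAR points, with the conductor modeled by its fitted catenary rather than a chord; comparing the resulting minimum against the statutory clearance for the line's voltage level yields the hazard warning.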
(3) Lightweight network architectures and model compression technologies specifically tackle the deployment bottlenecks of deep neural networks on resource-constrained edge devices (e.g., UAV terminals, inspection robots) [73]. As the depth of detection/segmentation networks increases, model parameter volume and computational overhead surge exponentially, making it difficult to meet the strict real-time inference requirements of on-site power line inspections. Zhao et al. [74] proposed a dynamic supervision knowledge distillation-based method for classifying transmission line bolt defects, transferring “discriminative knowledge” from complex teacher models to lightweight student models. While maintaining 92% accuracy in bolt looseness detection, the model achieved a 60% size reduction, adapting to mobile terminal computing constraints.
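Knowledge distillation of the kind used in [74] trains a small student to match a teacher's softened outputs. The sketch below shows only the generic Hinton-style soft-label term (the temperature T = 4 and the logits are illustrative), not the cited dynamic-supervision variant, and omits the hard-label cross-entropy term used in full training recipes.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; T > 1 softens the distribution."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between temperature-softened teacher and student
    outputs: the soft-label term of classic knowledge distillation."""
    p = softmax(teacher_logits, T)  # soft teacher targets
    q = softmax(student_logits, T)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [5.0, 1.0, -2.0]                          # e.g., defect-class logits
aligned = distill_loss([5.0, 1.0, -2.0], teacher)   # student mimics teacher
diverged = distill_loss([-2.0, 1.0, 5.0], teacher)  # student disagrees
```

The softened targets carry the teacher's inter-class similarity structure (the "discriminative knowledge" mentioned above), which is information a one-hot label alone cannot convey to the compact student.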
(4) Generative Adversarial Networks tackle the critical challenges of scarce defect samples and insufficient environmental scene robustness in transmission line inspection through adversarial training between generators and discriminators. Wang et al. [75] generated UAV aerial images of high-voltage transmission line components using a multi-level GAN, synthesizing photorealistic insulator and conductor defect samples via multi-scale feature fusion and style transfer; this effectively expanded small datasets, elevating component recognition accuracy by 18%. Wu et al. [76] utilized a multi-scenario diverse sample generation model for detecting foreign object intrusions, enhancing environmental adaptability by synthesizing foreign object samples under varying lighting and weather conditions, thereby elevating foreign object detection accuracy rates from 72% to 90%. Zhang et al. [77] integrated GAN-based image generation with deep defect detection networks for substation equipment defect identification, using GAN to synthesize diverse defect samples (e.g., casing damage, overheated joints), resolving the long-standing issue of real-world defect data scarcity in substation scenarios.
4 Applications of Intelligent Inspection Based on Deep Learning
With the rapid advancement of machine vision and deep learning fusion technology, deep learning-driven target detection methods have been widely applied in the intelligent inspection of transmission lines. Numerous researchers have conducted in-depth studies on various detection networks, proposed scenario-adaptive improved algorithms for typical inspection tasks, and attained high-precision and real-time detection results that meet engineering application requirements. This section systematically explores the application of computer vision and deep learning technologies in power component identification and transmission line fault diagnosis (Fig. 4), reviewing the following six aspects: detection of power insulators and defects, detection of transmission towers and structural defects, power line feature extraction and spatial positioning, detection of metal fittings and defects, diagnosis of heating faults in power components, and safety hazard detection in power scenarios.

Figure 4: Different types of power components and defects.
4.1 Detection of Power Insulators and Defects
As a core insulating component in transmission lines, insulators are irreplaceable for ensuring the stable operation of power systems. These components are characterized by large quantities, wide coverage, and diverse types, including porcelain, glass, and composite insulators, each adapted to transmission lines of specific voltage levels and environmental conditions. Long-term exposure to complex and variable outdoor environments subjects insulators to dual stresses from natural and anthropogenic factors, gradually degrading their mechanical and electrical performance. Extreme weather conditions (e.g., strong winds, heavy rainfall, thunderstorms, extreme high temperatures, and frigid cold), combined with accumulated contaminants such as industrial dust and bird droppings, give rise to a range of typical defects. These include insulator string detachment, self-explosion of glass insulators, surface cracking in porcelain insulators, and surface pollution accumulation, a defect common to all insulator types. If left undetected or unaddressed in a timely manner, such defects may, at best, degrade the insulation performance of transmission lines; at worst, they could trigger severe faults like short circuits and tripping, directly undermining the continuous and stable power supply capacity of transmission lines.
Accurate and efficient monitoring of insulator operating status has thus become pivotal to ensuring the safe and stable operation of power systems. To enhance the efficiency and accuracy of insulator on-line monitoring in aerial inspection scenarios, researchers have actively adopted mainstream deep learning-based object detection algorithms such as Faster R-CNN, SSD, and YOLO. Tackling challenges in aerial images—including small-scale insulator targets, complex backgrounds, and indistinct defect features—they have strengthened discriminative feature extraction for key defect regions by integrating attention modules, improved recognition accuracy for insulators and defects of varying sizes through multi-scale feature fusion, optimized algorithm deployment and inference speed on mobile devices via lightweight network design, and elevated defect detection rates using multi-stage detection strategies. In-depth research on insulator target recognition and defect detection in UAV aerial images has achieved substantial technical progress that meets engineering application standards.
(1) The attention mechanism, which highlights regions of interest via dynamic weighting, has been widely applied in fields such as target tracking, image recognition, and image classification. SENet compresses 2D feature maps through the Squeeze-and-Excitation (SE) mechanism, enhancing target feature representation and enabling the model to focus on critical image features. Building on this idea, Zhang et al. [78] integrated SENet into YOLOv5 to develop SE-YOLOv5, which increased the detection accuracy of insulator defects by 1.9% over the original YOLOv5. For detecting multiple insulator defects in aerial images, Kang et al. [79] adopted ACmix, a hybrid self-attention and convolution mechanism, to prioritize the processing of key target information, enabling differentiation of defects such as spontaneous explosion, contamination, and damage. Overall, attention modules can enhance feature representation in complex backgrounds, thereby improving the saliency and detection accuracy of inspected objects.
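The squeeze-excitation-scale steps of the SE mechanism described above can be sketched with toy fully connected weights. Here w1 and w2 are hand-set for illustration; in a real SENet they are learned, and the channel count is reduced by a configurable ratio in the hidden layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_recalibrate(feature_maps, w1, w2):
    """Squeeze-and-Excitation over a list of 2-D channel maps.

    Squeeze: global average pooling yields one descriptor per channel.
    Excitation: two tiny fully connected layers (weights w1, w2; hand-set
    toys here, learned in a real SENet) produce per-channel gates in (0, 1).
    Scale: each channel map is reweighted by its gate.
    """
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
         for ch in feature_maps]                                  # squeeze
    hidden = [max(0.0, sum(w * zi for w, zi in zip(ws, z))) for ws in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(ws, hidden))) for ws in w2]
    out = [[[v * g for v in row] for row in ch]
           for ch, g in zip(feature_maps, gates)]                 # scale
    return gates, out

# two 2x2 channels: one strongly activated, one nearly silent
fmap = [[[4.0, 4.0], [4.0, 4.0]],
        [[0.1, 0.1], [0.1, 0.1]]]
w1 = [[1.0, -1.0]]        # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]      # 1 hidden unit -> 2 gates
gates, scaled = se_recalibrate(fmap, w1, w2)
```

With these toy weights the strongly activated channel receives a gate near 1 and the weak channel a gate near 0, which is the "focus on critical features" behavior exploited by SE-YOLOv5.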
(2) Multi-scale feature fusion is a core feature enhancement technique in deep learning-based detection and segmentation models that integrates feature maps extracted from different network layers and at varying spatial resolutions. In deep learning networks, as layer depth increases, high-level features capture richer semantic information but suffer from degraded spatial resolution and diminished fine-grained detail perception; in contrast, low-level features, while boasting high resolution and rich detail, carry relatively weak high-level semantic information. This trade-off between semantic richness and spatial detail retention has become a key bottleneck limiting the performance of deep learning detection models. Multi-scale feature fusion therefore complements the advantages of features across different layers: it not only strengthens the understanding of overall target semantics but also improves the precision of local detail perception. This has become a core method for effectively boosting the performance of target detection and semantic segmentation models, especially for small-target and multi-scale defect detection tasks in power inspection scenarios.
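The top-down step at the heart of FPN-style multi-scale fusion can be sketched as nearest-neighbor upsampling followed by an element-wise sum. Real FPNs also apply 1 × 1 lateral convolutions to align channel counts before summation; that step is omitted here for brevity, and all map values are illustrative.

```python
def upsample_nearest(fmap, factor=2):
    """Nearest-neighbor upsampling of a 2-D map by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]  # widen each row
        out.extend(list(wide) for _ in range(factor))   # repeat each row
    return out

def fuse(shallow, deep_up):
    """FPN-style top-down step: element-wise sum of the high-resolution,
    detail-rich map and the upsampled, semantically strong map."""
    return [[a + b for a, b in zip(r1, r2)]
            for r1, r2 in zip(shallow, deep_up)]

shallow = [[1.0, 2.0, 3.0, 4.0],
           [5.0, 6.0, 7.0, 8.0],
           [9.0, 1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0, 7.0]]   # 4x4: fine spatial detail
deep = [[10.0, 20.0],
        [30.0, 40.0]]              # 2x2: strong semantics, coarse
fused = fuse(shallow, upsample_nearest(deep))
```

Each fused cell now carries both the shallow map's spatial detail and the deep map's semantic evidence, which is what lets small defects be detected on high-resolution levels without losing class discriminability.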
Li et al. [80] proposed a multi-scale feature fusion detection algorithm based on an enhanced SSD framework. The core of this method involves using a residual attention network to extract multi-scale insulator defect features and achieving effective feature fusion via the cross-layer connection mechanism between deconvolution and multi-branch detection networks. Experimental data show that compared with the original SSD algorithm, its defect detection accuracy is improved by 2.7%, fully verifying the method’s advantages in handling complex defect patterns. In addition, Hao et al. [81] proposed a high-precision insulator defect detection method based on a modified YOLOv4 architecture. The method employs Cross Stage Partial-Residual Network with Split-Attention (CSP-ResNeSt) as the backbone network and embeds the SimAM attention mechanism into the multi-scale bidirectional pyramid network, thereby constructing an efficient cross-scale feature fusion path. This design not only addresses the long-standing challenge of accurately identifying small-scale insulator defects (e.g., micro-cracks and pinhole damage) but also significantly improves the overall detection accuracy and robustness of the model in real-world power inspection scenarios.
(3) Deep learning network lightweighting is a pivotal approach in modern deep learning, as it can significantly reduce computational resource consumption and model storage requirements through technical means such as network architecture redesign and reconstruction, parameter pruning, and low-rank decomposition—all while maintaining or even slightly improving the core detection performance of the model. In target detection, traditional algorithms often adopt classification networks like VGGNet and ResNet as backbone structures. While these backbone architectures excel at extracting high-dimensional semantic features, they have inherent limitations that hinder edge deployment: excessively large parameter volumes and high inference computational complexity, which make them ill-suited for resource-constrained edge devices (e.g., UAV onboard terminals and portable inspection edge nodes) and thus fail to meet the strict real-time detection requirements of field inspection. Therefore, advancing the lightweighting of deep learning detection networks holds great practical significance for enabling low-latency, real-time detection of power components in UAV aerial inspection images under on-site resource constraints.
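A back-of-the-envelope comparison shows why the depthwise separable convolutions used by MobileNet-style backbones shrink models so dramatically. The 256-channel 3x3 layer below is an illustrative assumption, not a configuration taken from any cited paper.

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) followed by a
    1 x 1 pointwise conv that mixes channels."""
    return c_in * k * k + c_in * c_out

# A typical mid-network layer: 256 -> 256 channels, 3 x 3 kernel.
standard = conv_params(256, 256, 3)                   # 589,824 parameters
separable = depthwise_separable_params(256, 256, 3)   # 67,840 parameters
reduction = 1 - separable / standard                  # roughly 88% fewer parameters
```

The same factorization reduces multiply-accumulate operations by a similar ratio, which is why lightweight backbones can meet the real-time constraints of UAV onboard terminals.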
Yang et al. [82] proposed a lightweight YOLOv3 detector based on spatial pyramid pooling and MobileNetV2 backbone. Compared with the baseline YOLOv3 model, it achieves a 98% reduction in model parameter volume and a nearly fivefold improvement in inference speed. To achieve fast and accurate localization and recognition of insulator defects in aerial images, Zan et al. [83] replaced the standard Conv-BatchNorm-LeakyReLU (CBL) convolutional blocks with MobileViT hybrid feature extraction modules to improve YOLOv4-tiny. Compared with the traditional YOLOv4-tiny, the defect detection accuracy of the improved algorithm is increased by 1.64%, and the insulator defect detection inference speed reaches 80.61 FPS under the same hardware test environment, making it highly suitable for on-site real-time monitoring tasks. In addition to depthwise separable convolutions and lightweight backbone architectures (e.g., SqueezeNet, MobileNet), researchers have also explored targeted model compression techniques (e.g., knowledge distillation, parameter quantization) to further reduce the deployment overhead of lightweight detectors on edge devices.
For example, Xie et al. [84] applied comprehensive pruning methods to eliminate redundant channels and convolutional kernels, while Zhao et al. [74] employed dynamic supervision knowledge distillation to train smaller models, effectively balancing accuracy and resource consumption. These research efforts have collectively promoted the development of high-efficiency deep learning-based power component detection systems. However, several key technical challenges still persist in practical engineering deployment. Future research can focus on exploring integrated hybrid optimization strategies to further elevate system performance, including: (1) fusing multi-modal lightweight techniques for synergistic efficiency improvement; (2) optimizing the accuracy-computation-resource trade-off for complex on-site inspection scenarios; (3) improving model cross-domain adaptability to diverse power components and defect categories.
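Channel pruning of the kind applied by Xie et al. can be sketched as ranking filters by an importance score and discarding the weakest. The L1-norm criterion used here is a common choice in the pruning literature and is an assumption, not necessarily the exact criterion of [84].

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """Structured channel pruning sketch: rank the output channels of a
    convolution weight tensor (C_out, C_in, k, k) by the L1 norm of their
    filters and keep only the highest-scoring fraction.
    """
    scores = np.abs(weight).sum(axis=(1, 2, 3))        # one score per output channel
    n_keep = max(1, int(round(weight.shape[0] * keep_ratio)))
    # Indices of the surviving channels, in their original order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    return weight[keep], keep
```

In a full pipeline, the corresponding input channels of the next layer are pruned to match, and the slimmed network is briefly fine-tuned to recover accuracy.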
(4) Multi-stage target detection is a hierarchical technical framework in computer vision-based object detection that involves gradually filtering, refining, and optimizing candidate target regions through successive cascaded detection stages to achieve more accurate target localization and classification. To tackle the key challenge of detecting tiny or small-scale insulator defects in complex aerial inspection images, researchers have employed network cascading to enhance detection precision. Tao et al. [85] developed a two-stage framework by cascading an insulator localization network (ILN) and a defect detection network (DDN) (Fig. 5), achieving 91% accuracy and 96% recall on the Insulator Defect dataset. Liu et al. [86] proposed a cascaded approach combining the enhanced YOLOv3-dense and YOLOv4-tiny, which boosted the overall insulator defect detection accuracy to 98.4%, an absolute improvement of 2 percentage points over the 96.4% accuracy of the baseline YOLOv4 model. Beyond the pure cascading of multiple detection networks, hybrid approaches integrating detection and segmentation have been explored. For instance, Ling et al. [87] integrated Faster R-CNN (for coarse insulator localization and defect detection) and U-Net (for fine-grained pixel-level defect segmentation) into a unified cascaded multi-task pipeline, enabling simultaneous target localization and pixel-level defect segmentation.

Figure 5: The cascaded model based on ILN and DDN.
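The two-stage cascade pattern (localize first, then inspect crops) can be sketched as follows. Here `locate_insulators` and `detect_defects` are hypothetical stand-ins for trained networks such as an ILN/DDN pair; the key point is that stage 2 sees only cropped regions, so small defects occupy a much larger fraction of its input.

```python
def cascade_detect(image, locate_insulators, detect_defects):
    """Two-stage cascade: stage 1 localizes insulator regions in the full
    image; stage 2 classifies defects only inside each cropped region.
    Boxes are (x1, y1, x2, y2); defects are (x1, y1, x2, y2, label) in
    crop-local coordinates.
    """
    results = []
    for (x1, y1, x2, y2) in locate_insulators(image):
        crop = [row[x1:x2] for row in image[y1:y2]]  # region of interest
        for (dx1, dy1, dx2, dy2, label) in detect_defects(crop):
            # Map defect coordinates back to the full-image frame.
            results.append((dx1 + x1, dy1 + y1, dx2 + x1, dy2 + y1, label))
    return results
```

The coordinate remapping at the end is what lets the cascade report defects in the original aerial-image frame for downstream maintenance records.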
In addition to the aforementioned techniques, various specialized deep learning approaches have been developed for insulator recognition and defect detection. Li et al. [88] addressed the problem of inaccurate insulator positioning caused by arbitrary orientation in oblique aerial images by incorporating angular rotation parameters into axis-aligned bounding boxes to implement rotated bounding boxes, thus developing a directional insulator recognition algorithm that improves the alignment accuracy of arbitrarily oriented targets in aerial inspection images. Jiang et al. [89] improved defect recognition accuracy and environmental robustness through multi-scale and multi-level feature perception by implementing ensemble learning with heterogeneous SSD-based detection models. Xu et al. [68] leveraged the DETR network for end-to-end insulator defect detection, eliminating the need for anchor-based box preprocessing and prior anchor design and streamlining the overall detection pipeline. To mitigate the challenge of labeled data scarcity in practical power inspection scenarios, Shi and Huang [90] proposed a weakly supervised method for detecting insulator string drop defects, thereby significantly reducing the model’s reliance on large-scale high-quality labeled defect datasets.
Table 3 summarizes the relevant research achievements of power insulator recognition and defect detection methods based on deep learning.

4.2 Detection of Transmission Towers
As a core supporting infrastructure for overhead high-voltage transmission lines, transmission towers serve to support high-voltage power transmission lines and maintain the required safe clearance distances between conductors and the ground, as well as among the three-phase conductors. In post-disaster emergency repair, rapidly identifying and precisely locating damaged or collapsed transmission towers from large-scale UAV aerial inspection images is critical for efficient disaster damage assessment and timely emergency maintenance of transmission line systems.
Guo et al. [91] proposed a real-time detection method for transmission towers based on an improved YOLOv3. By simplifying the network structure and reducing the number of layers from 106 to 23, the model substantially reduced its parameter count. The improved model sacrificed about 1 percentage point of detection accuracy (from 95.13% to 94.09%) but increased detection speed by 50% (from 20 FPS to 30 FPS), providing auxiliary decision-making information for power maintenance personnel during post-disaster repairs. Bian et al. [92] introduced a transmission tower detection method based on Tower R-CNN, which is adapted from the baseline Faster R-CNN framework through targeted parameter optimization and a hierarchical staged training strategy. Compared with the baseline Faster R-CNN, this model maintained the same 89.8% transmission tower detection accuracy while increasing the inference speed from 0.8 FPS to 5 FPS. To enable quantitative damage assessment of transmission towers after disasters, Hosseini et al. [93] developed an intelligent transmission tower damage classification and severity estimation method (IDCE) that uses four parallel ResNet-18 models for tower collapse detection, flame detection, damage classification, and damage state assessment. The model achieved accuracy rates of 94.18% for tower collapse detection, 97.14% for flame detection, and 98.78% for damage classification and state assessment, respectively.
4.3 Feature Extraction of Power Lines
Power lines serve as the core carriers for electric energy transmission and exhibit faint linear features amid cluttered and complex natural backgrounds (e.g., vegetation, buildings, and mountainous terrain). UAV patrol inspections typically capture sequential aerial images along power lines traversing transmission line corridors. Accurately extracting and identifying power lines from aerial images not only enables UAVs to implement autonomous obstacle avoidance but also guarantees the safety of low-altitude inspection flight paths. Yetgin et al. [94] proposed a power line classification model with VGG-19 and ResNet-50 as dual backbone networks. While this model could effectively classify whether power lines were present in images, it failed to achieve pixel-level or sub-pixel-level precise localization of power line segments, limiting its practical application in UAV obstacle avoidance.
Due to the extremely narrow pixel width of power lines in high-altitude UAV aerial images, power line detection tasks are essentially equivalent to fine-grained power line semantic segmentation, as both require pixel-level localization of linear targets—prompting many researchers to apply advanced semantic segmentation algorithms for power line detection. Nguyen et al. [95] developed LS-Net, a dedicated power line segmentation network comprising a fully convolutional feature extractor, target classifier, and line segment spatial regressor. This algorithm achieved an inference speed of 20.4 FPS, enabling near-real-time power line segmentation and detection in complex outdoor power line corridor scenarios. In the presence of complex background clutter and noise interference (e.g., overlapping vegetation, light glare, and motion blur), traditional filtering and gradient-based methods failed to capture continuous and complete power line structures. To address this limitation, Zhang et al. [96] proposed a power line segmentation method based on multi-scale convolutional features and power line-specific structural prior features, which utilized multi-scale context information and structural prior constraints to achieve accurate and efficient power line detection. Beyond the above supervised learning methods for power line segmentation, researchers have also explored low-label or label-free learning approaches to address the challenge of labeled data scarcity. For example, Lee et al. [97] proposed a weakly supervised learning method for power line localization based on convolutional neural networks, while Chen et al. [98] proposed SaSnet (Fig. 6), a self-supervised learning framework for real-time power line semantic segmentation.

Figure 6: The structure of SaSnet.
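Because power lines are only a few pixels wide, strict pixel-wise IoU punishes a one-pixel misalignment as harshly as a complete miss, so thin-structure segmentation is often scored with a tolerance-relaxed precision and recall. The sketch below is an illustrative metric under that assumption, not the evaluation protocol of the cited papers.

```python
import numpy as np

def dilate(mask, r=1):
    """Binary dilation of a 2D 0/1 mask by a (2r+1) x (2r+1) square."""
    out = np.zeros_like(mask)
    for y, x in zip(*np.nonzero(mask)):
        out[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1] = 1
    return out

def relaxed_f1(pred, gt, tol=1):
    """Relaxed F1 for thin masks: precision counts predicted pixels within
    `tol` pixels of the ground-truth line; recall counts ground-truth pixels
    within `tol` pixels of the prediction.
    """
    gt_d, pred_d = dilate(gt, tol), dilate(pred, tol)
    precision = (pred & gt_d).sum() / max(pred.sum(), 1)
    recall = (gt & pred_d).sum() / max(gt.sum(), 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)
```

With a one-pixel tolerance, a prediction shifted by a single row still scores perfectly, which better reflects whether a UAV obstacle-avoidance module would consider the line correctly localized.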
The relevant research achievements on deep learning-based power line feature extraction and segmentation are systematically summarized in Table 4, covering algorithms, applications, frameworks, GPUs, and key performance metrics.

4.4 Detection of Metal Fittings and Defects
Power fittings are specialized metal connecting accessories used to connect and assemble various devices in power systems, serving to transmit mechanical loads (e.g., tension, compression) and electrical loads (e.g., current conduction) while providing specific protective functions. Transmission lines incorporate numerous types of power fittings, and researchers have primarily focused on target detection of key components such as vibration dampers, spacers, grading rings, and bolts in UAV aerial inspection images. To address the low efficiency and poor accuracy of traditional vibration damper detection methods under challenging aerial inspection conditions (e.g., uneven lighting, cluttered backgrounds, and small target sizes), Zhang et al. [99] enhanced Faster R-CNN for vibration damper recognition and rust defect detection by integrating the Retinex low-light image enhancement algorithm and FPN module, enabling intelligent detection of vibration damper damage and rust defects in complex on-site scenarios. Wang et al. [100] implemented a Faster R-CNN-based detection system via the TensorFlow framework, realizing accurate grading ring recognition and spatial localization in complex transmission line environments. Song et al. [101] introduced a fitting recognition and rust defect detection method based on the YOLO algorithm with dual attention mechanisms (fusing channel attention and Vision Transformer-based spatial attention). Zhang et al. [102] proposed an attention-guided multi-task detection network (AGMNet, Fig. 7), for grading rust severity and identifying abnormal states of power line fittings.

Figure 7: The structure of AGMNet.
Bolts are indispensable mechanical fasteners for connecting components in transmission lines, featuring a wide application range, large quantity, and high deployment density. Under long-term service, they are continuously subjected to extrusion, tensile stress, and torsional forces—conditions that inevitably lead to progressive defects such as pin loss, nut detachment, and nut loosening over time. These defects are prone to trigger connection failures, thereby posing a severe threat to the operational safety of transmission lines. To address the core challenge of detecting small-target bolt defects in UAV aerial inspection images, researchers have proposed multiple targeted optimization schemes: (1) adopting the Faster R-CNN model integrated with dual attention mechanisms; (2) enhancing the Faster R-CNN framework by embedding residual connection modules and deformable convolution layers; and (3) developing dedicated detection models based on the Self-Calibrated Convolutional Network (SCNet) and a bottom-layer feature-enhanced Feature Pyramid Network. Collectively, these approaches strengthen the extraction of discriminative multi-scale and multi-location visual features for small bolt defects, significantly reducing the rates of missed detections and false positives in practical bolt defect detection tasks.
In bolt defect recognition, the visual indistinguishability of defect regions leads to significant intra-class variations and subtle inter-class differences in data samples. Incorporating domain-specific prior knowledge to mitigate such feature ambiguities can effectively improve defect recognition accuracy. Zhao et al. [103] proposed a pin loss detection method based on AVSCNet, optimizing the network model through visual shape clustering during the training phase to enhance the discrimination of small pin defects. Zhao et al. [104] presented a bolt defect detection method that integrates semantic and structural prior knowledge, where bolt structural features are extracted via dedicated visual-semantic and visual-position knowledge networks. Li et al. [105] introduced a DETR-based bolt defect detection method, which fuses bolt visual features with domain prior knowledge through attention mechanisms to strengthen target-background separation. A range of optimization strategies—including attention mechanism integration, feature extraction network enhancement, multi-scale feature fusion, algorithm cascading, and prior knowledge incorporation—have effectively boosted the performance of target detection algorithms for bolt defect tasks. Additionally, some researchers have combined weakly supervised learning, gated graph neural networks, and graph knowledge reasoning networks with deep learning-based object detection frameworks for bolt defect detection [106], as illustrated in Fig. 8.

Figure 8: Dynamic graph knowledge reasoning network.
Table 5 systematically summarizes the relevant research achievements on deep learning-based detection of power fittings and their associated defects.

4.5 Diagnosis of Thermal Faults in Power Components
Infrared image recognition technology, featuring non-contact temperature measurement and all-weather operational capability, has emerged as a core means for detecting the thermal status of power equipment. It can accurately capture abnormal temperature rises induced by loose electrical joints, insulation aging, and other typical faults, thereby providing a critical basis for early warning of equipment failures. In the era of power big data, massive infrared inspection images contain abundant information regarding the operational conditions of power equipment. How to leverage the image segmentation and feature extraction capabilities of machine vision, coupled with the autonomous learning advantages of deep learning, to achieve automatic identification of thermal faults in power equipment has become an urgent requirement in the intelligent transformation of power operation and maintenance practices. This technology not only replaces the inefficient traditional manual inspection mode but also reduces the risk of power outage accidents by proactively detecting potential hazards at an early stage.
Xu et al. [107] proposed a thermal fault detection method for high-voltage lead joints based on an improved R-FCN framework. By adopting ResNet-50 as the backbone feature extraction network, the optimized model achieved an 80% detection accuracy for thermal faults in high-voltage lead joints, representing an improvement of 8 percentage points over the original R-FCN algorithm. Li et al. [108] proposed a detection method for thermal anomalies in power equipment using a lightweight YOLOv4-tiny; the structure of the improved YOLOv4-tiny algorithm is shown in Fig. 9. To enhance YOLOv4-tiny, they introduced a Global Information Aggregation Module (GIAM), an Improved Spatial Transformer Network (ISTN), and a feature enhancement and fusion network. This optimized model fully integrates feature information from key regions, rotated targets, and high-level semantic features, achieving a 92.66% detection accuracy and a real-time inference speed of 107 FPS on the test set. Wang et al. [109] proposed a lightweight YOLOv5-based recognition method for power equipment infrared images. To enhance YOLOv5, they introduced Ghost convolution, an attention mechanism module, and an Efficient Intersection over Union (EIOU) loss function; the optimized model achieved a 93.8% average recognition accuracy for power equipment infrared images, with its parameter count reduced to only 80% of the original YOLOv5. This innovation provides new insights for deploying deep learning models on mobile terminals and UAVs in power inspection scenarios. Zhou et al. [110] proposed a lightweight U-Net-based method for thermal fault detection and segmentation of power equipment. By incorporating a lightweight inverted residual module into the encoder to improve the standard U-Net, the optimized model has a parameter size of merely 0.88 MB, significantly smaller than that of the standard U-Net (13.4 MB) and DeepLab v3+ (40.35 MB).
This method not only facilitates the integration of segmentation and recognition tasks but also enables the deployment of deep learning models on mobile devices, thereby providing technical support for edge computing devices to participate in on-site intelligent recognition of power equipment.

Figure 9: Structure of improved YOLOv4-tiny algorithm.
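The EIoU loss adopted by Wang et al. [109] is commonly formulated as 1 - IoU plus separate penalties for the centre distance and the width/height gaps, each normalised by the smallest enclosing box. The sketch below follows that common formulation, which may differ in minor details from the cited implementation.

```python
def eiou_loss(box_a, box_b):
    """Efficient IoU (EIoU) loss for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection and union.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + 1e-9)
    # Smallest enclosing box dimensions.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    # Normalised squared centre distance.
    d2 = ((ax1 + ax2 - bx1 - bx2) / 2) ** 2 + ((ay1 + ay2 - by1 - by2) / 2) ** 2
    dist = d2 / (cw ** 2 + ch ** 2 + 1e-9)
    # Direct width and height penalties (the part EIoU adds over CIoU).
    wpen = ((ax2 - ax1) - (bx2 - bx1)) ** 2 / (cw ** 2 + 1e-9)
    hpen = ((ay2 - ay1) - (by2 - by1)) ** 2 / (ch ** 2 + 1e-9)
    return 1.0 - iou + dist + wpen + hpen
```

Penalising width and height separately, rather than through an aspect-ratio term, gives cleaner gradients for elongated targets such as infrared hot spots along fittings.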
4.6 Safety Hazard Detection in Power Scenarios
The intrusion of foreign objects (such as bird nests, kites, balloons, and engineering vehicles) into transmission lines has always been a significant hidden danger threatening the safety of power grids. Bird nests may cause line short circuits due to falling branches and leaves; once lightweight floating objects like kites and balloons become entangled in the wires, they can degrade insulation performance or even cause breakdown and discharge. Moreover, if engineering vehicles accidentally come into contact with the lines during construction, they may directly cause tower collapse and wire breakage. Therefore, promptly detecting and addressing hidden dangers through intelligent recognition technology is a key measure to ensure the continuous and stable operation of transmission lines and prevent major accidents.
Xiang et al. [111] developed a Faster R-CNN-based intelligent detection method for engineering vehicles intruding into transmission lines, addressing the diversity of engineering vehicle types by adjusting the position of the RoI pooling layer. Zhang et al. [112] presented a foreign object detection method based on YOLOv5, further improving the model's detection accuracy and speed by embedding RepConv and C2f structures into the network. Yu et al. [113] optimized the hyperparameters of YOLOv7 using genetic algorithms and space-to-depth convolution to enable intelligent recognition of foreign objects in UAV aerial images. Zhu et al. [114] proposed a YOLOv3-based detection method for foreign object intrusion into transmission lines, improving YOLOv3 with oriented bounding boxes and scale histogram matching strategies to achieve directional detection of foreign objects (e.g., suspended objects, fireworks, and engineering vehicles). Li et al. [115] enhanced foreign object detection speed by using MobileNetv2 as the feature extraction network for CenterNet. Zhang et al. [116] improved YOLOv3 with MobileNetv3 to achieve efficient edge-side detection of targets such as suspended objects, mountain fires, and tower cranes.
To facilitate the practical implementation of UAV-based autonomous inspection technology for overhead transmission lines, researchers have focused their efforts on deploying deep learning network models on edge computing devices, as the lightweight CenterNet and YOLOv3 variants described above illustrate. In addition, Qiu et al. [117] embedded a dual attention mechanism into YOLOv4 and proposed the YOLOv4-EDAM model (Fig. 10), which supports real-time detection of foreign intrusion targets including bird nests, kites, and balloons that pose risks to transmission line safety.

Figure 10: Structure of YOLOv4-EDAM.
In addition to the aforementioned object detection methods, researchers have also proposed approaches based on weak supervision and semantic segmentation for detecting foreign object intrusions in overhead transmission lines. Hao et al. [118] proposed an insulator icing recognition method that integrates deep weak supervision with transfer learning. Hu et al. [67] developed a U-Net-based power line icing semantic segmentation method.
Table 6 summarizes the research achievements on safety hazard detection in power scenarios based on deep learning.

Overall, researchers have achieved remarkable progress across the six aforementioned application scenarios by leveraging deep learning technologies. Within each research subfield, targeted algorithmic optimization and scenario-specific customization have yielded substantial improvements in detection performance relative to traditional methods, thereby furnishing effective technical support for power equipment operation and maintenance. However, these studies still face several common challenges: the accurate extraction and effective representation of complex morphological features of various power components and defects—such as target occlusion, small-scale targets, and multi-scale variations—remain key bottlenecks restricting further improvements in detection accuracy. This not only impedes the large-scale deployment of UAV-based intelligent inspection technology but also leads to notable limitations in current research. Therefore, future research should further advance deep learning-based methodologies, address the scenario-specific demands of different application contexts, break through technical bottlenecks in feature extraction and representation, and provide more robust theoretical guidance and technical support for power component recognition and defect detection. Such efforts will drive a comprehensive improvement in the intelligence level of power system operation and maintenance.
In this section, we present a comprehensive and detailed overview of all research papers reviewed in this study, which focus on the core domain of deep learning-based intelligent inspection applications. We have not only synthesized the exploration paths, technical solutions, and research outcomes of different research teams in this field but also conducted in-depth analyses of the theoretical foundations and experimental design approaches adopted in these papers, aiming to illustrate the overall context and development trends of research in this area for readers.
Furthermore, to more objectively and thoroughly assess the value and caliber of these research results, we have performed a systematic qualitative evaluation of all papers included in the review. During this evaluation, we carefully established several targeted qualitative criteria by integrating the technical characteristics of the field with practical application requirements. These criteria specifically include: dataset availability (i.e., whether the dataset is easily accessible, its format is standardized, and it has good representativeness); dataset size (explicitly requiring over 1000 samples to ensure the effectiveness and generalization capability of model training); adoption of data augmentation techniques (enhancing model robustness through diversified processing of original data); sample diversity (ensuring data covers samples from different scenarios and conditions to improve model adaptability); support for multi-target detection (assessing the model’s ability to identify multiple targets simultaneously in complex scenarios); development based on the PyTorch framework (whose widespread application in deep learning makes it a key evaluation reference); involvement of model compression technologies (reducing model size while maintaining performance to facilitate practical deployment); realization of model deployment (the capability to apply trained models to real-world scenarios); AP/mAP (requiring over 85% to reflect detection accuracy); and FPS (requiring over 30 to indicate real-time processing capability). Guided by these comprehensive and rigorous evaluation criteria, we have conducted in-depth, detailed assessments of all selected papers individually. The final evaluation results are systematically summarized in Table 7 to facilitate readers’ intuitive comparison and understanding of each paper’s performance.

Several insights can be drawn from the above data. Firstly, studies that implement multi-target detection and achieve an AP/mAP of over 85% perform particularly prominently, accounting for 84.2% and 81.6% of all relevant studies, respectively. As a core requirement for complex scenarios, multi-target detection has garnered extensive attention and practical deployment, reflecting the field’s proactive efforts to tackle sophisticated detection tasks in real-world applications. Meanwhile, researchers’ emphasis on core model performance metrics also underscores the notable progress of current technologies in improving detection accuracy. Secondly, the 78.9% proportion of studies with a dataset size of over 1000 samples indicates that, in research on deep learning-based intelligent inspection applications, most teams recognize the significance of sufficient sample sizes for ensuring effective model training.
However, the proportions of studies with adequate dataset availability (13.1%) and sufficient sample diversity (18.4%) reveal critical gaps in the field’s data infrastructure. Low dataset availability can impede the reproducibility and cross-study comparability of research findings, while insufficient sample diversity restricts models’ generalization across heterogeneous scenarios—ultimately resulting in inconsistent performance in real-world applications. In terms of technical implementation and deployment, 64.1% of studies leverage the PyTorch framework, solidifying its status as a mainstream tool for deep learning algorithm development. Data augmentation techniques have been adopted in 52.6% of cases, signaling their moderate integration into research workflows. Conversely, the relatively low proportions of model compression technology adoption (26.3%), successful on-site deployment (13.2%), and inference speeds exceeding 30 FPS (39.5%) reflect a research focus skewed toward model training and performance enhancement—with insufficient attention paid to critical practical application dimensions, such as engineering implementation, real-time inference optimization, and model lightweighting. This imbalance has emerged as a core bottleneck impeding the technology’s transition from laboratory validation to real-world operational deployment.
The rapid development of deep learning has benefited from large-scale datasets. Similarly, the widespread application of deep learning in intelligent inspection of transmission lines relies heavily on high-quality datasets. Currently, most data used for deep learning-based inspection tasks focuses on capturing vulnerable components of transmission lines, such as insulators, power lines, power fittings, and other key power facilities. Relevant information on common public datasets for transmission line inspection is summarized in Table 8.

(1) Chinese Power Line Insulator Dataset (CPLID): This dataset comprises UAV-captured aerial images of high-voltage insulators acquired by the State Grid Corporation of China, including 600 images of normal insulators and 248 images of defective insulators synthesized via digital image augmentation techniques.
(2) Overhead Power Distribution Lines (OPDL): The OPDL dataset publicly provides 4960 images of four types of 15 kV distribution insulators, namely ceramic pin-type insulators, two-color ceramic insulators, gray polymer insulators, and green glass insulators.
(3) Insulator Dataset (ID): This dataset publicly provides 2630 images of post insulators under varying outdoor illumination and background conditions, and is designed specifically for insulator detection tasks.
(4) Insulator Defect Image Dataset (IDID): The IDID dataset is centered on insulator strings as the main object of study, including flashover insulators, shell-broken insulators, and normal insulators, and contains 1688 images for insulator defect detection.
(5) Synthetic Foggy Insulator Dataset (SFID): Zhang et al. (2022) simulated foggy environments using an atmospheric scattering model and generated multi-scenario data by adjusting fog concentration parameters. The SFID dataset provides data for the detection of insulators and their defects in complex environments, addressing the gap in insulator detection datasets for foggy conditions.
(6) Unifying Public Insulator Dataset (UPID): The dataset integrates two public datasets (CPLID and ID) and contains 6860 images for insulator detection and defect classification.
(7) Transmission Towers and Power Lines Aerial-image (TTPLA): As the first open-source dataset for transmission tower instance segmentation, the TTPLA dataset provides 1100 aerial images of tower-line systems with segmentation labels, featuring an image resolution of 3840 × 2160.
(8) Powerline Image Dataset (PLD): The PLD, provided by Yetgin et al. (2019), is the first open-source dataset for powerline classification. It contains 8000 aerial images (128 × 128 resolution) for powerline presence classification: 4000 visible images and 4000 infrared images. Of these, 2000 visible images and 2000 infrared images contain powerlines, while the remaining images are powerline-free aerial images.
(9) Ground Truth of Powerline Dataset (GTPLD): The GTPLD dataset is a collection of aerial images captured using infrared and visible-light cameras through a collaboration between Lee et al. (2017) and the Turkish Electricity Transmission Corporation. It includes 800 images with a resolution of 512 × 512 for powerline semantic segmentation and detection.
(10) Power Line Detection in Unmanned Aerial Vehicle (PLD-UAV): The PLD-UAV dataset, a power line semantic segmentation dataset publicly released by Zhang et al. (2019), includes two subsets: PLDU (urban scenes) and PLDM (mountainous scenes), containing 573 and 287 images, respectively. The authors further expanded the number of images in the dataset to 48 times the original count using image augmentation techniques such as rotation and random multi-scale scaling.
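The atmospheric scattering model used to build SFID (item 5 above) renders a clear pixel J as a foggy pixel I = J·t + A(1 − t), with transmission t = exp(−β·d). A per-pixel sketch, in which the depth value and airlight are assumed for illustration:

```python
import math

def add_synthetic_fog(pixel, depth, beta, airlight=255.0):
    """Atmospheric scattering model: I = J*t + A*(1 - t),
    with transmission t = exp(-beta * depth).

    pixel    : clear-scene intensity J(x), 0-255
    depth    : assumed scene depth d(x) at this pixel
    beta     : fog concentration coefficient (larger = denser fog)
    airlight : global atmospheric light A (assumed white here)
    """
    t = math.exp(-beta * depth)
    return pixel * t + airlight * (1.0 - t)

# Denser fog (larger beta) pulls intensities toward the airlight value.
clear = 100.0
print(add_synthetic_fog(clear, depth=1.0, beta=0.1))  # light fog
print(add_synthetic_fog(clear, depth=1.0, beta=2.0))  # heavy fog
```

Sweeping `beta` over a range of values is what produces the multi-concentration scenarios described for SFID.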
Besides the publicly available datasets mentioned above, scholars have constructed numerous self-built datasets specifically for research on intelligent inspection of transmission lines.
6 Current Challenges and Future Directions
As intelligent inspection of transmission lines accelerates its evolution toward greater intelligence and full automation, UAV technology has increasingly broad application prospects in this domain. Meanwhile, the rapid advancement of deep learning technology will equip UAVs with more precise, intelligent, and automated inspection capabilities. Such capabilities can promptly identify transmission line abnormalities, provide robust support for the efficient execution of operation, maintenance, and repair (OMR) work, and thereby significantly improve the safety, stability, and operational reliability of transmission systems. However, several critical challenges must still be addressed to further advance the state of the art. This section outlines these key challenges and proposes prospective future research directions.
Currently, the challenges confronting deep learning-based intelligent inspection of transmission lines are mainly manifested in the following aspects.
(1) Image feature extraction under complex environments
Aerial images of transmission lines feature complex and variable backgrounds, and are susceptible to interference from factors such as light fluctuations, shadows, and target occlusion. Due to the diversity and uncertainty of real-world environments, existing image feature extraction technologies struggle to effectively capture the deep-seated features of power components.
(2) Processing and storage of large-scale data
Devices such as UAVs, intelligent cameras, and various sensors are now widely deployed for transmission line inspections, leading to explosive growth in inspection-generated data. This inspection data is both massive and heterogeneous, demanding extensive storage space and robust data processing capabilities.
(3) Limited precision of inspection equipment
The limited precision of inspection equipment is a key restrictive factor for the intelligent operation and maintenance of transmission lines. While existing inspection equipment widely integrates advanced technologies—including high-resolution optical lenses, infrared thermal imaging sensors, and LiDAR—to establish multi-modal data acquisition systems, its detection efficiency still falls short of the ideal standard under extreme environments and complex working conditions.
(4) Robustness of artificial intelligence algorithms
Deep learning algorithms are data-driven, with their detection accuracy depending on the scale and quality of the dataset. Models trained on a limited number of samples are prone to overfitting or failure to converge. In particular, when faced with scenarios not covered in the training set, these models tend to exhibit poor generalization ability.
(5) Unsatisfactory detection effect for small targets
Power components span a wide range of types. In aerial images of varying scales, small-target devices suffer from low resolution, limited feature information, and low recognition accuracy. Power components such as anti-vibration hammers, spacer dampers, grading rings, bolts, and pins have subtle defect features, which renders effective feature extraction challenging. Furthermore, mainstream deep learning algorithms (e.g., YOLO, SSD, and Faster R-CNN) demonstrate subpar performance in small-target detection tasks.
(6) Target detection in unknown scenarios
Deep learning algorithms can accurately recognize objects within the training dataset but lack robustness when detecting objects in real-world, unconstrained scenarios. The scarcity of samples for edge cases and the inability of deep learning algorithms to generalize to unknown categories pose significant challenges to intelligent transmission line inspection.
As research on the intelligent inspection of transmission lines continues to advance, deep learning methods have demonstrated significantly superior detection performance compared with traditional image processing methods in power component recognition and fault diagnosis. This technical edge has, to a certain extent, accelerated the automation of the entire power operation and maintenance workflow, while also highlighting numerous critical issues that urgently require resolution. Future research will focus on the following key directions.
(1) Deep integration of deep learning with multiple learning paradigms
The in-depth integration of deep learning with multiple learning paradigms is emerging as a core research direction for breaking through technical bottlenecks in transmission line intelligent inspection. This integration is not a simple superposition of techniques, but a reconstruction of algorithmic logics to form a synergistic framework that can adapt to complex inspection scenarios.
For instance, the integration of deep learning and few-shot learning addresses the scarcity of defect samples by leveraging the “learning-to-learn” mechanism of meta-learning. The integration of deep learning and reinforcement learning focuses on optimizing dynamic decision-making processes: it takes environmental features extracted by deep learning models as state inputs and trains optimal inspection strategies via reward mechanisms (e.g., maximizing defect detection rates while minimizing energy consumption). The synergistic combination of deep learning, transfer learning, and multi-modal learning enhances cross-scenario adaptability—first by transferring feature extraction capabilities trained on general image datasets to inspection scenarios via transfer learning, then by integrating multi-modal fusion networks to process heterogeneous data such as visible-light, infrared, and vibration signals.
Future research should focus on breaking through the dynamic balance mechanism of the integrated framework. For example, attention mechanisms can be employed to dynamically allocate the weight of feature extraction in deep learning and the decision priority in reinforcement learning, ultimately achieving a full-chain upgrade of efficient data utilization, scenario adaptability, and optimal decision-making performance. This will provide core technical support for the unmanned and intelligent transformation of inspection workflows in power grid operations.
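The reward design mentioned above for reinforcement learning, maximizing defect detection rates while minimizing energy consumption, can be sketched as a scalar reward. The variable names and weighting below are illustrative assumptions, not a design from the surveyed literature:

```python
def inspection_reward(defects_found, defects_present, energy_used,
                      energy_budget, energy_weight=0.5):
    """Illustrative reward for an RL inspection policy: reward high
    defect recall, penalize normalized energy use. The trade-off
    weight `energy_weight` is an assumed hyperparameter."""
    recall = defects_found / max(defects_present, 1)
    energy_penalty = energy_used / energy_budget
    return recall - energy_weight * energy_penalty

# A thorough but efficient pass scores higher than a wasteful one.
print(inspection_reward(9, 10, energy_used=30, energy_budget=100))
print(inspection_reward(9, 10, energy_used=90, energy_budget=100))
```

In the integrated framework described above, the state fed to such a reward would be the environmental features extracted by the deep learning model.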
(2) Deep fusion of multimodal data
In the field of intelligent inspection of transmission lines, deep multimodal data fusion based on deep learning represents a highly transformative future research direction. This technology integrates multi-source information including visible-light images, infrared thermal imaging, LiDAR point clouds, and inspection textual data, which can overcome the inherent limitations of single-modality approaches and significantly improve inspection accuracy and overall intelligence.
Existing transmission line inspections often rely on single-modal data—for instance, using only visible-light images to identify defects in line components. This method exhibits notable limitations when facing complex environments and diverse fault types. For example, in scenarios with poor lighting or equipment obscured by obstacles, visible-light images struggle to provide clear fault-related information. However, deep multimodal data fusion can effectively remedy this shortcoming. It leverages the powerful feature extraction and fusion capabilities of deep learning models; through fusion strategies such as feature concatenation and weighted fusion, features from diverse modalities are integrated into a unified feature representation to support subsequent tasks like classification and detection. Additionally, deep multimodal data fusion can be combined with large language models (LLMs), which can perform semantic understanding and analysis on fused multimodal data, extract knowledge from extensive textual documents (e.g., inspection reports and equipment manuals), and assist in fault diagnosis and decision-making processes.
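The two fusion strategies named above, feature concatenation and weighted fusion, can be sketched on toy feature vectors. In practice the features would come from deep networks and the fusion weights would be learned; the values here are assumptions for illustration:

```python
def concat_fusion(feat_a, feat_b):
    """Feature concatenation: stack modality features end to end,
    yielding a longer joint representation."""
    return feat_a + feat_b

def weighted_fusion(feat_a, feat_b, w_a=0.6, w_b=0.4):
    """Weighted fusion: element-wise convex combination of two
    same-length feature vectors (fixed weights here stand in for
    learned ones)."""
    assert len(feat_a) == len(feat_b)
    return [w_a * a + w_b * b for a, b in zip(feat_a, feat_b)]

visible  = [0.2, 0.8, 0.1]   # toy visible-light features
infrared = [0.9, 0.1, 0.4]   # toy infrared features

print(concat_fusion(visible, infrared))    # length-6 joint vector
print(weighted_fusion(visible, infrared))  # length-3 fused vector
```

Concatenation preserves all modality-specific information at the cost of dimensionality, while weighted fusion keeps the dimension fixed but requires the modalities to share a feature space.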
In the future, advancements in sensor technology will enable the acquisition of more diverse and high-precision multimodal data; meanwhile, progress in deep learning algorithms will further enhance the efficiency of multimodal data fusion and analysis. The intelligent inspection system constructed via deep multimodal data fusion is expected to achieve end-to-end intelligence across the entire workflow—from data collection and processing to fault diagnosis and decision support—thereby further bolstering the safe and stable operation of transmission line systems.
(3) Collaborative application of object detection algorithms and large language models
Traditional object detection algorithms, such as YOLO and Faster R-CNN, have yielded tangible results in transmission line inspection. They can quickly locate and identify common issues (e.g., component defects and foreign object intrusions on lines), exhibiting high detection speed and a certain degree of accuracy. However, their limitations in generalization ability and complex semantic understanding capacity have gradually become prominent when facing complex, diverse transmission line environments, dynamically changing scenarios, and the escalating demand for refined, intelligent inspection workflows.
The rise of LLMs has brought new opportunities to address these challenges. LLMs possess robust natural language understanding and generation capabilities, as well as comprehensive knowledge-based reasoning capabilities. In intelligent transmission line inspection, these models can conduct in-depth semantic analysis of inspection data and interpret complex task instructions. For instance, they can extract key information from numerous textual documents (including inspection reports and equipment manuals) to support decision-making processes. Meanwhile, based on multi-modal data descriptions (e.g., images and videos) combined with domain-specific knowledge, they can more accurately assess potential faults and abnormal conditions of transmission lines.
Future research should focus on the in-depth integration of LLMs and traditional object detection algorithms. On one hand, LLMs can be used to perform secondary verification and correction of detection results generated by traditional algorithms. Through semantic reasoning, they can mitigate the false positive and false negative issues that traditional algorithms are prone to in complex scenarios. For example, when traditional algorithms detect suspected defects, LLMs can judge the validity and severity of these defects based on transmission line operation knowledge, historical fault cases, and other relevant data. On the other hand, LLMs can guide traditional algorithms to optimize detection strategies, dynamically adjusting algorithm parameters according to different inspection scenarios and task requirements to enhance detection efficiency and adaptability. Additionally, by combining the advantages of both technologies, an end-to-end intelligent inspection system can be developed to achieve full-process end-to-end intelligence—from data collection and processing to fault diagnosis and report generation—thereby further improving the reliability and intelligence level of transmission line inspection.
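The secondary-verification idea can be illustrated by assembling a prompt that passes a detector's output, plus operational context, to an LLM. The field names and wording below are hypothetical, and the actual model call is omitted:

```python
def build_verification_prompt(detection, line_context):
    """Assemble a hypothetical prompt asking an LLM to double-check
    a detector's output; all fields and phrasing are illustrative."""
    return (
        "You are a transmission-line inspection assistant.\n"
        f"Detector output: {detection['label']} "
        f"(confidence {detection['score']:.2f}) at {detection['box']}.\n"
        f"Line context: {line_context}\n"
        "Given typical failure modes and historical faults, is this a "
        "true defect? Answer VALID or FALSE_POSITIVE with a severity."
    )

prompt = build_verification_prompt(
    {"label": "flashover insulator", "score": 0.62,
     "box": (120, 40, 200, 90)},
    "220 kV line, coastal region, last inspected 6 months ago",
)
print(prompt)
```

The LLM's structured answer could then gate whether a low-confidence detection is escalated to maintenance, mitigating the false positives discussed above.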
(4) Cloud-edge-end collaborative integration
Intelligent inspection of transmission lines based on the collaborative integration of cloud-edge-end architectures adopts a hierarchical collaboration mode—featuring cloud-based global coordination, edge-node local processing, and terminal-side real-time response—to address core pain points in traditional inspection, such as redundant data transmission, inadequate real-time responsiveness, and unbalanced computing power allocation. This paradigm drives the evolution of inspection methodologies from passive defect detection to active risk early warning, providing technical support for full-process automation in power grid operation and maintenance.
The core of cloud-edge-end collaborative integration lies in establishing a three-dimensional collaboration mechanism. At the terminal side, priority is placed on exploring lightweight deep learning model deployment technologies. Combined with model compression and dynamic inference strategies, these technologies enable efficient data preprocessing and preliminary defect screening for devices such as UAVs and intelligent sensors, while simultaneously optimizing multi-terminal collaborative data acquisition protocols to reduce redundant data transmission. At the edge side, efforts are focused on constructing a low-latency inference framework. Through localized real-time analysis (e.g., monitoring abnormal conductor galloping, identifying thermal defects in power hardware), rapid on-site response is achieved. Additionally, a cloud-edge model collaborative iteration mechanism is established to optimize model adaptability based on scenario-specific data feedback from edge nodes. At the cloud layer, the key focus is on multi-source heterogeneous data fusion algorithms, which integrate multi-modal information (including visible-light, infrared, and 3D point clouds) to construct a full-lifecycle equipment knowledge graph. Combined with reinforcement learning, this enables intelligent scheduling of global inspection tasks and computing resources, further enhancing the system’s overall decision-making efficiency.
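The terminal-side preliminary screening described above can be sketched as a simple local filter that forwards only suspicious frames upstream. The scoring function and threshold here are illustrative stand-ins for a compressed on-device model:

```python
def edge_screen(frames, score_fn, threshold=0.5):
    """Terminal/edge-side preliminary screening (illustrative): score
    each frame locally and forward only suspicious ones to the cloud,
    reducing redundant data transmission."""
    return [f for f in frames if score_fn(f) >= threshold]

def toy_score(frame):
    """Toy anomaly score: fraction of 'hot' pixels above 200, a
    stand-in for a lightweight on-device defect screener."""
    return sum(1 for p in frame if p > 200) / len(frame)

frames = [[10, 20, 30, 250], [5, 5, 5, 5], [220, 230, 240, 250]]
kept = edge_screen(frames, toy_score, threshold=0.5)
print(f"forwarded {len(kept)} of {len(frames)} frames to the cloud")
```

Only the flagged frames would then consume uplink bandwidth and cloud-layer computing resources.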
(5) Multi-agent collaborative inspection and cluster optimization
Multi-agent collaborative inspection involves the cooperation of various devices. UAVs, with their strong mobility, can quickly reach high-altitude line segments for large-scale preliminary scanning. Equipped with high-definition cameras and infrared thermal imagers, they capture the overall appearance of lines and equipment as well as temperature anomalies. Ground inspection robots, by contrast, specialize in complex terrains and close-range detailed inspections, navigating around tower bases and using robotic arms and high-precision sensors to detect issues such as loose tower bolts and foundation settlement. Fixed cameras and sensor nodes maintain continuous monitoring of critical components, delivering stable real-time data flows.
To achieve efficient collaboration, on the one hand, a dynamic task allocation mechanism is developed to adjust task assignments for each agent in real time in accordance with factors such as line length, environmental complexity, and equipment importance. On the other hand, the collaborative system involves optimizing the communication network between agents by incorporating 5G, satellite communication, and other cutting-edge communication technologies to ensure stable, low-latency data transmission, facilitating real-time data sharing and collaborative decision-making among multiple agents. Cluster optimization further applies to path planning: genetic algorithms, simulated annealing, and other algorithms are used to generate optimal inspection routes by comprehensively considering the agents’ endurance, signal coverage, inspection accuracy, and key operational constraints, thereby avoiding redundant inspections and minimizing energy consumption.
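As one concrete instance of the route optimization mentioned above, a minimal simulated-annealing sketch for ordering inspection waypoints follows. Endurance and signal-coverage constraints are omitted for brevity, and all parameters and coordinates are illustrative:

```python
import math, random

def route_length(route, coords):
    """Total closed-tour length over waypoint coordinates."""
    return sum(math.dist(coords[route[i]],
                         coords[route[(i + 1) % len(route)]])
               for i in range(len(route)))

def anneal_route(coords, iters=5000, t0=10.0, cooling=0.999, seed=0):
    """Simulated annealing with 2-opt moves: accept worse routes
    with probability exp(-delta/temp) to escape local minima, and
    keep the best route seen."""
    rng = random.Random(seed)
    route = list(range(len(coords)))
    best, best_len = route[:], route_length(route, coords)
    temp = t0
    for _ in range(iters):
        i, j = sorted(rng.sample(range(len(coords)), 2))
        cand = route[:i] + route[i:j + 1][::-1] + route[j + 1:]
        delta = route_length(cand, coords) - route_length(route, coords)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            route = cand
            cur = route_length(route, coords)
            if cur < best_len:
                best, best_len = route[:], cur
        temp *= cooling
    return best, best_len

towers = [(0, 0), (5, 1), (1, 4), (6, 5), (2, 1), (4, 4)]  # toy tower sites
route, length = anneal_route(towers)
print(route, round(length, 2))
```

A production planner would add per-agent battery and no-fly constraints to the cost function rather than pure tour length.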
The continuous advancement of multi-agent collaborative inspection and cluster optimization will drive the evolution of transmission line inspection toward intelligence, efficiency, and comprehensiveness, ensure the safe and stable operation of power grids, and effectively propel the digital transformation of the power industry.
As a core technical means to ensure the safe and stable operation of power grids, deep learning-based transmission line inspection has made significant progress in the context of the digital and intelligent transformation of the power industry. This paper presents a systematic review of cutting-edge trends and key technologies in this field, and conducts a comprehensive survey and in-depth analysis from three dimensions: technical foundations, application practices, and future challenges. The aim is to provide structured references for researchers and practitioners, while facilitating further breakthroughs and practical applications of visual intelligent inspection technology.
At the technical foundation level, this paper focuses on deep learning as the core driver, systematically expounding on the deep learning-based algorithmic framework for transmission line inspection. Key components include: (1) the evolutionary trajectory of target detection algorithms, which has advanced from two-stage to one-stage and anchor-free architectures; (2) semantic segmentation techniques (e.g., U-Net, DeepLab) for fine-grained parsing of line components; and (3) the integration of complementary algorithms (e.g., keypoint detection, instance segmentation) to enhance task performance. By deeply embedding deep learning into every stage of transmission line inspection workflows—from defect localization to component identification—these innovations have significantly improved the accuracy and efficiency of defect recognition, equipment localization, and scene understanding. Collectively, these algorithmic advancements have laid a robust technical foundation for real-time condition monitoring of power equipment in complex, dynamic environments.
At the application practice level, this paper summarizes six typical scenarios—including detection of power insulators and their defects, transmission tower detection, power line feature extraction, metal fitting and associated defect detection, thermal fault diagnosis of power components, and safety hazard detection in power scenarios—demonstrating the diversified application capabilities of deep learning in inspection tasks.
Regarding the current challenges of image-perception-based intelligent inspection for transmission lines, this paper proposes five future research directions. First, facilitating the deep integration of deep learning with transfer learning, few-shot learning, self-supervised learning, and other paradigms to enhance the adaptability of models to complex environments; second, exploring joint representation and feature-level fusion methods for multi-modal data to establish a more comprehensive information perception system; third, leveraging the semantic advantages of large language models in scene description and fault reasoning to achieve a closed loop of perception-cognition-decision; fourth, developing cloud-edge-end collaborative distributed computing frameworks to balance the trade-off between real-time performance and computational resource constraints; fifth, advancing collaborative control and cluster optimization algorithms for multi-agent systems to improve the efficiency and reliability of large-scale inspections. Breakthroughs in these directions will drive visual intelligent inspection from basic functionality to high reliability, from single-task operations to multi-scenario collaboration, and from auxiliary tools to autonomous decision-making, thereby providing stronger technical support for the intelligent operation and maintenance of next-generation power systems.
To conclude, deep learning-based transmission line inspection is currently in a critical transition phase, moving from technical validation to large-scale engineering implementation. Through systematic review and forward-looking analysis, this paper has not only clarified the evolutionary trajectory of existing accomplishments but also proposed future research directions. It is anticipated to inject fresh impetus into academic research and industrial practices within the field, thereby propelling power grid inspection toward a new stage of enhanced efficiency, precision, and intelligence.
Acknowledgement: We sincerely acknowledge Professor Wu Yiquan for his meticulous guidance. He provided the authors of this paper with insights into cutting-edge research directions and offered valuable inspiration for the construction of the paper’s framework, the organization of research content, and the refinement of research ideas.
Funding Statement: This work was financially supported by the Natural Research Project of College in Anhui Province under grants 2024AH051365 and 2025AHGXZK30826, and by the Research Platform of New Energy and Energy-Saving Technology Research Center under grant KYJG002.
Author Contributions: The authors confirm their contribution to the paper as follows: execute literature retrieval and screening, lead the drafting of the initial manuscript, and complete data integration and analysis: Jingjing Liu; coordinate the selection of research topics, design the research framework, and review the quality of the full text: Chuanyang Liu. All authors reviewed and approved the final version of the manuscript.
Availability of Data and Materials: This paper is predominantly a review that synthesizes existing methods and literature findings. This investigation utilized only data obtained from publicly accessible sources. These datasets are accessible via the sources listed in the References section of this paper.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Dong C, Zhang K, Xie Z, Wang J, Guo X, Shi C, et al. Transmission line key components and defects detection based on meta-learning. IEEE Trans Instrum Meas. 2024;73:5022213. doi:10.1109/tim.2024.3403202. [Google Scholar] [CrossRef]
2. Ren Q, Kang W, Yang X, Wang Q, Huang Q. Multi-dimensional deep learning-based anomaly detection and adaptive security strategy for sustainable power grid operation. Measurement. 2025;252:117313. doi:10.1016/j.measurement.2025.117313. [Google Scholar] [CrossRef]
3. Ekren N, Karagöz Z, Şahin M. A review of line suspended inspection robots for power transmission lines. J Electr Eng Technol. 2024;19(4):2549–83. doi:10.1007/s42835-023-01713-7. [Google Scholar] [CrossRef]
4. Yi J, Mao J, Zhang H, Chen Y, Liu T, Zeng K, et al. Balancing accuracy and efficiency with a multiscale uncertainty-aware knowledge-based network for transmission line inspection. IEEE Trans Ind Inf. 2025;21(4):2829–38. doi:10.1109/tii.2024.3507936. [Google Scholar] [CrossRef]
5. Faisal MAA, Mecheter I, Qiblawey Y, Fernandez JH, Chowdhury MEH, Kiranyaz S. Deep learning in automated power line inspection: a review. Appl Energy. 2025;385:125507. doi:10.1016/j.apenergy.2025.125507. [Google Scholar] [CrossRef]
6. Shakiba FM, Azizi SM, Zhou M, Abusorrah A. Application of machine learning methods in fault detection and classification of power transmission lines: a survey. Artif Intell Rev. 2023;56(7):5799–836. doi:10.1007/s10462-022-10296-0. [Google Scholar] [CrossRef]
7. Li Z, Zhang Y, Wu H, Suzuki S, Namiki A, Wang W. Design and application of a UAV autonomous inspection system for high-voltage power transmission lines. Remote Sens. 2023;15(3):865. doi:10.3390/rs15030865. [Google Scholar] [CrossRef]
8. Ahmed F, Mohanta JC, Keshari A. Power transmission line inspections: methods, challenges, current status and usage of unmanned aerial systems. J Intell Rob Syst. 2024;110(2):54. doi:10.1007/s10846-024-02061-y. [Google Scholar] [CrossRef]
9. Maduako I, Igwe CF, Abah JE, Onwuasaanya OE, Chukwu GA, Ezeji F, et al. Deep learning for component fault detection in electricity transmission lines. J Big Data. 2022;9(1):81. doi:10.1186/s40537-022-00630-2. [Google Scholar] [CrossRef]
10. Tang X, Ru X, Su J, Adonis G. A transmission and transformation fault detection algorithm based on improved YOLOv5. Comput Mater Continua. 2023;76(3):2997–3011. doi:10.32604/cmc.2023.038923. [Google Scholar] [CrossRef]
11. Deng S, Chen L, He Y. Insulator defect detection from aerial images in adverse weather conditions. Appl Intell. 2025;55(6):365. doi:10.1007/s10489-025-06280-0. [Google Scholar] [CrossRef]
12. Ahmed MF, Mohanta JC, Sanyal A. Inspection and identification of transmission line insulator breakdown based on deep learning using aerial images. Electr Power Syst Res. 2022;211:108199. doi:10.1016/j.epsr.2022.108199. [Google Scholar] [CrossRef]
13. Xu C, Li Q, Zhou Q, Zhang S, Yu D, Ma Y. Power line-guided automatic electric transmission line inspection system. IEEE Trans Instrum Meas. 2022;71:1–18. doi:10.1109/tim.2022.3169555. [Google Scholar] [CrossRef]
14. Guan Q, Zhang X, Xie M, Nie J, Cao H, Chen Z, et al. Large-scale power inspection: a deep reinforcement learning approach. Front Energy Res. 2023;10:1054859. doi:10.3389/fenrg.2022.1054859. [Google Scholar] [CrossRef]
15. Cao Y, Xu H, Su C, Yang Q. Accurate glass insulators defect detection in power transmission grids using aerial image augmentation. IEEE Trans Power Deliv. 2023;38(2):956–65. doi:10.1109/tpwrd.2022.3202958. [Google Scholar] [CrossRef]
16. Zhang Y, Li B, Shang J, Huang X, Zhai P, Geng C. DSA-net: an attention-guided network for real-time defect detection of transmission line dampers applied to UAV inspections. IEEE Trans Instrum Meas. 2024;73:1–22. doi:10.1109/tim.2023.3331418. [Google Scholar] [CrossRef]
17. Lu L, Chen Z, Wang R, Liu L, Chi H. Yolo-inspection: defect detection method for power transmission lines based on enhanced YOLOv5s. J Real Time Image Process. 2023;20(5):104. doi:10.1007/s11554-023-01360-1. [Google Scholar] [CrossRef]
18. Zhao L, Liu CA, Qu H. Co-occurrence object detection of the transmission lines based on the cross-domain interactive feature enhancement. IEEE Trans Power Deliv. 2023;38(6):4443–53. doi:10.1109/tpwrd.2023.3321867. [Google Scholar] [CrossRef]
19. Li T, Zhu C, Wang Y, Li J, Cao H, Yuan P, et al. LMFC-DETR: a lightweight model for real-time detection of suspended foreign objects on power lines. IEEE Trans Instrum Meas. 2025;74:1–19. doi:10.1109/tim.2025.3584143. [Google Scholar] [CrossRef]
20. Ma Y, Yin J, Huang F, Li Q. Surface defect inspection of industrial products with object detection deep networks: a systematic review. Artif Intell Rev. 2024;57(12):333. doi:10.1007/s10462-024-10956-3. [Google Scholar] [CrossRef]
21. Nguyen VN, Jenssen R, Roverso D. Automatic autonomous vision-based power line inspection: a review of current status and the potential role of deep learning. Int J Electr Power Energy Syst. 2018;99:107–20. doi:10.1016/j.ijepes.2017.12.016. [Google Scholar] [CrossRef]
22. Yang L, Fan J, Liu Y, Li E, Peng J, Liang Z. A review on state-of-the-art power line inspection techniques. IEEE Trans Instrum Meas. 2020;69(12):9350–65. doi:10.1109/tim.2020.3031194. [Google Scholar] [CrossRef]
23. Liu X, Miao X, Jiang H, Chen J. Data analysis in visual power line inspection: an in-depth review of deep learning for component detection and fault diagnosis. Annu Rev Control. 2020;50:253–77. doi:10.1016/j.arcontrol.2020.09.002. [Google Scholar] [CrossRef]
24. Liu J, Hu M, Dong J, Lu X. Summary of insulator defect detection based on deep learning. Electr Power Syst Res. 2023;224:109688. doi:10.1016/j.epsr.2023.109688. [Google Scholar] [CrossRef]
25. Luo Y, Yu X, Yang D, Zhou B. A survey of intelligent transmission line inspection based on unmanned aerial vehicle. Artif Intell Rev. 2023;56(1):173–201. doi:10.1007/s10462-022-10189-2. [Google Scholar] [CrossRef]
26. Liu X, Jiang H, Chen J, Chen J, Zhuang S, Miao X. Insulator detection in aerial images based on faster regions with convolutional neural network. In: Proceedings of the 2018 IEEE 14th International Conference on Control and Automation (ICCA); 2018 Jun 12–15; Anchorage, AK, USA. p. 1082–6. doi:10.1109/icca.2018.8444172. [Google Scholar] [CrossRef]
27. Li X, Su H, Liu G. Insulator defect recognition based on global detection and local segmentation. IEEE Access. 2020;8:59934–46. doi:10.1109/access.2020.2982288. [Google Scholar] [CrossRef]
28. Wang S, Liu Y, Qing Y, Wang C, Lan T, Yao R. Detection of insulator defects with improved ResNeSt and region proposal network. IEEE Access. 2020;8:184841–50. doi:10.1109/access.2020.3029857. [Google Scholar] [CrossRef]
29. Wen Q, Luo Z, Chen R, Yang Y, Li G. Deep learning approaches on defect detection in high resolution aerial images of insulators. Sensors. 2021;21(4):1033. doi:10.3390/s21041033. [Google Scholar] [PubMed] [CrossRef]
30. Zhai Y, Hu Z, Wang Q, Yang Q, Yang K. Multi-geometric reasoning network for insulator defect detection of electric transmission lines. Sensors. 2022;22(16):6102. doi:10.3390/s22166102. [Google Scholar] [PubMed] [CrossRef]
31. Wang C, Pei H, Tang G, Liu B, Liu Z. Pointer meter recognition in UAV inspection of overhead transmission lines. Energy Rep. 2022;8:243–50. doi:10.1016/j.egyr.2022.02.108. [Google Scholar] [CrossRef]
32. Fu Q, Liu J, Zhang X, Zhang Y, Ou Y, Jiao R, et al. A small-sized defect detection method for overhead transmission lines based on convolutional neural networks. IEEE Trans Instrum Meas. 2023;72:1–12. doi:10.1109/tim.2023.3298424. [Google Scholar] [CrossRef]
33. Wang W, Lu F, Wu B, Yu J, Yan X, Fan H. GFRF R-CNN: object detection algorithm for transmission lines. Comput Mater Continua. 2025;82(1):1439–58. doi:10.32604/cmc.2024.057797. [Google Scholar] [CrossRef]
34. Sampedro C, Rodriguez-Vazquez J, Rodriguez-Ramos A, Carrio A, Campoy P. Deep learning-based system for automatic recognition and diagnosis of electrical insulator strings. IEEE Access. 2019;7:101283–308. doi:10.1109/access.2019.2931144. [Google Scholar] [CrossRef]
35. Yang Y, Wang Y, Jiao H. Insulator identification and self-shattering detection based on mask region with convolutional neural network. J Electron Imag. 2019;28(5):053011. doi:10.1117/1.jei.28.5.053011. [Google Scholar] [CrossRef]
36. Miao X, Liu X, Chen J, Zhuang S, Fan J, Jiang H. Insulator detection in aerial images for transmission line inspection using single shot multibox detector. IEEE Access. 2019;7:9945–56. doi:10.1109/access.2019.2891123. [Google Scholar] [CrossRef]
37. Song Z, Huang X, Ji C, Zhang Y. Deformable YOLOX: detection and rust warning method of transmission line connection fittings based on image processing technology. IEEE Trans Instrum Meas. 2023;72:1–21. doi:10.1109/tim.2023.3238742. [Google Scholar] [CrossRef]
38. Lin TY, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell. 2020;42(2):318–27. doi:10.1109/tpami.2018.2858826. [Google Scholar] [PubMed] [CrossRef]
39. Sapkota R, Flores-Calero M, Qureshi R, Badgujar C, Nepal U, Poulose A, et al. YOLO advances to its genesis: a decadal and comprehensive review of the You Only Look Once (YOLO) series. Artif Intell Rev. 2025;58(9):274. doi:10.1007/s10462-025-11253-3. [Google Scholar] [CrossRef]
40. Wang S, Zou X, Zhu W, Zeng L. Insulator defects detection for aerial photography of the power grid using you only look once algorithm. J Electr Eng Technol. 2023;18(4):3287–300. doi:10.1007/s42835-023-01385-3. [Google Scholar] [CrossRef]
41. Si Y, Gao J, Zhao M, Xu X. Research on the algorithm of detecting insulators in high-voltage transmission lines using UAV images. Signal Image Video Process. 2024;18(1):395–406. doi:10.1007/s11760-024-03162-9. [Google Scholar] [CrossRef]
42. He M, Qin L, Deng X, Liu K. MFI-YOLO: multi-fault insulator detection based on an improved YOLOv8. IEEE Trans Power Deliv. 2024;39(1):168–79. doi:10.1109/tpwrd.2023.3328178. [Google Scholar] [CrossRef]
43. Li Y, Zhu C, Zhang Q, Zhang J, Wang G. IF-YOLO: an efficient and accurate detection algorithm for insulator faults in transmission lines. IEEE Access. 2024;12:167388–403. doi:10.1109/access.2024.3496514. [Google Scholar] [CrossRef]
44. Zhang Q, Zhang J, Li Y, Zhu C, Wang G. ID-YOLO: a multimodule optimized algorithm for insulator defect detection in power transmission lines. IEEE Trans Instrum Meas. 2025;74:1–11. doi:10.1109/tim.2025.3527530. [Google Scholar] [CrossRef]
45. Li S, Wang Z, Lv Y, Liu X. Improved YOLOv5s-based algorithm for foreign object intrusion detection on overhead transmission lines. Energy Rep. 2024;11:6083–93. doi:10.1016/j.egyr.2024.05.061. [Google Scholar] [CrossRef]
46. Ji C, Jia X, Huang X, Zhou S, Chen G, Zhu Y. FusionNet: detection of foreign objects in transmission lines during inclement weather. IEEE Trans Instrum Meas. 2024;73:1–18. doi:10.1109/tim.2024.3403173. [Google Scholar] [CrossRef]
47. Wang S, Tan W, Yang T, Zeng L, Hou W, Zhou Q. High-voltage transmission line foreign object and power component defect detection based on improved YOLOv5. J Electr Eng Technol. 2024;19(1):851–66. doi:10.1007/s42835-023-01625-6. [Google Scholar] [CrossRef]
48. Wang H, Luo S, Wang Q. Improved YOLOv8n for foreign-object detection in power transmission lines. IEEE Access. 2024;12:121433–40. doi:10.1109/access.2024.3452782. [Google Scholar] [CrossRef]
49. Zou H, Sun J, Ye Z, Yang J, Yang C, Li F, et al. A bolt defect detection method for transmission lines based on improved YOLOv5. Front Energy Res. 2024;12:1269528. doi:10.3389/fenrg.2024.1269528. [Google Scholar] [CrossRef]
50. Shi W, Lyu X, Han L. SONet: a small object detection network for power line inspection based on YOLOv8. IEEE Trans Power Deliv. 2024;39(5):2973–84. doi:10.1109/tpwrd.2024.3450185. [Google Scholar] [CrossRef]
51. Liu M, Li Z, Li Y, Liu Y. A fast and accurate method of power line intelligent inspection based on edge computing. IEEE Trans Instrum Meas. 2022;71:1–12. doi:10.1109/tim.2022.3152855. [Google Scholar] [CrossRef]
52. Hou C, Li Z, Shen X, Li G. Real-time defect detection method based on YOLO-GSS at the edge end of a transmission line. IET Image Process. 2024;18(5):1315–27. doi:10.1049/ipr2.13028. [Google Scholar] [CrossRef]
53. Han G, Wang R, Yuan Q, Zhao L, Li S, Zhang M, et al. Typical fault detection on drone images of transmission lines based on lightweight structure and feature-balanced network. Drones. 2023;7(10):638. doi:10.3390/drones7100638. [Google Scholar] [CrossRef]
54. Wu D, Yang W, Li J. Fault detection method for transmission line components based on lightweight GMPPD-YOLO. Meas Sci Technol. 2024;35(11):116015. doi:10.1088/1361-6501/ad7310. [Google Scholar] [CrossRef]
55. Wang Y, Zhang L, Xiong X, Kuang J, Xiang S. A lightweight and efficient multi-type defect detection method for transmission lines based on DCP-YOLOv8. Sensors. 2024;24(14):4491. doi:10.3390/s24144491. [Google Scholar] [PubMed] [CrossRef]
56. Xiang Y, Du C, Mei Y, Zhang L, Du Y, Liu A. BN-YOLO: a lightweight method for bird’s nest detection on transmission lines. J Real Time Image Process. 2024;21(6):194. doi:10.1007/s11554-024-01577-8. [Google Scholar] [CrossRef]
57. Law H, Deng J. CornerNet: detecting objects as paired keypoints. Int J Comput Vis. 2020;128(3):642–56. doi:10.1007/s11263-019-01204-1. [Google Scholar] [CrossRef]
58. Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q. CenterNet: keypoint triplets for object detection. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV); 2019 Oct 27–Nov 2; Seoul, Republic of Korea. p. 6568–77. doi:10.1109/iccv.2019.00667. [Google Scholar] [CrossRef]
59. Wang F, Song G, Mao J, Li Y, Ji Z, Chen D, et al. Internal defect detection of overhead aluminum conductor composite core transmission lines with an inspection robot and computer vision. IEEE Trans Instrum Meas. 2023;72:1–16. doi:10.1109/tim.2023.3265104. [Google Scholar] [CrossRef]
60. Meng B. Researching on insulator defect recognition based on context cluster CenterNet++. Sci Rep. 2025;15:2352. doi:10.1038/s41598-025-85630-x. [Google Scholar] [PubMed] [CrossRef]
61. Li Y, Liu M, Li Z, Jiang X. CSSAdet: real-time end-to-end small object detection for power transmission line inspection. IEEE Trans Power Deliv. 2023;38(6):4432–42. doi:10.1109/tpwrd.2023.3315579. [Google Scholar] [CrossRef]
62. Luo P, Wang B, Wang H, Ma F, Ma H, Wang L. An ultrasmall bolt defect detection method for transmission line inspection. IEEE Trans Instrum Meas. 2023;72:1–12. doi:10.1109/tim.2023.3241994. [Google Scholar] [CrossRef]
63. Chen L, Papandreou G, Kokkinos I, Murphy K, Yuille A. Semantic image segmentation with deep convolutional Nets and fully connected CRFs. arXiv:1412.7062. 2014. doi:10.48550/arXiv.1412.7062. [Google Scholar] [CrossRef]
64. Hota M, Rao BS, Kumar U. Power lines detection and segmentation in multi-spectral uav images using convolutional neural network. In: Proceedings of the 2020 IEEE India Geoscience and Remote Sensing Symposium (InGARSS); 2020 Dec 1–4; Ahmedabad, India. p. 154–7. doi:10.1109/ingarss48198.2020.9358967. [Google Scholar] [CrossRef]
65. Antwi-Bekoe E, Liu G, Ainam JP, Sun G, Xie X. A deep learning approach for insulator instance segmentation and defect detection. Neural Comput Appl. 2022;34(9):7253–69. doi:10.1007/s00521-021-06792-z. [Google Scholar] [CrossRef]
66. Ye B, Li F, Li M, Yan P, Yang H, Wang L. Intelligent detection method for substation insulator defects based on CenterMask. Front Energy Res. 2022;10:985600. doi:10.3389/fenrg.2022.985600. [Google Scholar] [CrossRef]
67. Hu T, Shen L, Wu D, Duan Y, Song Y. Research on transmission line ice-cover segmentation based on improved U-Net and GAN. Electr Power Syst Res. 2023;221:109405. doi:10.1016/j.epsr.2023.109405. [Google Scholar] [CrossRef]
68. Xu W, Zhong X, Luo M, Weng L, Zhou G. End-to-end insulator string defect detection in a complex background based on a deep learning model. Front Energy Res. 2022;10:928162. doi:10.3389/fenrg.2022.928162. [Google Scholar] [CrossRef]
69. Cheng Y, Liu D. An image-based deep learning approach with improved DETR for power line insulator defect detection. J Sens. 2022;2022:6703864. doi:10.1155/2022/6703864. [Google Scholar] [CrossRef]
70. Shi W, Lyu X, Han L. An object detection model for power lines with occlusions combining CNN and transformer. IEEE Trans Instrum Meas. 2025;74:1–12. doi:10.1109/tim.2025.3529073. [Google Scholar] [CrossRef]
71. Chen M, Li J, Pan J, Ji C, Ma W. Insulator extraction from UAV LiDAR point cloud based on multi-type and multi-scale feature histogram. Drones. 2024;8:241. doi:10.1109/ccdc58219.2023.10327436. [Google Scholar] [CrossRef]
72. Ni Z, Shi K, Cheng X, Wu X, Yang J, Pang L, et al. Research on UAV-LiDAR-based detection and prediction of tree risks on transmission lines. Forests. 2025;16(4):578. doi:10.3390/f16040578. [Google Scholar] [CrossRef]
73. Mittal P. A comprehensive survey of deep learning-based lightweight object detection models for edge devices. Artif Intell Rev. 2024;57(9):242. doi:10.1007/s10462-024-10877-1. [Google Scholar] [CrossRef]
74. Zhao Z, Jin C, Qi Y, Zhang K, Kong Y. Image classification of transmission line bolt defects based on dynamic supervision knowledge distillation. High Volt Eng. 2021;47(2):406–14. (In Chinese). doi:10.3390/pr13030898. [Google Scholar] [CrossRef]
75. Wang J, Li Y, Chen W. UAV aerial image generation of crucial components of high-voltage transmission lines based on multi-level generative adversarial network. Remote Sens. 2023;15(5):1412. doi:10.3390/rs15051412. [Google Scholar] [CrossRef]
76. Wu Y, Zhao S, Xing Z, Wei Z, Li Y, Li Y. Detection of foreign objects intrusion into transmission lines using diverse generation model. IEEE Trans Power Deliv. 2023;38(5):3551–60. doi:10.1109/tpwrd.2023.3279891. [Google Scholar] [CrossRef]
77. Zhang N, Yang G, Wang D, Hu F, Yu H, Fan J. A defect detection method for substation equipment based on image data generation and deep learning. IEEE Access. 2024;12:105042–54. doi:10.1109/access.2024.3436000. [Google Scholar] [CrossRef]
78. Zhang ZD, Zhang B, Lan ZC, Liu HC, Li DY, Pei L, et al. FINet: an insulator dataset and detection benchmark based on synthetic fog and improved YOLOv5. IEEE Trans Instrum Meas. 2022;71:1–8. doi:10.1109/tim.2022.3194909. [Google Scholar] [CrossRef]
79. Kang J, Wang Q, Liu W, Xia Y. Detection model of multi-defect of aerial photo insulator by integrating CAT-BiFPN and attention mechanism. High Volt Eng. 2023;49(8):3361–76. (In Chinese). doi:10.3390/s25134165. [Google Scholar] [CrossRef]
80. Li B, Qu L, Zhu X, Guo Z, Tian Y. Insulator defect detection based on multi-scale feature fusion. Trans China Electrotech Soc. 2023;38(1):60–70. (In Chinese). doi:10.21203/rs.3.rs-3853449/v1. [Google Scholar] [CrossRef]
81. Hao K, Chen G, Zhao L, Li Z, Liu Y, Wang C. An insulator defect detection model in aerial images based on multiscale feature pyramid network. IEEE Trans Instrum Meas. 2022;71:1–12. doi:10.1109/tim.2022.3200861. [Google Scholar] [CrossRef]
82. Yang L, Fan J, Song S, Liu Y. A light defect detection algorithm of power insulators from aerial images for power inspection. Neural Comput Appl. 2022;34(20):17951–61. doi:10.1007/s00521-022-07437-5. [Google Scholar] [CrossRef]
83. Zan W, Dong C, Zhang Z, Chen X, Zhao J, Hao F. Defect identification of power line insulators based on a MobileViT-yolo deep learning algorithm. IEEJ Trans Elec Engng. 2023;18(8):1271–9. doi:10.1002/tee.23825. [Google Scholar] [CrossRef]
84. Xie J, Du Y, Liu Z, Liu H, Wang T, Miao M. Defect detection algorithm based on lightweight and improved YOLOv5s for visible light insulators. Power Syst Technol. 2023;47(12):5273–83. (In Chinese). doi:10.1109/icpst56889.2023.10165005. [Google Scholar] [CrossRef]
85. Tao X, Zhang D, Wang Z, Liu X, Zhang H, Xu D. Detection of power line insulator defects using aerial images analyzed with convolutional neural networks. IEEE Trans Syst Man Cybern, Syst. 2020;50(4):1486–98. doi:10.1109/tsmc.2018.2871750. [Google Scholar] [CrossRef]
86. Liu J, Liu C, Wu Y, Sun Z, Xu H. Insulators’ identification and missing defect detection in aerial images based on cascaded YOLO models. Comput Intell Neurosci. 2022;2022:7113765. doi:10.1155/2022/7113765. [Google Scholar] [PubMed] [CrossRef]
87. Ling Z, Qiu R, Jin Z, Zhang Y, He X, Liu H, et al. An accurate and real-time self-blast glass insulator location method based on Faster R-CNN and U-net with aerial images. CSEE J Power Energy Syst. 2019;99:1–8. doi:10.17775/cseejpes.2019.00460. [Google Scholar] [CrossRef]
88. Li C, Zhang Q, Chen W, Jiang X, Yuan B, Yang C. Insulator orientation detection based on deep learning. J Electron Inf Technol. 2020;42(4):1033–40. (In Chinese). [Google Scholar]
89. Jiang H, Qiu X, Chen J, Liu X, Miao X, Zhuang S. Insulator fault detection in aerial images based on ensemble learning with multi-level perception. IEEE Access. 2019;7:61797–810. doi:10.1109/access.2019.2915985. [Google Scholar] [CrossRef]
90. Shi C, Huang Y. Cap-count guided weakly supervised insulator cap missing detection in aerial images. IEEE Sens J. 2021;21(1):685–91. doi:10.1109/jsen.2020.3012780. [Google Scholar] [CrossRef]
91. Guo J, Chen B, Wang R, Wang J, Zhong L. YOLO-based real-time detection of power line poles from unmanned aerial vehicle inspection vision. Electr Power. 2019;52(7):17–23. (In Chinese). doi:10.1109/cyber46603.2019.9066764. [Google Scholar] [CrossRef]
92. Bian J, Hui X, Zhao X, Tan M. A monocular vision-based perception approach for unmanned aerial vehicle close proximity transmission tower inspection. Int J Adv Rob Syst. 2019;16:1729881418820227. doi:10.1177/1729881418820227. [Google Scholar] [CrossRef]
93. Hosseini MM, Umunnakwe A, Parvania M, Tasdizen T. Intelligent damage classification and estimation in power distribution poles using unmanned aerial vehicles and convolutional neural networks. IEEE Trans Smart Grid. 2020;11(4):3325–33. doi:10.1109/tsg.2020.2970156. [Google Scholar] [CrossRef]
94. Yetgin OE, Benligiray B, Gerek ON. Power line recognition from aerial images with deep learning. IEEE Trans Aerosp Electron Syst. 2019;55(5):2241–52. doi:10.1109/taes.2018.2883879. [Google Scholar] [CrossRef]
95. Nguyen VN, Jenssen R, Roverso D. LS-Net: fast single-shot line-segment detector. Mach Vis Appl. 2020;32(1):12. doi:10.1007/s00138-020-01138-6. [Google Scholar] [CrossRef]
96. Zhang H, Yang W, Yu H, Zhang H, Xia GS. Detecting power lines in UAV images with convolutional features and structured constraints. Remote Sens. 2019;11(11):1342. doi:10.3390/rs11111342. [Google Scholar] [CrossRef]
97. Lee SJ, Yun JP, Choi H, Kwon W, Koo G, Kim SW. Weakly supervised learning with convolutional neural networks for power line localization. In: Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI); 2017 Nov 27–Dec 1; Honolulu, HI, USA. doi:10.1109/ssci.2017.8285410. [Google Scholar] [CrossRef]
98. Chen M, Wang Y, Dai Y, Yan Y, Qi D. Small and strong: power line segmentation network in real time based on self-supervised learning. Proc CSEE. 2021;42(4):1365–75. (In Chinese). [Google Scholar]
99. Zhang K, Hou Q, Huang W. Defect detection of anti-vibration hammer based on improved Faster R-CNN. In: Proceedings of the 2020 7th International Forum on Electrical Engineering and Automation (IFEEA); 2020 Sep 25–27; Hefei, China. doi:10.1109/ifeea51475.2020.00004. [Google Scholar] [CrossRef]
100. Wang J, Zhang X, Zheng L, Sugisaka M. A study on the grading ring recognition method of power line based on deep learning. In: Proceedings of the 2018 International Conference on Information and Communication Technology Robotics (ICT-ROBOT); 2018 Sep 6–8; Busan, Republic of Korea. doi:10.1109/ict-robot.2018.8549883. [Google Scholar] [CrossRef]
101. Song Z, Huang X, Ji C, Zhang Y. Double-attention YOLO: vision transformer model based on image processing technology in complex environment of transmission line connection fittings and rust detection. Machines. 2022;10(11):1002. doi:10.3390/machines10111002. [Google Scholar] [CrossRef]
102. Zhang H, Wu L, Chen Y, Chen R, Kong S, Wang Y, et al. Attention-guided multitask convolutional neural network for power line parts detection. IEEE Trans Instrum Meas. 2022;71:1–13. doi:10.1109/tim.2022.3162615. [Google Scholar] [CrossRef]
103. Zhao Z, Qi H, Qi Y, Zhang K, Zhai Y, Zhao W. Detection method based on automatic visual shape clustering for pin-missing defect in transmission lines. IEEE Trans Instrum Meas. 2020;69(9):6080–91. doi:10.1109/tim.2020.2969057. [Google Scholar] [CrossRef]
104. Zhao Z, Wang R, Li Y, Zhai Y, Zhao W, Zhang K. A new multilabel recognition framework for transmission lines bolt defects based on the combination of semantic knowledge and structural knowledge. IEEE Trans Instrum Meas. 2022;71:1–11. doi:10.1109/tim.2022.3200103. [Google Scholar] [CrossRef]
105. Li G, Zhang Y, Wang W, Zhang D. Defect detection method of transmission line bolts based on DETR and prior knowledge fusion. J Graph. 2023;44(3):438–47. (In Chinese). doi:10.1109/tpwrd.2022.3161124. [Google Scholar] [CrossRef]
106. Zhang K, Lou W, Wang J, Zhou R, Guo X, Xiao Y, et al. PA-DETR: end-to-end visually indistinguishable bolt defects detection method based on transmission line knowledge reasoning. IEEE Trans Instrum Meas. 2023;72:1–14. doi:10.1109/tim.2023.3282302. [Google Scholar] [CrossRef]
107. Xu Q, Huang H, Zhang X, Zhou C, Wu S. Online fault diagnosis method for infrared image feature analysis of high-voltage lead connectors based on improved R-FCN. Trans China Electrotech Soc. 2021;36(7):1380–8. (In Chinese). doi:10.3390/electronics10050544. [Google Scholar] [CrossRef]
108. Li J, Xu Y, Nie K, Cao B, Zuo S, Zhu J. PEDNet: a lightweight detection network of power equipment in infrared image based on YOLOv4-tiny. IEEE Trans Instrum Meas. 2023;72:1–12. doi:10.1109/tim.2023.3235416. [Google Scholar] [CrossRef]
109. Wang Y, Li Y, Duan Y, Wu H. Infrared image recognition of substation equipment based on lightweight backbone network and attention mechanism. Power Syst Technol. 2023;47(10):4358–69. (In Chinese). doi:10.21203/rs.3.rs-8199319/v1. [Google Scholar] [CrossRef]
110. Zhou S, Liu J, Fan X, Fu Q, Goh HH. Thermal fault diagnosis of electrical equipment in substations using lightweight convolutional neural network. IEEE Trans Instrum Meas. 2023;72:1–9. doi:10.1109/tim.2023.3240210. [Google Scholar] [CrossRef]
111. Xiang X, Lv N, Guo X, Wang S, El Saddik A. Engineering vehicles detection based on modified faster R-CNN for power grid surveillance. Sensors. 2018;18(7):2258. doi:10.3390/s18072258. [Google Scholar] [PubMed] [CrossRef]
112. Zhang H, Zhou X, Shi Y, Guo X, Liu H. Object detection algorithm of transmission lines based on improved YOLOv5 framework. J Sens. 2024;2024:5977332. doi:10.1155/2024/5977332. [Google Scholar] [CrossRef]
113. Yu C, Liu Y, Zhang W, Zhang X, Zhang Y, Jiang X. Foreign objects identification of transmission line based on improved YOLOv7. IEEE Access. 2023;11:51997–2008. doi:10.1109/access.2023.3277954. [Google Scholar] [CrossRef]
114. Zhu J, Guo Y, Yue F, Yuan H, Yang A, Wang X, et al. A deep learning method to detect foreign objects for inspecting power transmission lines. IEEE Access. 2020;8:94065–75. doi:10.1109/access.2020.2995608. [Google Scholar] [CrossRef]
115. Li L, Chen P, Zhang Y, Mei B, Gong P, Yu H. Detection of power devices and abnormal objects in transmission lines based on improved CenterNet. High Volt Eng. 2023;49(11):4757–65. (In Chinese). doi:10.1117/12.3041533. [Google Scholar] [CrossRef]
116. Zhang J, Wang J, Song R, Zhang S, Jiao F. Research on efficient detection technology of transmission line abnormal target based on edge intelligence. Power Syst Technol. 2022;46(05):1652–61. (In Chinese). doi:10.1109/icamechs.2019.8861617. [Google Scholar] [CrossRef]
117. Qiu Z, Zhu X, Liao C, Qu W, Yu Y. A lightweight YOLOv4-EDAM model for accurate and real-time detection of foreign objects suspended on power lines. IEEE Trans Power Deliv. 2023;38(2):1329–40. doi:10.1109/tpwrd.2022.3213598. [Google Scholar] [CrossRef]
118. Hao Y, Liang W, Yang L, He J, Wu J. Methods of image recognition of overhead power line insulators and ice types based on deep weakly-supervised and transfer learning. IET Generation Trans Dist. 2022;16(11):2140–53. doi:10.1049/gtd2.12428. [Google Scholar] [CrossRef]
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.