DSGF-Net: A Dense-SE Gated-Fusion Architecture for High-Accuracy Small Object Detection in UAV Imagery
School of Mathematical Sciences, Dalian Minzu University, Dalian, China
* Corresponding Author: Hongmei Liu. Email:
Computers, Materials & Continua 2026, 88(2), 26 https://doi.org/10.32604/cmc.2026.074281
Received 07 October 2025; Accepted 22 December 2025; Issue published 15 June 2026
Abstract
To address the critical challenges of small object detection in UAV imagery, this paper proposes DSGF-Net (Dense-SE Gated-Fusion Network), an enhanced architecture built upon YOLOv10. It integrates a Dense SE Network (DSENet) backbone, an Adaptive Gated Fusion (AGF) module, and a Channel-Spatial Attention (CSA) mechanism. Extensive experiments on VisDrone2019-DET and CODrone demonstrate that DSGF-Net achieves substantial mAP@0.5 improvements of 5.12 and 2.36 percentage points, respectively, over the YOLOv10n baseline.
Unmanned Aerial Vehicle (UAV) technology, with its high mobility, low cost, and efficient data collection capabilities, has become an indispensable instrument in intelligent sensing. Object detection is a pivotal task in computer vision, and deep learning methods exemplified by the YOLO (You Only Look Once) series have become the prevailing technology for real-time object detection on UAVs, owing to their balance between computational efficiency and detection accuracy. Given the widespread deployment of UAVs in critical domains, continued optimization of UAV small object detection holds significant research value and practical importance.
Despite advancements in backbone network optimization, multi-scale feature fusion, and feature representation enhancement, UAV small object detection confronts three fundamental architectural limitations:
(1) Sparse deep-layer feature representation causing channel discrimination loss: Traditional backbones sparsely deploy attention in deep layers. When small objects (occupying <0.1% image area after downsampling to 1/32 resolution) are compressed to 1–3 pixels, lacking dense channel-level feature recalibration results in weak discriminative signals being overwhelmed by background noise. Statistical analysis reveals that small objects (<32
(2) Inflexible weight allocation in multi-scale fusion: Current feature pyramid networks employ uniform combinatorial operators (element-wise addition or concatenation), applying equal weights to multi-scale features. When information-rich shallow features compete with background-dominated deep features (background pixels exceeding 99.2% in P5 layer), fixed weights cause dilution. This stems from lacking learnable scale-adaptive mechanisms to dynamically adjust contribution weights based on target distribution, systematically weakening small object signals during fusion.
(3) Decoupled attention optimization bottleneck: Current methods apply either channel attention (e.g., SE) or spatial attention alone, failing to model coupled distributions of small objects across channel importance and spatial saliency. Small objects exhibit weak features in both dimensions: dispersed discriminative information in channels (low attention variance) and minimal spatial occupancy (<0.1%). Single-dimension optimization creates information bottlenecks, unable to simultaneously enhance discrimination in both dimensions. Sequential cascading (e.g., CBAM) suffers from information loss in preceding operations, while parallel schemes without adaptive fusion coefficients cannot dynamically balance dual-pathway contributions.
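The scale argument in limitation (1) can be made concrete with a little arithmetic. The sketch below is illustrative only: the 640-pixel square input and the 20-pixel and 64-pixel object sizes are assumptions chosen here, not values taken from the paper, and `footprint` is a hypothetical helper name.

```python
def footprint(obj_side_px: float, image_side_px: int, stride: int) -> dict:
    """Area fraction of a square object in the image and its side length in feature-map cells."""
    area_fraction = (obj_side_px ** 2) / (image_side_px ** 2)
    cells_per_side = obj_side_px / stride
    return {"area_fraction": area_fraction, "cells_per_side": cells_per_side}


# Assumed sizes: a 640x640 input, with a 20 px and a 64 px object.
for side in (20, 64):
    for stride in (4, 8, 16, 32):  # strides of the P2 to P5 levels
        info = footprint(side, 640, stride)
        print(
            f"object {side}px, stride {stride}: "
            f"{info['area_fraction'] * 100:.3f}% of image, "
            f"{info['cells_per_side']:.2f} cells per side"
        )
```

Under these assumptions, an object below 0.1% of the image area spans less than one cell per side at stride 32, which is why recalibration in the deepest layers has so little signal to work with.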
In response to these challenges, this study introduces an enhanced algorithm named DSGF-Net. The main contributions include: (1) Construction of a Dense SE Network (DSENet) as the backbone architecture, significantly enhancing multi-scale feature capture capability for small objects by densely deploying SE attention modules in deeper layers combined with reparameterization structures; (2) Design of an Adaptive Gated Fusion module (AGF), replacing traditional feature fusion methods with a gating mechanism, adaptively adjusting contribution weights of different level features through learnable parameters, effectively reducing information loss during fusion; (3) Proposal of a Channel-Spatial Attention mechanism (CSA), enhancing feature representation capabilities in both channel and spatial dimensions through a dual-pathway parallel framework.
YOLO [2] dominates real-time detection. YOLOv10 [3] eliminates NMS bottlenecks via dual assignment. However, UAV small objects remain challenging. Recent UAV-YOLO advances focus on three aspects:
Backbone Network Optimization: Recent works enhance detection through improved backbone architectures. BGF-YOLOv10 [4] integrates Multi-Head Self-Attention mechanisms via BoTNet layers to capture global context while reducing parameters. YOLO-LSD [5] incorporates attention mechanisms into YOLOv7 [6] to improve feature extraction efficiency for distant small objects. YOLOv8 [7] employs the C2f module for extracting multi-level semantic features.
Multi-scale Feature Fusion: Effective fusion strategies are critical for small object detection. YOLO-SAIL [8] utilizes bidirectional feature pyramid networks to enhance multi-scale discrimination in SAR imagery. DFTD-YOLO [9] balances shallow and deep information transmission through specialized extraction and aggregation modules. YOLO-MS [10] employs hierarchical multi-branch structures to enrich cross-scale feature representation. Recently, Bi et al. [11] proposed a region-adaptive feature distribution equalization (RAFDE) strategy that applies distinct fusion mechanisms for co-activated and single-activated regions, effectively reducing the risk of small object features being overwhelmed by dominant features during fusion.
Feature Representation Enhancement: Attention mechanisms have proven effective for enhancing feature quality. YOLO-SAIL [8] further optimizes dense target representation by fusing contextual cues and global interdependencies. Additionally, Bi et al.’s [11] boundary transition region detector (BTRD) module enhances boundary transition regions, mitigating critical information loss of small objects during downsampling.
Downstream Applications: Object detection, as a fundamental visual perception task, provides core support for multiple advanced applications. In scene graph generation (SGG) [12], precise object detection serves as the cornerstone for constructing structured scene representations. Improvements in small object detection directly enhance performance of these downstream tasks in complex scenes, particularly in UAV aerial applications containing numerous small objects.
DSGF-Net (Fig. 1) employs YOLOv10 [3] with three innovations: DSENet (backbone), AGF (neck), and CSA (representation).

Figure 1: Overall network architecture of DSGF-Net
DSENet addresses insufficient feature extraction for small objects. Drawing from RepViT [1]’s reparameterization structure, we propose a progressive dense SE deployment strategy specifically for small object detection, distinct from RepViT’s uniform sparse design (

Figure 2: Architecture of the proposed Dense SE Network (DSENet)
Visualization Verification: Fig. 3 compares feature activation patterns of Baseline, RepViT, and DSENet across four depth levels (P2-P5). DSENet demonstrates superior small-object activation in deep layers (Stage 4, P5/32), achieving +2585.9% SNR improvement over RepViT (0.0228

Figure 3: DSENet multi-stage feature response comparison across P2-P5 depth levels, showing superior small object activation with quantified SNR improvements
The core of DSENet lies in the design of DAU units. For input feature map
Channel mixing achieves cross-channel information interaction through residual structure:
This design combines efficient spatial feature extraction with dense channel attention. It learns robust feature representations during training and can be reparameterized into a single convolution layer for inference, enhancing the perception of key small-object features.
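As a rough illustration of the two ingredients named above, the following NumPy sketch shows a standard Squeeze-and-Excitation recalibration followed by residual channel mixing. It is not the authors' DAU implementation: the function names, the reduction ratio, and the random weights are assumptions made for the example, and reparameterization is not covered.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def se_recalibrate(x, w1, w2):
    """Standard SE on a (C, H, W) map. w1: (C // r, C), w2: (C, C // r)."""
    squeezed = x.mean(axis=(1, 2))           # squeeze: global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)  # excitation, first FC layer + ReLU
    scale = sigmoid(w2 @ hidden)             # per-channel weights in (0, 1)
    return x * scale[:, None, None]


def channel_mix_residual(x, w_mix):
    """Residual channel mixing: x + (1x1 convolution of x). w_mix: (C, C)."""
    mixed = np.einsum("oc,chw->ohw", w_mix, x)
    return x + mixed


rng = np.random.default_rng(0)
C, r = 8, 4                                  # assumed channel count and reduction ratio
x = rng.normal(size=(C, 4, 4))
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
w_mix = 0.1 * rng.normal(size=(C, C))

y = channel_mix_residual(se_recalibrate(x, w1, w2), w_mix)
print(y.shape)
```

Deploying such a block densely simply means applying `se_recalibrate` in every unit of the deeper stages rather than in a sparse subset of them.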
3.2 Adaptive Gated Fusion Module (AGF)
To overcome the information loss problem in traditional feature fusion strategies, we design the Adaptive Gated Fusion module (AGF). Unlike fixed-weight attentional fusion, we propose a learnable gating parameter mechanism: the

Figure 4: Structure of the proposed Adaptive Gated Fusion (AGF) module
AGF consists of three components: dual-branch channel gating unit, enhanced local gating unit, and adaptive fusion mechanism. The dual-branch channel gating unit employs global pooling in parallel to capture channel statistics, processed through adaptive one-dimensional convolution:
where
The enhanced local gating unit integrates H-W decomposition attention and spatial attention, combined through learnable parameters (
The core gating fusion mechanism enhances input features through dual-branch attention and introduces learnable parameter
Through this design, AGF achieves comprehensive enhancement of features, addressing the information dilution problem and providing information-rich fused features.
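The general idea of gating with a learnable parameter can be sketched as follows. The exact AGF equations are not reproduced here, so this is a generic form under stated assumptions: the channel gate pools with average and max, shares one 1D kernel across both branches, and the fusion uses a single scalar `alpha` passed through a sigmoid. All names and the kernel values are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(x, kernel):
    """Channel gating on a (C, H, W) map from pooled statistics and a 1D conv over channels."""
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    pad = len(kernel) // 2  # odd kernel length assumed, so output length stays C

    def conv1d(v):
        # np.convolve flips its second argument; reversing the kernel gives cross-correlation.
        return np.convolve(np.pad(v, pad), kernel[::-1], mode="valid")

    gate = sigmoid(conv1d(avg) + conv1d(mx))
    return x * gate[:, None, None]


def gated_fusion(shallow, deep, alpha):
    """Blend two same-shaped feature maps with a learnable scalar gate."""
    g = sigmoid(alpha)
    return g * shallow + (1.0 - g) * deep


rng = np.random.default_rng(0)
shallow = rng.normal(size=(8, 4, 4))
deep = rng.normal(size=(8, 4, 4))
kernel = np.array([0.25, 0.5, 0.25])  # assumed 1D kernel weights

fused = gated_fusion(channel_gate(shallow, kernel), channel_gate(deep, kernel), alpha=0.0)
print(fused.shape)
```

With `alpha = 0` the gate is 0.5 and the fusion reduces to an equal-weight average, which is the fixed-weight behavior the module is meant to move away from; training can push `alpha` toward whichever scale carries more small-object signal.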
AGF vs. NAS: While employing classic operations, AGF embeds task-specific inductive bias (scale imbalance) into differentiable parameters (
Scale-adaptive mechanism:
3.3 Channel-Spatial Attention Module (CSA)
To enhance weak feature representation of small objects, we design the Channel-Spatial Attention module (CSA). Unlike simple weighted combinations with fixed coefficients, CSA employs nn.Parameter tensors (

Figure 5: Architecture of the Channel-Spatial Attention (CSA) module
This module receives feature map
The spatial attention branch identifies important spatial regions, generating attention maps through channel dimension pooling and convolution:
The core innovation of CSA lies in its adaptive fusion mechanism, dynamically weighting the two types of attention through learnable parameters:
This design enables the module to precisely recalibrate features based on comprehensive information, significantly improving the perception and discrimination capabilities of weak features.
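A minimal NumPy sketch of a parallel channel and spatial attention with learnable fusion weights is given below. It illustrates the dual-pathway idea only: the single linear layer in the channel branch, the average-plus-max pooling in the spatial branch, the softmax normalization of the two fusion parameters, and every name are assumptions for this example, not the paper's definitions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w):
    """Channel weights from globally pooled features. x: (C, H, W), w: (C, C)."""
    return sigmoid(w @ x.mean(axis=(1, 2)))  # (C,)


def spatial_attention(x, kernel):
    """Spatial map from channel-wise avg/max pooling and a k x k conv. kernel: (2, k, k)."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    height, width = x.shape[1:]
    out = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)  # (H, W)


def csa(x, w, kernel, alpha, beta):
    """Run both branches in parallel and blend them with two learnable scalars."""
    ca = channel_attention(x, w)[:, None, None]
    sa = spatial_attention(x, kernel)[None, :, :]
    weights = np.exp([alpha, beta])
    weights = weights / weights.sum()  # assumed softmax normalization
    attention = weights[0] * ca + weights[1] * sa
    return x * attention


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6, 6))
w = rng.normal(size=(8, 8))
kernel = 0.1 * rng.normal(size=(2, 3, 3))

out = csa(x, w, kernel, alpha=0.0, beta=0.0)
print(out.shape)
```

Because both branches read the same input, neither one operates on features already filtered by the other, which is the contrast with sequential cascades drawn earlier in the paper.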
Fig. 6 validates CSA’s feature enhancement on dense small object scenes. Baseline model shows scattered channel attention (

Figure 6: CSA feature representation visualization showing 1.24
To validate DSGF-Net on UAV-based small object detection, we conducted a series of ablation and comparative experiments. This section describes the datasets, software and hardware environment, hyperparameter settings, and evaluation metrics.
We evaluate on two UAV benchmarks: VisDrone2019-DET [15] and CODrone [16]. VisDrone2019-DET features high-density small objects with severe class imbalance (head class “car”: 187,005 vs. tail class “awning-tricycle”: 4377, 40

4.1.2 Experimental Environment and Hyperparameter Settings
All models were trained under identical settings (Table 2).

We evaluate using standard metrics: Precision, Recall, F1-Score, and mAP. We report mAP@0.5, mAP@0.75, and mAP@0.5:0.95 (COCO standard) across IoU thresholds.
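For reference, the quantities behind these metrics can be written out in a few lines. This is a plain-Python sketch of the standard definitions, with boxes assumed to be in `(x1, y1, x2, y2)` format; the example boxes and counts are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, Recall, and F1 from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# IoU thresholds averaged by mAP@0.5:0.95 (0.50 to 0.95 in steps of 0.05).
coco_thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]

overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
print(overlap, precision_recall_f1(8, 2, 2), coco_thresholds)
```

A detection counts as a true positive at a given threshold when its IoU with a matched ground-truth box meets that threshold, so mAP@0.75 is stricter about localization than mAP@0.5.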
We conducted comprehensive comparative experiments under identical settings to verify each component’s effectiveness. Table 3 summarizes the detailed comparative experimental results.

Specifically, for baseline comparison, we selected YOLOv5 [17], YOLOv6 [18], YOLOv8 [7], and YOLOv10n [3], as well as Transformer-based RT-DETR [19] to ensure comprehensive evaluation across architectural paradigms. For backbone networks, we compared HGNetV2 [19], EfficientViT [20], ConvNeXtV2, and FasterNet [21] against our DSENet. For feature fusion modules, we evaluated CARAFE [22] and CGAFusion alongside our AGF. For attention mechanisms, we compared MSGA [23], SimAM [24], and CPCA with our CSA. Additionally, we evaluated RT-DETR integrated with AIFI-SHSA [25] and Cascaded Group Attention [20] to compare Transformer-based architectures with traditional attention mechanisms.
The baseline comparison validates YOLOv10n’s suitability as the foundational architecture (mAP50: 32.86%). DSGF-Net achieved mAP50 of 37.98% (+5.12 pp) and mAP50-95 of 22.27% (+3.54 pp), demonstrating substantial improvements. We further evaluated RT-DETR with AIFI-SHSA (29.96% mAP50) and CGA (29.62% mAP50). Both stay within 0.2 pp of vanilla RT-DETR (29.78%), with AIFI-SHSA slightly above and CGA slightly below it, and both remain 8.01–8.35 pp below DSGF-Net (37.97%). This validates: (1) generic attention additions provide minimal gains for Transformer-based small object detection, confirming O(
Component-wise analysis reveals distinct contributions: DSENet as the backbone yields the most significant gain (mAP50: 36.98%, +4.12 pp), attributed to its structure reparameterization and optimized attention distribution, outperforming alternatives like FasterNet and ConvNeXtV2. The AGF module achieves mAP50 of 34.04% (+1.18 pp) through its adaptive gating strategy, surpassing CARAFE and CGAFusion while maintaining computational efficiency. The CSA module attains mAP50 of 33.65% and F1 score of 38.32%, enhancing feature representation without additional computational overhead.
Table 4 presents the multi-dataset comparison results. To comprehensively evaluate the generalization ability and robustness of the DSGF-Net model, we deployed it on two UAV aerial datasets with significantly different characteristics—VisDrone and CODrone—and compared its performance with the baseline model YOLOv10n.

On VisDrone, DSGF-Net achieves an mAP@0.5 of 37.97%, an improvement of 5.11 percentage points over the YOLOv10n baseline. On the more challenging CODrone dataset, it achieves an mAP@0.5 of 24.28%, an improvement of 2.36 percentage points. Consistent gains on two datasets with different complexity and characteristics indicate that the improvements generalize beyond a single benchmark.
Robustness Evaluation: To validate DSGF-Net’s adaptability in real UAV scenarios, we evaluated performance under different flight altitudes and motion blur conditions. As shown in Fig. 7, validation image sizes were adjusted to simulate altitude variations (50 m/100 m/200 m corresponding to 960/640/320 pixels). DSGF-Net outperforms the baseline at all altitudes (42.19%/37.97%/21.09% vs. 37.51%/32.86%/17.74%). In the motion blur tests (blur kernel 0/5/11), DSGF-Net is likewise more resistant to degradation (37.97%/35.12%/23.37% vs. 32.86%/30.54%/20.54%). Under the most extreme conditions (200 m altitude or kernel = 11 blur), DSGF-Net retains advantages of 3.35 and 2.83 percentage points, respectively. We attribute this to DSENet’s dense attention mechanism and AGF’s adaptive fusion, which make the features more robust and support the practical value of the architecture in complex environments.
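The per-condition gaps can be recomputed directly from the mAP@0.5 values reported in this paragraph. In the sketch below the numbers come from the text, while the dictionary keys are labels chosen here for readability.

```python
# (DSGF-Net mAP@0.5, YOLOv10n mAP@0.5) for each reported condition, in percent.
reported = {
    "altitude 50 m (960 px)": (42.19, 37.51),
    "altitude 100 m (640 px)": (37.97, 32.86),
    "altitude 200 m (320 px)": (21.09, 17.74),
    "blur kernel 0": (37.97, 32.86),
    "blur kernel 5": (35.12, 30.54),
    "blur kernel 11": (23.37, 20.54),
}

# Absolute advantage in percentage points, rounded to two decimals.
gaps = {name: round(ours - base, 2) for name, (ours, base) in reported.items()}

for name, gap in gaps.items():
    print(f"{name}: +{gap:.2f} pp")

print("smallest gap:", min(gaps.values()), "largest gap:", max(gaps.values()))
```

Reading the gaps this way shows how the margin changes with difficulty: it is largest at the default setting and narrows as resolution drops or blur increases.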

Figure 7: Robustness evaluation results. (a) Detection performance under varying flight altitudes of 50, 100, and 200 m (corresponding to image sizes of 960, 640, and 320 pixels). (b) Performance under motion blur conditions with Gaussian kernel sizes of 0, 5, and 11. DSGF-Net maintains advantages of 2.83–5.11 percentage points over the baseline across all conditions
To explore the synergistic gains between modules and validate the reasonability of the overall architecture, we conducted ablation experiments. Table 5 presents the comprehensive ablation study results with computational complexity analysis.

As illustrated in Fig. 8, the results indicate that the synergy between modules is not a simple performance addition, but follows a clear functional complementary logic.

Figure 8: Visual comparison of ablation study results on mAP@0.5 metric
DSENet (B) serves as the performance cornerstone; its presence or absence is key to determining the model’s performance level. All combinations containing DSENet significantly outperform other combinations, again proving that high-quality initial feature extraction is a prerequisite for achieving excellent detection accuracy.
AGF (C) module plays a crucial “bridge” role in synergy. With DSENet providing high-quality features, the introduction of AGF (B+C combination) can further bring significant performance improvements, increasing mAP50 from 36.98% to 37.61%. This strongly proves that AGF’s adaptive gating mechanism can efficiently integrate multi-scale feature flows produced by DSENet, achieving a “
Finally, the complete DSGF-Net (A+B+C) model reaches the highest performance, benefiting from the functional complementarity of the three modules. DSENet builds a high-quality feature foundation; AGF acts as the central hub that optimizes and integrates multi-scale information flows; and the lightweight CSA refines the already optimized feature maps. This clear division of labor yields the best overall result among the tested combinations and supports the overall architectural design.
Computational complexity and performance trade-off analysis: From Table 5’s complexity metrics, complete DSGF-Net increases 193.8% parameters (2.27M
The proposed DSGF-Net achieves an mAP@0.5 of 37.97% on VisDrone (+5.12 pp over YOLOv10n) driven by three core innovations: DSENet (dense SE in deep layers), AGF (learnable gating), and CSA (parallel dual-pathway attention). CODrone experiments confirm generalization. Ablation studies validate synergistic gains and negative synergy in A+C. Future avenues include: (1) model lightweighting through knowledge distillation and quantization for edge device deployment; (2) enhancing cross-scenario adaptability via domain adaptation techniques; and (3) exploring adaptive spatiotemporal association, leveraging temporal consistency across consecutive UAV frames to reduce false positives and extend to video-based UAV detection.
Acknowledgement: Not applicable.
Funding Statement: The authors received no specific funding for this study.
Author Contributions: The authors confirm contribution to the paper as follows: Conceptualization, Changzhu Shi and Hongmei Liu; methodology, Changzhu Shi; software, Changzhu Shi; validation, Changzhu Shi and Hongmei Liu; formal analysis, Changzhu Shi; investigation, Changzhu Shi; resources, Hongmei Liu; data curation, Changzhu Shi; writing—original draft preparation, Changzhu Shi; writing—review and editing, Changzhu Shi and Hongmei Liu; visualization, Changzhu Shi; supervision, Hongmei Liu; project administration, Hongmei Liu; funding acquisition, not applicable. All authors reviewed the results and approved the final version of the manuscript.
Availability of Data and Materials: The VisDrone2019-DET dataset used in this study is publicly available at https://github.com/VisDrone/VisDrone-Dataset. The CODrone dataset is publicly available at https://github.com/AHideoKuzeA/CODrone-A-Comprehensive-Oriented-Object-Detection-benchmark-for-UAV. The code implementing the proposed DSGF-Net model is available at https://github.com/KtevenCroft/DSGF-Net-main.
Ethics Approval: Not applicable.
Conflicts of Interest: The authors declare no conflicts of interest to report regarding the present study.
Abbreviations
| UAV | Unmanned Aerial Vehicle |
| YOLO | You Only Look Once |
| DSGF-Net | Dense-SE Gated-Fusion Network |
| DSENet | Dense SE Network |
| SE | Squeeze-and-Excitation |
| DAU | Dense Attention Unit |
| AGF | Adaptive Gated Fusion |
| CSA | Channel-Spatial Attention |
| mAP | mean Average Precision |
| AP | Average Precision |
| IoU | Intersection over Union |
| TP | True Positives |
| FP | False Positives |
| FN | False Negatives |
| R-CNN | Region-based Convolutional Neural Network |
| SSD | Single Shot MultiBox Detector |
| CNN | Convolutional Neural Network |
| NMS | Non-Maximum Suppression |
References
1. Wang A, Chen H, Lin Z, Han J, Ding G. RepViT: revisiting mobile CNN from ViT perspective. In: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024 Jun 16–22; Seattle, WA, USA. p. 15909–20. doi:10.1109/cvpr52733.2024.01506.
2. Jiang P, Ergu D, Liu F, Cai Y, Ma B. A review of YOLO algorithm developments. Procedia Comput Sci. 2022;199:1066–73. doi:10.1016/j.procs.2022.01.135.
3. Wang A, Chen H, Liu L, Chen K, Lin Z, Han J, et al. YOLOv10: real-time end-to-end object detection. Adv Neural Inf Process Syst. 2024;37:107984–8011. doi:10.2139/ssrn.4289242.
4. Mei J, Zhu W. BGF-YOLOv10: small object detection algorithm from unmanned aerial vehicle perspective based on improved YOLOv10. Sensors. 2024;24(21):6911. doi:10.3390/s24216911.
5. Chung MA, Chai SY, Hsieh MC, Lin CW, Chen KX, Huang SJ, et al. YOLO-LSD: a lightweight object detection model for small targets at long distances to secure pedestrian safety. IEEE Access. 2025;13:83061–70. doi:10.1109/access.2025.3567843.
6. Wang CY, Bochkovskiy A, Liao HYM. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023 Jun 17–24; Vancouver, BC, Canada. p. 7464–75. doi:10.1109/cvpr52729.2023.00721.
7. Sohan M, Sai Ram T, Rami Reddy CV. A review on YOLOv8 and its advancements. In: International Conference on Data Intelligence and Cognitive Informatics. Singapore: Springer; 2024. p. 529–45. doi:10.1007/978-981-99-7962-2_39.
8. Selvam P, Sundari PS, Suresh T, Tamilselvi M, Murugappan M, Chowdhury MEH. YOLO-SAIL: attention-enhanced YOLOv5 with optimized Bi-FPN for ship target detection in SAR images. IEEE Access. 2025;13:29523–40. doi:10.1109/access.2025.3536621.
9. Chen Y, Liu Z. DFTD-YOLO: lightweight multi-target detection from unmanned aerial vehicle viewpoints. IEEE Access. 2025;13(1):24672–80. doi:10.1109/access.2025.3535624.
10. Chen Y, Yuan X, Wang J, Wu R, Li X, Hou Q, et al. YOLO-MS: rethinking multi-scale representation learning for real-time object detection. IEEE Trans Pattern Anal Mach Intell. 2025;47(6):4240–52. doi:10.1109/tpami.2025.3538473.
11. Bi Y, Ning Y, Nie X, Lu X, Gong Y, Li L. Towards region-adaptive feature disentanglement and enhancement for small object detection. In: Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI); 2023 Aug 19–25; Jeju, Republic of Korea. p. 697–705. doi:10.24963/ijcai.2024/78.
12. Fime AA, Mahmud S, Das A, Islam MS, Kim JH. Automatic scene generation: state-of-the-art techniques, models, datasets, challenges, and future prospects. IEEE Access. 2025;13:1–30. doi:10.1109/access.2025.3574298.
13. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition; 2018 Jun 18–23; Salt Lake City, UT, USA. p. 7132–41. doi:10.1109/cvpr.2018.00745.
14. Sun H, Wen Y, Feng H, Zheng Y, Mei Q, Ren D, et al. Unsupervised bidirectional contrastive reconstruction and adaptive fine-grained channel attention networks for image dehazing. Neural Netw. 2024;176:106314. doi:10.1016/j.neunet.2024.106314.
15. Du D, Zhu P, Wen L, Bian X, Lin H, Hu Q, et al. VisDrone-DET2019: the vision meets drone object detection in image challenge results. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops; 2019 Oct 27–28; Seoul, Republic of Korea. p. 213–26. doi:10.1109/iccvw54120.2021.00316.
16. Ye K, Tang H, Liu B, Dai P, Cao L, Ji R. More clear, more flexible, more precise: a comprehensive oriented object detection benchmark for UAV. arXiv:2504.20032. 2025.
17. Zhang Y, Guo Z, Wu J, Tian Y, Tang H, Guo X. Real-time vehicle detection based on improved YOLO v5. Sustainability. 2022;14(19):12274. doi:10.3390/su141912274.
18. Li C, Li L, Jiang H, Weng K, Geng Y, Li L, et al. YOLOv6: a single-stage object detection framework for industrial applications. arXiv:2209.02976. 2022.
19. Zhao Y, Lv W, Xu S, Wei J, Wang G, Dang Q, et al. DETRs beat YOLOs on real-time object detection. In: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024 Jun 16–22; Seattle, WA, USA. p. 16965–74. doi:10.1109/cvpr52733.2024.01605.
20. Liu X, Peng H, Zheng N, Yang Y, Hu H, Yuan Y. EfficientViT: memory efficient vision transformer with cascaded group attention. In: Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023 Jun 17–24; Vancouver, BC, Canada. p. 14420–30. doi:10.1109/cvpr52729.2023.01386.
21. Yang F, Huang L, Tan X, Yuan Y. FasterNet-SSD: a small object detection method based on SSD model. Signal Image Video Process. 2024;18(1):173–80. doi:10.1007/s11760-023-02726-5.
22. Wang J, Chen K, Xu R, Liu Z, Loy CC, Lin D. CARAFE: content-aware reassembly of features. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision; 2019 Oct 27–Nov 2; Seoul, Republic of Korea. p. 3007–16. doi:10.1109/iccv.2019.00310.
23. Gong Z, Xiao G, Shi Z, Chen R, Yu J. MSGA-Net: progressive feature matching via multi-layer sparse graph attention. IEEE Trans Circuits Syst Video Technol. 2024;34(7):5765–75. doi:10.1109/tcsvt.2024.3366912.
24. Yang L, Zhang RY, Li L, Xie X. SimAM: a simple, parameter-free attention module for convolutional neural networks. In: Proceedings of the 38th International Conference on Machine Learning; 2021 Jul 18–24; Virtual. p. 11863–74. doi:10.1109/mlbdbi51377.2020.00079.
25. Yun S, Ro Y. SHViT: single-head vision transformer with memory efficient macro design. In: Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024 Jun 16–22; Seattle, WA, USA. p. 5756–67. doi:10.1109/cvpr52733.2024.00550.
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.