TY  - EJOU
AU  - Shen, Xueli 
AU  - Wang, Meng 

TI  - A Weakly Supervised Semantic Segmentation Method Based on Improved Conformer
T2  - Computers, Materials & Continua

PY  - 2025
VL  - 82
IS  - 3
SN  - 1546-2226

AB  - In Weakly Supervised Semantic Segmentation (WSSS), methods based on image-level annotation struggle to capture objects of varying sizes accurately, lack sensitivity to image details, and incur high computational costs. To address these issues, we improve the dual-branch architecture of the Conformer and use it as the backbone network for generating class activation maps, proposing a multi-scale, efficient weakly supervised semantic segmentation method based on the improved Conformer. In the Convolutional Neural Network (CNN) branch, a cross-scale feature integration convolution module is designed, incorporating multi-receptive-field convolution layers to enhance the model’s ability to capture long-range dependencies and improve its sensitivity to multi-scale objects. In the Vision Transformer (ViT) branch, an efficient multi-head self-attention module is developed, which reduces unnecessary computation through spatial compression and feature partitioning, thereby improving overall network efficiency. Finally, a multi-feature coupling module is introduced so that the features generated by the two branches complement each other. This design retains the strength of the CNN in extracting local details while harnessing the strength of the ViT in capturing comprehensive global features. Experimental results show that the mean Intersection over Union (mIoU) of the proposed method on the validation and test sets of the PASCAL VOC 2012 dataset is improved by 2.9% and 3.6%, respectively, over the TransCAM algorithm. The improved model also achieves a 1.3% increase in mIoU on the COCO 2014 dataset. Additionally, the number of parameters and the floating-point operations are reduced by 16.2% and 12.9%, respectively. However, the proposed method still performs poorly in complex scenarios, and further work is needed to address this limitation.
KW  - WSSS; CAM; transformer; CNN; multi-scale feature extraction; lightweight

DO  - 10.32604/cmc.2025.059149