Context Patch Fusion with Class Token Enhancement for Weakly Supervised Semantic Segmentation

Yiyang Fu; Hui Li; Wangyu Wu

doi:10.32604/cmes.2025.074467

Open Access

ARTICLE

Yiyang Fu¹, Hui Li²,*, Wangyu Wu³,*

1 School of Cyber Science and Engineering, Wuxi University, Wuxi, 214105, China
2 School of Informatics, Xiamen University, Xiamen, 361005, China
3 School of Computer Science, University of Liverpool, Liverpool, L69 7ZX, UK

* Corresponding Authors: Hui Li. Email: email ; Wangyu Wu. Email: email

(This article belongs to the Special Issue: Advanced Image Segmentation and Object Detection: Innovations, Challenges, and Applications)

Computer Modeling in Engineering & Sciences 2026, 146(1), 37 https://doi.org/10.32604/cmes.2025.074467

Received 11 October 2025; Accepted 25 December 2025; Issue published 29 January 2026

Abstract

Weakly Supervised Semantic Segmentation (WSSS), which relies only on image-level labels, has attracted significant attention for its cost-effectiveness and scalability. Existing methods mainly enhance inter-class distinctions and employ data augmentation to mitigate semantic ambiguity and reduce spurious activations. However, they often neglect the complex contextual dependencies among image patches, resulting in incomplete local representations and limited segmentation accuracy. To address these issues, we propose the Context Patch Fusion with Class Token Enhancement (CPF-CTE) framework, which exploits contextual relations among patches to enrich feature representations and improve segmentation. At its core, the Contextual-Fusion Bidirectional Long Short-Term Memory (CF-BiLSTM) module captures spatial dependencies between patches and enables bidirectional information flow, yielding a more comprehensive understanding of spatial correlations. This strengthens feature learning and segmentation robustness. Moreover, we introduce learnable class tokens that dynamically encode and refine class-specific semantics, enhancing discriminative capability. By effectively integrating spatial and semantic cues, CPF-CTE produces richer and more accurate representations of image content. Extensive experiments on PASCAL VOC 2012 and MS COCO 2014 validate that CPF-CTE consistently surpasses prior WSSS methods.
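The bidirectional information flow described for CF-BiLSTM can be illustrated with a toy sketch. This is not the authors' implementation: it assumes patches are visited in a fixed (for example raster) order, and it swaps the LSTM cell for a simple exponential-moving-average recurrence so that the example stays dependency-free. The names `scan`, `bidirectional_fuse`, and `decay` are hypothetical.

```python
from typing import List

Vector = List[float]


def scan(patches: List[Vector], decay: float) -> List[Vector]:
    """One-directional recurrence: state = decay * state + (1 - decay) * patch.

    Assumption: this moving average stands in for an LSTM cell purely
    for illustration; it has no gates and no learned weights.
    """
    state = [0.0] * len(patches[0])
    states = []
    for patch in patches:
        state = [decay * s + (1.0 - decay) * p for s, p in zip(state, patch)]
        states.append(state)
    return states


def bidirectional_fuse(patches: List[Vector], decay: float = 0.5) -> List[Vector]:
    """Concatenate forward and backward states at every patch position."""
    forward = scan(patches, decay)
    # Run the same recurrence over the reversed sequence, then restore order.
    backward = scan(patches[::-1], decay)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Three hypothetical 1-D patch features; only the first patch is active.
patches = [[1.0], [0.0], [0.0]]
fused = bidirectional_fuse(patches)
print(fused)
```

In this sketch the first half of each fused vector carries context from earlier patches and the second half carries context from later patches, so every position sees both directions. A real BiLSTM would replace the moving average with gated, learned updates.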

Keywords

Weakly supervised; semantic segmentation; context-fusion; class enhancement

Cite This Article

APA Style

Fu, Y., Li, H., Wu, W. (2026). Context Patch Fusion with Class Token Enhancement for Weakly Supervised Semantic Segmentation. Computer Modeling in Engineering & Sciences, 146(1), 37. https://doi.org/10.32604/cmes.2025.074467

Vancouver Style

Fu Y, Li H, Wu W. Context Patch Fusion with Class Token Enhancement for Weakly Supervised Semantic Segmentation. Comput Model Eng Sci. 2026;146(1):37. https://doi.org/10.32604/cmes.2025.074467

IEEE Style

Y. Fu, H. Li, and W. Wu, “Context Patch Fusion with Class Token Enhancement for Weakly Supervised Semantic Segmentation,” Comput. Model. Eng. Sci., vol. 146, no. 1, pp. 37, 2026. https://doi.org/10.32604/cmes.2025.074467

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
