
Open Access

ARTICLE

Superpixel-Aware Transformer with Attention-Guided Boundary Refinement for Salient Object Detection

Burhan Baraklı1,*, Can Yüzkollar2, Tuğrul Taşçı3, İbrahim Yıldırım2
1 Department of Electrical and Electronics Engineering, Sakarya University, Sakarya, 54050, Türkiye
2 Department of Computer Engineering, Sakarya University, Sakarya, 54050, Türkiye
3 Department of Information Systems Engineering, Sakarya University, Sakarya, 54050, Türkiye
* Corresponding Author: Burhan Baraklı. Email: barakli@sakarya.edu.tr
(This article belongs to the Special Issue: Advanced Image Segmentation and Object Detection: Innovations, Challenges, and Applications)

Computer Modeling in Engineering & Sciences https://doi.org/10.32604/cmes.2025.074292

Received 08 October 2025; Accepted 10 December 2025; Published online 31 December 2025

Abstract

Salient object detection (SOD) models struggle to simultaneously preserve global structure, maintain sharp object boundaries, and sustain computational efficiency in complex scenes. In this study, we propose SPSALNet, a task-driven two-stage (macro–micro) architecture that restructures the SOD process around superpixel representations. The proposed approach follows a "split-and-enhance" principle, introduced to our knowledge for the first time in the SOD literature, which hierarchically classifies superpixels and then applies targeted refinement only to ambiguous or error-prone regions. At the macro stage, the image is partitioned into content-adaptive superpixel regions, and each superpixel is represented by a high-dimensional region-level feature vector. These representations define a regional decomposition problem in which each superpixel is assigned to one of three classes: background, object interior, or transition region. Superpixel tokens interact with a global feature vector from a deep network backbone through a cross-attention module and are projected into an enriched embedding space that jointly encodes local topology and global context. At the micro stage, the model employs a U-Net-based refinement process that allocates computational resources only to ambiguous transition regions. The image and distance–similarity maps derived from superpixels are processed through a dual-encoder pathway. Subsequently, channel-aware fusion blocks adaptively combine information from these two sources, producing sharper and more stable object boundaries. Experimental results show that SPSALNet achieves high accuracy with lower computational cost compared to recent competing methods. On the PASCAL-S and DUT-OMRON datasets, SPSALNet exhibits a clear performance advantage across all key metrics, and it ranks first on accuracy-oriented measures on HKU-IS. On the challenging DUT-OMRON benchmark, SPSALNet reaches an MAE of 0.034. Across all datasets, it preserves object boundaries and regional structure in a stable and competitive manner.
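The macro-stage interaction described above, in which superpixel tokens attend to global backbone features, can be illustrated with a minimal, dependency-free sketch of scaled dot-product cross-attention. All names, dimensions, and the single-head formulation here are illustrative assumptions, not the paper's implementation (which presumably uses learned projections and multiple heads):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(queries, keys, values):
    """Single-head cross-attention sketch: each query (a superpixel
    token) attends over the key/value set (global context features)
    and returns one context-enriched vector per query."""
    d = len(keys[0])  # feature dimension used for score scaling
    out = []
    for q in queries:
        # Scaled dot-product similarity between this token and each key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        w = softmax(scores)
        # Attention-weighted sum of the value vectors.
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical example: 3 superpixel tokens attend over 2 global
# context vectors, each of dimension 4.
tokens = [[1.0, 0.0, 0.5, 0.0],
          [0.0, 1.0, 0.0, 0.5],
          [0.5, 0.5, 0.5, 0.5]]
ctx = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0]]
enriched = cross_attention(tokens, ctx, ctx)
```

In the full model, the enriched token embeddings would then feed the three-way superpixel classification (background, object interior, transition); this sketch only captures the attention step itself.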

Keywords

Salient object detection; superpixel segmentation; transformers; attention mechanism; multi-level fusion; edge-preserving refinement; model-driven