Open Access
ARTICLE
Semantic-Guided Stereo Matching Network Based on Parallax Attention Mechanism and SegFormer
School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, 450001, China
* Corresponding Author: Yingqiang Ding. Email:
Computers, Materials & Continua 2026, 87(1), 54 https://doi.org/10.32604/cmc.2025.073846
Received 27 September 2025; Accepted 25 November 2025; Issue published 10 February 2026
Abstract
Stereo matching is a pivotal task in computer vision, enabling precise depth estimation from stereo image pairs, yet it remains challenging in regions with reflections, repetitive textures, or fine structures. In this paper, we propose a Semantic-Guided Parallax Attention Stereo Matching Network (SGPASMnet) that can be trained in an unsupervised manner, building upon the Parallax Attention Stereo Matching Network (PASMnet). Our approach uses unsupervised learning to address the scarcity of ground-truth disparity in stereo matching datasets, which allows training on diverse scene-specific datasets and improves generalization. SGPASMnet incorporates two novel components: a Cross-Scale Feature Interaction (CSFI) block and semantic feature augmentation using a pre-trained semantic segmentation model, SegFormer, embedded into the parallax attention mechanism. The CSFI block fuses multi-scale features, integrating coarse and fine details to improve disparity estimation accuracy. Semantic features extracted by SegFormer enrich the parallax attention mechanism with high-level scene context, significantly improving performance in ambiguous regions. Our model unifies these enhancements in a single architecture comprising semantic feature extraction, an hourglass network, a semantic-guided cascaded parallax attention module, an output module, and a disparity refinement network. Evaluations on the KITTI2015 dataset demonstrate that our unsupervised method achieves a lower error rate than the original PASMnet, highlighting the effectiveness of our enhancements in handling complex scenes. Because it requires no ground-truth disparity for training, SGPASMnet offers a scalable and robust solution for accurate stereo matching, with superior generalization across varied real-world applications.
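The abstract does not spell out how semantic features enter the parallax attention computation. As a rough illustration only, the sketch below shows one plausible reading in NumPy: attention is computed between left and right features along each image row, and the expected disparity is read off the attention weights. The function name `parallax_attention`, the channel-wise concatenation of semantic features, and the mask restricting matches to non-negative disparities are all assumptions of this sketch, not details confirmed by the paper.

```python
import numpy as np

def parallax_attention(feat_left, feat_right, sem_left=None, sem_right=None):
    """Row-wise attention between rectified left/right feature maps.

    feat_*: (C, H, W) arrays; sem_*: optional (Cs, H, W) semantic features.
    Returns attention of shape (H, W, W) and expected disparity of shape (H, W).
    """
    if sem_left is not None and sem_right is not None:
        # Assumption: semantic guidance is modeled as channel concatenation.
        feat_left = np.concatenate([feat_left, sem_left], axis=0)
        feat_right = np.concatenate([feat_right, sem_right], axis=0)

    c, _, w = feat_left.shape
    # scores[h, i, j]: similarity of left pixel (h, i) and right pixel (h, j).
    scores = np.einsum("chi,chj->hij", feat_left, feat_right) / np.sqrt(c)

    # Assumption: left pixel i can only match right pixel j <= i,
    # i.e. disparity d = i - j is non-negative for a rectified pair.
    i_idx = np.arange(w)[:, None]
    j_idx = np.arange(w)[None, :]
    scores = np.where((j_idx <= i_idx)[None], scores, -np.inf)

    # Numerically stable softmax over candidate right-image positions.
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)

    # Expected disparity under the attention distribution.
    disparity = (attn * (i_idx - j_idx)[None]).sum(axis=-1)
    return attn, disparity
```

Calling `parallax_attention(left, right, sem_left, sem_right)` with semantic maps of the same spatial size as the image features simply widens the channel dimension used for matching, which is one way high-level context could help disambiguate textureless or repetitive regions.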
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

