Open Access
ARTICLE
Encoder-Guided Latent Space Search Based on Generative Networks for Stereo Disparity Estimation in Surgical Imaging
1 School of Automation, University of Electronic Science and Technology of China, Chengdu, 611731, China
2 School of the Environment, The University of Queensland, Brisbane, QLD 4072, Australia
3 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
4 Department of Geography, Texas A&M University, College Station, TX 77843, USA
5 School of Artificial Intelligence, Guangzhou Huashang University, Guangzhou, 511300, China
* Corresponding Authors: Siyu Lu; Wenfeng Zheng
(This article belongs to the Special Issue: Recent Advances in Signal Processing and Computer Vision)
Computer Modeling in Engineering & Sciences 2025, 145(3), 4037-4053. https://doi.org/10.32604/cmes.2025.074901
Received 21 October 2025; Accepted 01 December 2025; Issue published 23 December 2025
Abstract
Robust stereo disparity estimation plays a critical role in minimally invasive surgery, where dynamic soft tissues, specular reflections, and data scarcity pose major challenges to traditional end-to-end deep learning and deformable model-based methods. In this paper, we propose a novel disparity estimation framework that leverages a pretrained StyleGAN generator to represent the disparity manifold of Minimally Invasive Surgery (MIS) scenes and reformulates stereo matching as a latent-space optimization problem. Specifically, given a stereo pair, we search for the optimal latent vector in the intermediate latent space of StyleGAN such that the photometric reconstruction loss between the stereo images is minimized while the latent code is regularized to remain within the generator's high-confidence region. Unlike existing encoder-based embedding methods, our approach directly exploits the geometry of the learned latent space and enforces both photometric consistency and the manifold prior at inference time, without additional training or supervision. Extensive experiments on stereo-endoscopic videos demonstrate that our method achieves high-fidelity, robust disparity estimation under varying lighting, occlusion, and tissue dynamics, outperforming Thin Plate Spline (TPS)-based and linear-representation baselines. This work bridges generative modeling and 3D perception by enabling efficient, training-free disparity recovery from pretrained generative models with reduced inference latency.

Keywords
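To make the abstract's latent-space search concrete, the following is a minimal PyTorch sketch, not the authors' implementation. It assumes a hypothetical `generator` (a StyleGAN synthesis network pretrained to map a W-space code to a disparity map), a precomputed latent mean `w_mean`, and a helper `warp_right_to_left`; the quadratic penalty is one plausible realization of the "high-confidence region" regularizer, and the hyperparameters `steps`, `lr`, and `lam` are illustrative.

```python
import torch
import torch.nn.functional as F

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at
    horizontally shifted coordinates: I_L(x, y) ~ I_R(x - d(x, y), y).
    right: (B, C, H, W); disparity: (B, 1, H, W), in pixels."""
    b, _, h, w = right.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=right.device, dtype=right.dtype),
        torch.arange(w, device=right.device, dtype=right.dtype),
        indexing="ij",
    )
    xs = xs.unsqueeze(0) - disparity.squeeze(1)  # shift columns by disparity
    # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
    grid_x = 2.0 * xs / (w - 1) - 1.0
    grid_y = 2.0 * ys.unsqueeze(0).expand_as(grid_x) / (h - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(right, grid, align_corners=True, padding_mode="border")

def search_latent(generator, w_mean, left, right, steps=300, lr=0.05, lam=1e-3):
    """Optimize a latent code w so the generated disparity photometrically
    reconstructs the left image from the right one, while a quadratic
    penalty keeps w near the latent mean (a stand-in for the manifold prior)."""
    w = w_mean.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        disparity = generator(w)                 # assumed to return (B, 1, H, W)
        left_hat = warp_right_to_left(right, disparity)
        photo = F.l1_loss(left_hat, left)        # photometric consistency
        prior = lam * (w - w_mean).pow(2).sum()  # latent-space regularizer
        loss = photo + prior
        opt.zero_grad()
        loss.backward()
        opt.step()
    return generator(w.detach())
```

Because both the warp and the generator are differentiable, gradients of the photometric loss flow directly into w, so no additional training is needed; per the title, the search can also be encoder-guided by initializing w from an encoder's embedding of the stereo pair rather than from the latent mean.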
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

