Open Access

ARTICLE

Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA

Zhongfan Sun, Kan Guo, Yongli Hu*, Yong Zhang

School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China

* Corresponding Author: Yongli Hu.

(This article belongs to the Special Issue: Advances in Large Models and Domain-specific Applications)

Computers, Materials & Continua 2026, 87(1), 80 https://doi.org/10.32604/cmc.2025.072903

Abstract

Knowledge-based Visual Question Answering (VQA) requires the integration of visual information with external knowledge reasoning. Existing approaches typically retrieve information from external corpora and rely on pretrained language models for reasoning. However, their performance is often hindered by the limited capabilities of retrievers and the constrained size of knowledge bases. Moreover, relying on image captions to bridge the modal gap between visual and language modalities can lead to the omission of critical visual details. To address these limitations, we propose the Reflective Chain-of-Thought (ReCoT) method, a simple yet effective framework inspired by metacognition theory. ReCoT effectively activates the reasoning capabilities of Multimodal Large Language Models (MLLMs), providing essential visual and knowledge cues required to solve complex visual questions. It simulates a metacognitive reasoning process that encompasses monitoring, reflection, and correction. Specifically, in the initial generation stage, an MLLM produces a preliminary answer that serves as the model’s initial cognitive output. During the reflective reasoning stage, this answer is critically examined to generate a reflective rationale that integrates key visual evidence and relevant knowledge. In the final refinement stage, a smaller language model leverages this rationale to revise the initial prediction, resulting in a more accurate final answer. By harnessing the strengths of MLLMs in visual and knowledge grounding, ReCoT enables smaller language models to reason effectively without dependence on image captions or external knowledge bases. Experimental results demonstrate that ReCoT achieves substantial performance improvements, outperforming state-of-the-art methods by 2.26% on OK-VQA and 5.8% on A-OKVQA.
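The three stages described above (initial generation, reflective reasoning, refinement) can be sketched as a simple pipeline. This is an illustrative outline only, not the authors' implementation: the model calls are stubbed out, and all function and prompt names are hypothetical.

```python
# Minimal sketch of the ReCoT three-stage pipeline from the abstract.
# mllm_generate and small_lm_refine are stand-ins for real model calls;
# their names and prompt formats are assumptions, not from the paper.

def mllm_generate(prompt: str) -> str:
    """Stand-in for a Multimodal LLM call over an image and a text prompt."""
    # A real implementation would query an MLLM (image + prompt) here.
    return "toy rationale" if "Reflect" in prompt else "toy answer"

def small_lm_refine(prompt: str) -> str:
    """Stand-in for the smaller language model used in the final stage."""
    return "refined answer"

def recot(image_ref: str, question: str) -> dict:
    """Run the three ReCoT stages: initial answer, reflection, refinement."""
    # Stage 1: initial generation -- the MLLM produces a preliminary answer,
    # the model's initial cognitive output.
    initial = mllm_generate(f"Image: {image_ref}\nQuestion: {question}")

    # Stage 2: reflective reasoning -- the MLLM critiques that answer and
    # produces a rationale grounding visual evidence and relevant knowledge.
    rationale = mllm_generate(
        f"Image: {image_ref}\nQuestion: {question}\n"
        f"Initial answer: {initial}\nReflect on and critique this answer."
    )

    # Stage 3: refinement -- a smaller LM revises the initial prediction
    # using the reflective rationale, yielding the final answer.
    final = small_lm_refine(
        f"Question: {question}\nInitial answer: {initial}\n"
        f"Rationale: {rationale}\nGive the corrected final answer."
    )
    return {"initial": initial, "rationale": rationale, "final": final}
```

Note that the smaller model in the last stage sees only text (question, initial answer, rationale), which is how the method avoids depending on image captions or an external knowledge base at refinement time.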

Keywords

Knowledge-based VQA; metacognition; reflective chain-of-thought; answer refinement

Cite This Article

APA Style
Sun, Z., Guo, K., Hu, Y., & Zhang, Y. (2026). Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA. Computers, Materials & Continua, 87(1), 80. https://doi.org/10.32604/cmc.2025.072903
Vancouver Style
Sun Z, Guo K, Hu Y, Zhang Y. Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA. Comput Mater Contin. 2026;87(1):80. https://doi.org/10.32604/cmc.2025.072903
IEEE Style
Z. Sun, K. Guo, Y. Hu, and Y. Zhang, “Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA,” Comput. Mater. Contin., vol. 87, no. 1, Art. no. 80, 2026. https://doi.org/10.32604/cmc.2025.072903



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.