Open Access

ARTICLE

Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA

Zhongfan Sun, Kan Guo, Yongli Hu*, Yong Zhang

School of Information Science and Technology, Beijing University of Technology, Beijing, 100124, China

* Corresponding Author: Yongli Hu.

(This article belongs to the Special Issue: Advances in Large Models and Domain-specific Applications)

Computers, Materials & Continua 2026, 87(1), 80 https://doi.org/10.32604/cmc.2025.072903

Abstract

Knowledge-based Visual Question Answering (VQA) requires the integration of visual information with external knowledge reasoning. Existing approaches typically retrieve information from external corpora and rely on pretrained language models for reasoning. However, their performance is often hindered by the limited capabilities of retrievers and the constrained size of knowledge bases. Moreover, relying on image captions to bridge the modal gap between visual and language modalities can lead to the omission of critical visual details. To address these limitations, we propose the Reflective Chain-of-Thought (ReCoT) method, a simple yet effective framework inspired by metacognition theory. ReCoT effectively activates the reasoning capabilities of Multimodal Large Language Models (MLLMs), providing essential visual and knowledge cues required to solve complex visual questions. It simulates a metacognitive reasoning process that encompasses monitoring, reflection, and correction. Specifically, in the initial generation stage, an MLLM produces a preliminary answer that serves as the model’s initial cognitive output. During the reflective reasoning stage, this answer is critically examined to generate a reflective rationale that integrates key visual evidence and relevant knowledge. In the final refinement stage, a smaller language model leverages this rationale to revise the initial prediction, resulting in a more accurate final answer. By harnessing the strengths of MLLMs in visual and knowledge grounding, ReCoT enables smaller language models to reason effectively without dependence on image captions or external knowledge bases. Experimental results demonstrate that ReCoT achieves substantial performance improvements, outperforming state-of-the-art methods by 2.26% on OK-VQA and 5.8% on A-OKVQA.
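The three stages described above (initial generation, reflective reasoning, refinement) can be sketched as a simple pipeline. This is an illustrative outline only, not the authors' implementation: the model calls are stubbed out, and all function and prompt names are hypothetical.

```python
# Minimal sketch of the ReCoT three-stage pipeline from the abstract.
# mllm_generate and small_lm_refine are stand-ins for real model calls;
# their names and prompt formats are assumptions, not from the paper.

def mllm_generate(prompt: str) -> str:
    """Stand-in for a Multimodal LLM call over an image and a text prompt."""
    # A real implementation would query an MLLM (image + prompt) here.
    return "toy rationale" if "Reflect" in prompt else "toy answer"

def small_lm_refine(prompt: str) -> str:
    """Stand-in for the smaller language model used in the final stage."""
    return "refined answer"

def recot(image_ref: str, question: str) -> dict:
    """Run the three ReCoT stages: initial answer, reflection, refinement."""
    # Stage 1: initial generation -- the MLLM produces a preliminary answer,
    # the model's initial cognitive output.
    initial = mllm_generate(f"Image: {image_ref}\nQuestion: {question}")

    # Stage 2: reflective reasoning -- the MLLM critiques that answer and
    # produces a rationale grounding visual evidence and relevant knowledge.
    rationale = mllm_generate(
        f"Image: {image_ref}\nQuestion: {question}\n"
        f"Initial answer: {initial}\nReflect on and critique this answer."
    )

    # Stage 3: refinement -- a smaller LM revises the initial prediction
    # using the reflective rationale, yielding the final answer.
    final = small_lm_refine(
        f"Question: {question}\nInitial answer: {initial}\n"
        f"Rationale: {rationale}\nGive the corrected final answer."
    )
    return {"initial": initial, "rationale": rationale, "final": final}
```

Note that the smaller model in the last stage sees only text (question, initial answer, rationale), which is how the method avoids depending on image captions or an external knowledge base at refinement time.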

Keywords

Knowledge-based VQA; metacognition; reflective chain-of-thought; answer refinement

Cite This Article

APA Style
Sun, Z., Guo, K., Hu, Y., & Zhang, Y. (2026). Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA. Computers, Materials & Continua, 87(1), 80. https://doi.org/10.32604/cmc.2025.072903
Vancouver Style
Sun Z, Guo K, Hu Y, Zhang Y. Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA. Comput Mater Contin. 2026;87(1):80. https://doi.org/10.32604/cmc.2025.072903
IEEE Style
Z. Sun, K. Guo, Y. Hu, and Y. Zhang, “Metacognition Inspired Reflective Chain-of-Thought for Knowledge-Based VQA,” Comput. Mater. Contin., vol. 87, no. 1, Art. no. 80, 2026. https://doi.org/10.32604/cmc.2025.072903



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.