MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding

Ans Alghamdi

doi:10.32604/cmc.2025.071210

Open Access

ARTICLE

Ans D. Alghamdi^*

Department of Computer Science, Faculty of Computing and Information, Al-Baha University, Al-Baha, 65779, Saudi Arabia

* Corresponding Author: Ans D. Alghamdi. Email: email

(This article belongs to the Special Issue: Artificial Intelligence in Visual and Audio Signal Processing)

Computers, Materials & Continua 2026, 86(2), 1-35. https://doi.org/10.32604/cmc.2025.071210

Received 02 August 2025; Accepted 22 October 2025; Issue published 09 December 2025

Abstract

Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and hallucinate because of limited context modeling. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across modalities. We propose MultiAgent-CoT, a multi-agent chain-of-thought (CoT) reasoning framework in which specialized agents for the text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting. The architecture incorporates self-reflection modules, conflict resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement. It employs a hierarchical attention mechanism with cross-modal fusion and adapts reasoning depth to dialogue complexity. Comprehensive evaluations on Situated Interactive MultiModal Conversations (SIMMC) 2.0, VisDial v1.0, and newly introduced challenging scenarios demonstrate statistically significant improvements in grounding accuracy (p < 0.01), chain-of-thought interpretability, and robustness to adversarial inputs compared with state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
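
The abstract mentions consensus voting among the text, vision, and speech agents but does not specify the rule. As a rough illustration only, the sketch below assumes a confidence-weighted vote: each agent proposes an answer with a self-reported confidence, and the answer with the largest summed confidence wins. The names `AgentProposal` and `consensus_vote`, the confidence scale, and the example answers are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentProposal:
    agent: str         # hypothetical agent name, e.g. "text", "vision", "speech"
    answer: str        # candidate answer proposed by this agent
    confidence: float  # self-reported confidence in [0, 1] (an assumption)


def consensus_vote(proposals):
    """Return (winning answer, share of the total confidence it received).

    Assumed rule: sum the confidences per answer and pick the largest sum.
    Ties go to the answer that was proposed first.
    """
    if not proposals:
        raise ValueError("at least one proposal is required")
    scores = defaultdict(float)
    for p in proposals:
        scores[p.answer] += p.confidence
    total = sum(scores.values())
    winner = max(scores, key=scores.get)
    share = scores[winner] / total if total > 0 else 0.0
    return winner, share


# Made-up example: two agents agree, one confident agent disagrees.
proposals = [
    AgentProposal("text", "red jacket", 0.6),
    AgentProposal("vision", "red jacket", 0.5),
    AgentProposal("speech", "blue jacket", 0.9),
]
winner, share = consensus_vote(proposals)
print(winner, round(share, 2))
```

A low winning share, as in this example, is the kind of signal that could trigger a conflict resolution step; how the paper's protocol actually handles such disagreement is not described in the abstract.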

Keywords

Multi-agent systems; chain-of-thought reasoning; multimodal dialogue; conversational artificial intelligence (AI); cross-modal fusion; reasoning interpretability

Cite This Article

APA Style

Alghamdi, A.D. (2026). MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding. Computers, Materials & Continua, 86(2), 1–35. https://doi.org/10.32604/cmc.2025.071210

Vancouver Style

Alghamdi AD. MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding. Comput Mater Contin. 2026;86(2):1–35. https://doi.org/10.32604/cmc.2025.071210

IEEE Style

A. D. Alghamdi, “MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding,” Comput. Mater. Contin., vol. 86, no. 2, pp. 1–35, 2026. https://doi.org/10.32604/cmc.2025.071210

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.