Open Access

ARTICLE

MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding

Ans D. Alghamdi*

Department of Computer Science, Faculty of Computing and Information, Al-Baha University, Al-Baha, 65779, Saudi Arabia

* Corresponding Author: Ans D. Alghamdi. Email: email

(This article belongs to the Special Issue: Artificial Intelligence in Visual and Audio Signal Processing)

Computers, Materials & Continua 2026, 86(2), 1-35. https://doi.org/10.32604/cmc.2025.071210

Abstract

Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and suffer from hallucination due to limited context modeling capabilities. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across multiple modalities. We propose MultiAgent-CoT, a novel multi-agent chain-of-thought (CoT) reasoning framework in which specialized agents for the text, vision, and speech modalities collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting. The architecture incorporates self-reflection modules, conflict resolution protocols, and dynamic rationale alignment to enhance consistency, factual accuracy, and user engagement. The framework employs a hierarchical attention mechanism with cross-modal fusion and adapts its reasoning depth to dialogue complexity. Comprehensive evaluations on Situated Interactive MultiModal Conversations (SIMMC) 2.0, VisDial v1.0, and newly introduced challenging scenarios demonstrate statistically significant improvements (p < 0.01) in grounding accuracy, chain-of-thought interpretability, and robustness to adversarial inputs compared with state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
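
To make the consensus voting idea concrete, the following is a minimal illustrative sketch, not the article's implementation. It assumes each modality agent reports a candidate answer, a self-reported confidence, and a one-line rationale, and that agreement is decided by a confidence-weighted vote. The names `AgentVote` and `consensus_vote`, the weighting rule, and the example values are all hypothetical; the abstract does not specify how votes are weighted or how ties are broken.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentVote:
    agent: str         # hypothetical agent name, e.g. "text", "vision", "speech"
    answer: str        # candidate answer proposed by this agent
    confidence: float  # self-reported confidence, assumed to lie in [0, 1]
    rationale: str     # reasoning step the agent adds to the shared trace


def consensus_vote(votes):
    """Pick the answer with the highest summed confidence.

    Returns (winner, support_share, trace), where support_share is the
    winner's fraction of the total confidence mass and trace collects the
    rationales of the agents that backed the winner. Ties go to the answer
    that appeared first in `votes` (an arbitrary choice for this sketch).
    """
    if not votes:
        raise ValueError("at least one vote is required")

    totals = defaultdict(float)
    for vote in votes:
        totals[vote.answer] += vote.confidence

    total_weight = sum(totals.values())
    winner = max(totals, key=totals.get)
    share = totals[winner] / total_weight if total_weight > 0 else 0.0
    trace = [f"{v.agent}: {v.rationale}" for v in votes if v.answer == winner]
    return winner, share, trace


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    example = [
        AgentVote("text", "red jacket", 0.6, "user asked for the red one"),
        AgentVote("vision", "red jacket", 0.9, "red jacket detected on the left rack"),
        AgentVote("speech", "blue jacket", 0.5, "audio was noisy near the color word"),
    ]
    answer, support, shared_trace = consensus_vote(example)
    print(answer, round(support, 2))
    for step in shared_trace:
        print(" ", step)
```

In a sketch like this, a low support share could serve as the trigger for a conflict resolution or self-reflection step, although how the framework actually resolves disagreement is not described in the abstract.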

Keywords

Multi-agent systems; chain-of-thought reasoning; multimodal dialogue; conversational artificial intelligence (AI); cross-modal fusion; reasoning interpretability

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.