
Open Access

ARTICLE

MultiAgent-CoT: A Multi-Agent Chain-of-Thought Reasoning Model for Robust Multimodal Dialogue Understanding

Ans D. Alghamdi*
Department of Computer Science, Faculty of Computing and Information, Al-Baha University, Al-Baha, 65779, Saudi Arabia
* Corresponding Author: Ans D. Alghamdi.
(This article belongs to the Special Issue: Artificial Intelligence in Visual and Audio Signal Processing)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.071210

Received 02 August 2025; Accepted 22 October 2025; Published online 21 November 2025

Abstract

Multimodal dialogue systems often fail to maintain coherent reasoning over extended conversations and are prone to hallucination because of limited context modeling. Current approaches struggle with cross-modal alignment, temporal consistency, and robust handling of noisy or incomplete inputs across modalities. We propose MultiAgent-CoT, a novel multi-agent chain-of-thought (CoT) reasoning framework in which specialized text, vision, and speech agents collaboratively construct shared reasoning traces through inter-agent message passing and consensus voting. The architecture incorporates self-reflection modules, conflict-resolution protocols, and dynamic rationale alignment to improve consistency, factual accuracy, and user engagement. The framework uses a hierarchical attention mechanism with cross-modal fusion and adapts its reasoning depth to dialogue complexity. Comprehensive evaluations on Situated Interactive MultiModal Conversations (SIMMC) 2.0, VisDial v1.0, and newly introduced challenging scenarios show statistically significant improvements in grounding accuracy (p < 0.01), chain-of-thought interpretability, and robustness to adversarial inputs over state-of-the-art monolithic transformer baselines and existing multi-agent approaches.
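The abstract does not specify how the consensus voting step combines agent outputs. As an illustration only, the sketch below assumes a simple confidence-weighted strict-majority vote: each modality agent proposes a candidate answer with a self-reported confidence, confidences are summed per answer, and the top answer is accepted only if it holds more than `min_support` of the total confidence mass. The names `AgentProposal`, `consensus_vote`, and `min_support` are hypothetical and are not taken from the paper; returning `None` is likewise an assumed hook for handing off to a conflict-resolution step.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AgentProposal:
    agent: str         # hypothetical agent label, e.g. "text", "vision", "speech"
    answer: str        # candidate answer or reasoning-step conclusion
    confidence: float  # agent's self-reported confidence in [0, 1]


def consensus_vote(
    proposals: List[AgentProposal], min_support: float = 0.5
) -> Tuple[Optional[str], float]:
    """Confidence-weighted strict-majority vote (an assumed scheme, not the paper's).

    Returns (answer, support), where support is the leading answer's share of the
    total confidence mass. Returns (None, support) when no answer holds strictly
    more than min_support, signalling that conflict resolution is needed.
    """
    if not proposals:
        raise ValueError("need at least one proposal")

    # Sum confidence per distinct candidate answer.
    totals = defaultdict(float)
    for p in proposals:
        if not 0.0 <= p.confidence <= 1.0:
            raise ValueError(f"confidence out of range for agent {p.agent!r}")
        totals[p.answer] += p.confidence

    total = sum(totals.values())
    if total == 0.0:
        return None, 0.0  # no agent expressed any confidence

    leader = max(totals, key=totals.get)
    support = totals[leader] / total
    # Strict inequality: an even split between two answers is treated as a conflict.
    return (leader if support > min_support else None), support


if __name__ == "__main__":
    votes = [
        AgentProposal("text", "red jacket", 0.75),
        AgentProposal("vision", "red jacket", 0.5),
        AgentProposal("speech", "blue jacket", 0.25),
    ]
    print(consensus_vote(votes))
```

In this example the text and vision agents agree, so "red jacket" carries 1.25 of 1.5 total confidence (a support of about 0.83) and is accepted. A strict threshold was chosen so that an even two-way split yields `None` rather than silently picking one side.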

Keywords

Multi-agent systems; chain-of-thought reasoning; multimodal dialogue; conversational artificial intelligence (AI); cross-modal fusion; reasoning interpretability