Open Access

REVIEW

A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions

A-Seong Moon, Haesung Kim, Ye-Chan Park, Jaesung Lee*

Department of Artificial Intelligence, Chung-Ang University, Seoul, Republic of Korea

* Corresponding Author: Jaesung Lee.

Computers, Materials & Continua 2026, 87(2), 1. https://doi.org/10.32604/cmc.2026.076411

Abstract

Multimodal emotion recognition (MER) has emerged as a key research area for enabling human-centered artificial intelligence, supported by rapid progress in vision, audio, language, and physiological modeling. Existing approaches integrate heterogeneous affective cues through diverse embedding strategies and fusion mechanisms, yet the field remains fragmented due to differences in feature alignment, temporal synchronization, modality reliability, and robustness to noise or missing inputs. This survey provides a comprehensive analysis of MER research from 2021 to 2025, consolidating advances in modality-specific representation learning, cross-modal feature construction, and early, late, and hybrid fusion paradigms. We systematically review visual, acoustic, textual, and sensor-based embeddings, highlighting how pre-trained encoders, self-supervised learning, and large language models have reshaped the representational foundations of MER. We further categorize fusion strategies by interaction depth and architectural design, examining how attention mechanisms, cross-modal transformers, adaptive gating, and multimodal large language models redefine the integration of affective signals. Finally, we summarize major benchmark datasets and evaluation metrics and discuss emerging challenges related to scalability, generalization, and interpretability. This survey aims to provide a unified perspective on multimodal fusion for emotion recognition and to guide future research toward more coherent and generalizable multimodal affective intelligence.

Keywords

Multimodal emotion recognition; multimodal learning; cross-modal learning; fusion strategies; representation learning

Cite This Article

APA Style
Moon, A.-S., Kim, H., Park, Y.-C., & Lee, J. (2026). A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions. Computers, Materials & Continua, 87(2), 1. https://doi.org/10.32604/cmc.2026.076411
Vancouver Style
Moon A, Kim H, Park Y, Lee J. A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions. Comput Mater Contin. 2026;87(2):1. https://doi.org/10.32604/cmc.2026.076411
IEEE Style
A.-S. Moon, H. Kim, Y.-C. Park, and J. Lee, “A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions,” Comput. Mater. Contin., vol. 87, no. 2, p. 1, 2026. https://doi.org/10.32604/cmc.2026.076411



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.