Open Access

REVIEW

A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions

A-Seong Moon, Haesung Kim, Ye-Chan Park, Jaesung Lee*

Department of Artificial Intelligence, Chung-Ang University, Seoul, Republic of Korea

* Corresponding Author: Jaesung Lee.

Computers, Materials & Continua 2026, 87(2), 1. https://doi.org/10.32604/cmc.2026.076411

Abstract

Multimodal emotion recognition (MER) has emerged as a key research area for enabling human-centered artificial intelligence, supported by rapid progress in vision, audio, language, and physiological modeling. Existing approaches integrate heterogeneous affective cues through diverse embedding strategies and fusion mechanisms, yet the field remains fragmented due to differences in feature alignment, temporal synchronization, modality reliability, and robustness to noise or missing inputs. This survey provides a comprehensive analysis of MER research from 2021 to 2025, consolidating advances in modality-specific representation learning, cross-modal feature construction, and early, late, and hybrid fusion paradigms. We systematically review visual, acoustic, textual, and sensor-based embeddings, highlighting how pre-trained encoders, self-supervised learning, and large language models have reshaped the representational foundations of MER. We further categorize fusion strategies by interaction depth and architectural design, examining how attention mechanisms, cross-modal transformers, adaptive gating, and multimodal large language models redefine the integration of affective signals. Finally, we summarize major benchmark datasets and evaluation metrics and discuss emerging challenges related to scalability, generalization, and interpretability. This survey aims to provide a unified perspective on multimodal fusion for emotion recognition and to guide future research toward more coherent and generalizable multimodal affective intelligence.

Keywords

Multimodal emotion recognition; multimodal learning; cross-modal learning; fusion strategies; representation learning

Cite This Article

APA Style
Moon, A.-S., Kim, H., Park, Y.-C., & Lee, J. (2026). A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions. Computers, Materials & Continua, 87(2), 1. https://doi.org/10.32604/cmc.2026.076411
Vancouver Style
Moon A, Kim H, Park Y, Lee J. A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions. Comput Mater Contin. 2026;87(2):1. https://doi.org/10.32604/cmc.2026.076411
IEEE Style
A.-S. Moon, H. Kim, Y.-C. Park, and J. Lee, “A Survey on Multimodal Emotion Recognition: Methods, Datasets, and Future Directions,” Comput. Mater. Contin., vol. 87, no. 2, p. 1, 2026. https://doi.org/10.32604/cmc.2026.076411



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.