Open Access
ARTICLE
SYMPHONIA: Enhanced Multimodal Emotion Recognition with Dual-Branch Dynamic Attention and Hierarchical Adaptive Fusion
1 Department of Computer Engineering, Gachon University, Sujeong-Gu, Seongnam-si, Gyeonggi-Do, Republic of Korea
2 Department of Computer Systems, Tashkent University of Information Technologies Named after Muhammad Al-Khwarizmi, Tashkent, Uzbekistan
3 Department of Industrial Management and Digital Technologies, Nordic International University, Tashkent, Uzbekistan
4 Department of Applied Informatics, Kimyo International University in Tashkent, Tashkent, Uzbekistan
5 Department of Information Processing and Management Systems, Tashkent State Technical University, Tashkent, Uzbekistan
6 Department of General Education Disciplines and Distance Education, Nukus State Pedagogical Institute Named after Ajiniyaz, Nukus, Uzbekistan
* Corresponding Author: Young-Im Cho. Email:
(This article belongs to the Special Issue: Deep Learning for Emotion Recognition)
Computers, Materials & Continua 2026, 88(1), 74 https://doi.org/10.32604/cmc.2026.077057
Received 01 December 2025; Accepted 03 March 2026; Issue published 08 May 2026
Abstract
Human emotions are intricate and difficult to decipher across modalities. Current methodologies frequently employ inflexible fusion strategies that ignore the dynamic, context-sensitive character of emotional expression in visual and textual media. This paper presents SYMPHONIA (Synchronizing Facial and Textual Modalities for Emotion Understanding), an architecture engineered to capture and integrate emotional signals from facial expressions and language while remaining attuned to context and cross-modal interactions. SYMPHONIA comprises two branches: a Facial Emotion Branch built on Vision Transformers and facial landmarks, and a Textual Emotion Branch built on RoBERTa embeddings and graph-based reasoning. A Dual-Branch Dynamic Attention Mechanism and a Hierarchical Adaptive Fusion Module connect the two branches. SYMPHONIA outperformed state-of-the-art models on four datasets: IEMOCAP, MELD, CMU-MOSI, and CMU-MOSEI. On IEMOCAP it achieved 80.9% accuracy and an 80.1% F1-score, surpassing DualGATs (74.8%) and EmoCLIP (75.3%); on MELD it reached 74.2% accuracy and a 73.5% F1-score. For sentiment prediction, it exceeded competing models with Pearson correlations of 0.86 on MOSI and 0.83 on MOSEI. Cross-dataset experiments demonstrated generalization: trained on IEMOCAP and tested on MELD, SYMPHONIA attained 66.9% accuracy, exceeding all baselines. These results indicate that SYMPHONIA recognizes emotions and analyzes sentiment robustly across diverse settings.
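To make the fusion idea concrete, the following is a minimal, dependency-free sketch of adaptive modality fusion in the spirit described above. It is not the authors' implementation: the gating vector, feature dimensions, and scoring rule are all illustrative assumptions. The point is only that fusion weights are computed from the inputs themselves (here via a softmax over dot-product scores) rather than fixed in advance.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def adaptive_fusion(face_feat, text_feat, gate):
    """Fuse two modality vectors with data-dependent weights.

    `gate` is a hypothetical learned scoring vector: each modality's
    fusion weight is the softmax of its dot product with the gate,
    so the blend shifts per input rather than being fixed.
    """
    score_f = sum(g * f for g, f in zip(gate, face_feat))
    score_t = sum(g * t for g, t in zip(gate, text_feat))
    a_f, a_t = softmax([score_f, score_t])
    fused = [a_f * f + a_t * t for f, t in zip(face_feat, text_feat)]
    return fused, (a_f, a_t)

face = [0.2, -0.1, 0.7, 0.0]   # stand-in for ViT/landmark features
text = [0.5, 0.3, -0.2, 0.1]   # stand-in for RoBERTa/graph features
gate = [1.0, 0.5, -0.5, 0.2]   # illustrative gating parameters
fused, (a_f, a_t) = adaptive_fusion(face, text, gate)
print(round(a_f + a_t, 6))  # fusion weights sum to 1
```

In the paper's full model this role is played by the Hierarchical Adaptive Fusion Module operating on learned deep features; the sketch only shows the gating mechanism in isolation.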
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

