A-Seong Moon1, Seungyeon Jeong1, Donghee Kim1, Mohd Asyraf Zulkifley2, Bong-Soo Sohn3,*, Jaesung Lee1,*
CMC-Computers, Materials & Continua, Vol.85, No.2, pp. 2851-2872, 2025, DOI:10.32604/cmc.2025.067103
Published: 23 September 2025
Abstract: Emotion recognition under uncontrolled and noisy environments presents persistent challenges in the design of emotionally responsive systems. The current study introduces an audio-visual recognition framework designed to address performance degradation caused by environmental interference, such as background noise, overlapping speech, and visual obstructions. The proposed framework employs a structured fusion approach, combining early-stage feature-level integration with decision-level coordination guided by temporal attention mechanisms. Audio data are transformed into mel-spectrogram representations, and visual data are represented as raw frame sequences. Spatial and temporal features are extracted through convolutional and transformer-based encoders, allowing the framework to capture…