Open Access

ARTICLE


Enhanced Multimodal Sentiment Analysis via Integrated Spatial Position Encoding and Fusion Embedding

Chenquan Gan1,2,*, Xu Liu1, Yu Tang2, Xianrong Yu3, Qingyi Zhu1, Deepak Kumar Jain4

1 School of Cyber Security and Information Law, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
2 School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
3 Jiangxi Provincial Key Laboratory of Electronic Data Control and Forensics (Jiangxi Police College), Nanchang, 330100, China
4 Key Laboratory of Intelligent Control and Optimization for Industrial Equipment of Ministry of Education, Dalian University of Technology, Dalian, 116024, China

* Corresponding Author: Chenquan Gan.

Computers, Materials & Continua 2025, 85(3), 5399-5421. https://doi.org/10.32604/cmc.2025.068126

Abstract

Multimodal sentiment analysis aims to understand emotions from text, speech, and video data. However, current methods often overlook the dominant role of text and suffer from feature loss during integration. Because the importance of each modality varies across contexts, a central challenge in multimodal sentiment analysis is to exploit rich intra-modal features fully while minimizing information loss during fusion. To address these limitations, we propose a novel framework that integrates spatial position encoding and fusion embedding modules. In our model, text is treated as the core modality, while speech and video features are selectively incorporated through a position-aware fusion process. The spatial position encoding strategy preserves the internal structural information of the speech and visual modalities, enabling the model to capture localized intra-modal dependencies that are often overlooked. This design enriches the fused representation and increases its discriminative power, supporting more accurate and context-aware sentiment prediction. Finally, we conduct comprehensive evaluations on two widely used benchmark datasets, CMU-MOSI and CMU-MOSEI, to validate the proposed model. The experimental results demonstrate its effectiveness for sentiment analysis tasks.
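To make the described design concrete, the sketch below illustrates one plausible reading of the abstract in PyTorch: learnable position embeddings preserve the internal ordering of the speech and video sequences, and text queries attend to the position-encoded auxiliary modalities before a sentiment head is applied. All module names, dimensions, and the choice of cross-attention as the fusion mechanism are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch (NOT the authors' code): text-centred fusion with
# spatial position encoding, using illustrative names and dimensions.
import torch
import torch.nn as nn


class SpatialPositionEncoding(nn.Module):
    """Adds learnable position embeddings so speech/video sequences
    keep their internal (intra-modal) ordering during fusion."""

    def __init__(self, max_len: int, dim: int):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        return x + self.pos[:, : x.size(1), :]


class FusionEmbedding(nn.Module):
    """Treats text as the core modality and selectively incorporates
    speech/video features; cross-attention is an assumed mechanism."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn_a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text, audio, video):
        # Text queries attend to the position-encoded audio/video features.
        a, _ = self.attn_a(text, audio, audio)
        v, _ = self.attn_v(text, video, video)
        # Residual sum keeps the text representation central.
        return self.norm(text + a + v)


# Usage with illustrative shapes: 50-step sequences, 128-d features.
pe = SpatialPositionEncoding(max_len=50, dim=128)
fuse = FusionEmbedding(dim=128)
text = torch.randn(2, 50, 128)
audio = pe(torch.randn(2, 50, 128))   # position-encoded speech features
video = pe(torch.randn(2, 50, 128))   # position-encoded visual features
fused = fuse(text, audio, video)      # (2, 50, 128) fused representation
score = nn.Linear(128, 1)(fused.mean(dim=1))  # sentiment regression head
```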

Keywords

Multimodal sentiment analysis; spatial position encoding; fusion embedding; feature loss reduction

Cite This Article

APA Style
Gan, C., Liu, X., Tang, Y., Yu, X., Zhu, Q. et al. (2025). Enhanced Multimodal Sentiment Analysis via Integrated Spatial Position Encoding and Fusion Embedding. Computers, Materials & Continua, 85(3), 5399–5421. https://doi.org/10.32604/cmc.2025.068126
Vancouver Style
Gan C, Liu X, Tang Y, Yu X, Zhu Q, Jain DK. Enhanced Multimodal Sentiment Analysis via Integrated Spatial Position Encoding and Fusion Embedding. Comput Mater Contin. 2025;85(3):5399–5421. https://doi.org/10.32604/cmc.2025.068126
IEEE Style
C. Gan, X. Liu, Y. Tang, X. Yu, Q. Zhu, and D. K. Jain, “Enhanced Multimodal Sentiment Analysis via Integrated Spatial Position Encoding and Fusion Embedding,” Comput. Mater. Contin., vol. 85, no. 3, pp. 5399–5421, 2025. https://doi.org/10.32604/cmc.2025.068126



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.