Open Access

ARTICLE

Hierarchical Contrastive Representation Learning Guided by Multimodal Feature Decomposition for Multimodal Sentiment Analysis

Hongbin Wang1,2, Liusong Li1,2, Di Jiang1,2,*

1 Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
2 Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China

* Corresponding Author: Di Jiang. Email: email

Computers, Materials & Continua 2026, 88(2), 54 https://doi.org/10.32604/cmc.2026.079330

Abstract

Multimodal sentiment analysis aims to fuse emotional information across modalities to predict human emotional states. Although existing methods have made significant progress, the heterogeneity between modalities still leads to an imbalanced feature space distribution, which hinders the effective learning and fusion of multimodal representations. In addition, emotion-irrelevant information in the auxiliary modalities is another major source of differences in feature space distributions. To address these issues, we propose a Hierarchical Contrastive Representation Learning framework with Multimodal Feature Decoupling (HCRL-MFD). To reduce emotion-irrelevant information and optimize feature representations, we introduce a Multimodal Feature Decoupling (MFD) mechanism. This mechanism decouples the multimodal feature space into two subspaces, an emotion-relevant core space and an emotion-irrelevant space, thereby separating key information from redundant signals. This helps alleviate modality heterogeneity and yields finer-grained feature representations in multimodal learning. Meanwhile, we design a Hierarchical Contrastive Representation Learning (HCRL) strategy that further explores intra-modal interactions and cross-sample relationships between different emotional states, with the aim of reducing distributional discrepancies across modalities. Finally, we concatenate and fuse the features from each modality for sentiment prediction. We conduct extensive experiments on two publicly available multimodal sentiment analysis datasets, CMU-MOSI and CMU-MOSEI, and the results show that our model achieves significant performance gains over, and is strongly competitive with, existing methods.
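
To make the pipeline in the abstract concrete, the following is a minimal, illustrative sketch and not the authors' implementation. The abstract does not specify the projections or the loss functions, so everything here is an assumption: the linear projections in `decouple`, the squared-cosine `orthogonality_penalty`, and the supervised-contrastive-style loss are stand-ins chosen for illustration, and all function names are hypothetical. Only one contrastive level is shown; the hierarchical strategy would presumably apply such a loss at more than one level.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def project(x, weight):
    # weight is a list of rows; each row produces one output dimension.
    return [dot(row, x) for row in weight]


def decouple(x, w_relevant, w_irrelevant):
    # Assumption: two linear maps split one modality's features into an
    # emotion-relevant part and an emotion-irrelevant part.
    return project(x, w_relevant), project(x, w_irrelevant)


def orthogonality_penalty(relevant, irrelevant):
    # Assumption: squared cosine similarity, which is 0 when the two parts
    # are orthogonal. Both vectors must have the same length.
    return dot(normalize(relevant), normalize(irrelevant)) ** 2


def supervised_contrastive_loss(features, labels, temperature=0.1):
    # Assumption: samples sharing a sentiment label are positives and all
    # other samples are negatives (a supervised-contrastive-style loss).
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: dot(feats[i], feats[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over every other sample.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def fuse(*modalities):
    # Concatenate the per-modality feature vectors into one vector.
    return [x for modality in modalities for x in modality]
```

In a sketch like this, the contrastive loss is small when samples with the same label already lie close together and large when they do not, which is the behavior a training objective of this kind relies on.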

Keywords

Multimodal sentiment analysis; feature decomposition; contrastive representation learning

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.