Hierarchical Contrastive Representation Learning Guided by Multimodal Feature Decomposition for Multimodal Sentiment Analysis

Hongbin Wang^1,2, Liusong Li^1,2, Di Jiang^1,2,*
1 Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
2 Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, China
* Corresponding Author: Di Jiang.

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.079330

Received 20 January 2026; Accepted 22 April 2026; Published online 12 May 2026

Abstract

Multimodal sentiment analysis predicts human emotional states by fusing emotional information from different modalities. Although existing methods have made significant progress, heterogeneity between modalities still produces imbalanced feature-space distributions, which hinders the learning and fusion of multimodal representations. Emotion-irrelevant information in auxiliary modalities further widens these distributional differences. To address these issues, we propose a Hierarchical Contrastive Representation Learning framework with Multimodal Feature Decoupling (HCRL-MFD). Its Multimodal Feature Decoupling (MFD) mechanism reduces emotion-irrelevant information by splitting the multimodal feature space into two subspaces, an emotion-relevant core space and an emotion-irrelevant space, thereby separating key information from redundant signals. This alleviates modality heterogeneity and yields finer-grained feature representations. In addition, a Hierarchical Contrastive Representation Learning (HCRL) strategy explores intra-modal interactions and cross-sample relationships between different emotional states to reduce distributional discrepancies across modalities. Finally, the features from each modality are concatenated and fused for sentiment prediction. Extensive experiments on two publicly available multimodal sentiment analysis datasets, CMU-MOSI and CMU-MOSEI, show that our model achieves significant performance gains and is strongly competitive with existing methods.
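
The three ingredients named in the abstract (decoupling into two subspaces, a cross-sample contrastive objective, and concatenation-based fusion) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the abstract does not give the loss functions or network layers, so the linear projections, the squared-cosine orthogonality penalty, the supervised contrastive loss, and every function and parameter name below are assumptions chosen for illustration. Only the cross-sample level of the hierarchy is sketched.

```python
import math

def linear(x, w):
    """Project vector x with weight matrix w (a list of rows)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def cosine(a, b):
    """Cosine similarity with a small epsilon for numerical safety."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-8)

def decouple(x, w_core, w_irrelevant):
    """Assumed decoupling: two projections of one modality feature,
    one for the emotion-relevant core space, one for the irrelevant space."""
    return linear(x, w_core), linear(x, w_irrelevant)

def decoupling_penalty(core, irrelevant):
    """Assumed regularizer: squared cosine similarity, which is zero
    when the two parts are orthogonal."""
    return cosine(core, irrelevant) ** 2

def supervised_contrastive_loss(feats, labels, temperature=0.5):
    """Assumed cross-sample objective: samples sharing a sentiment label
    are positives, all other samples in the batch are negatives."""
    n = len(feats)
    total, count = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        logits = {j: cosine(feats[i], feats[j]) / temperature for j in others}
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        count += 1
    return total / max(count, 1)

def fuse(text, audio, video):
    """Concatenate the per-modality core features into one vector."""
    return text + audio + video

# Toy usage with hand-picked (hypothetical) weights and features.
core, irrelevant = decouple([1.0, 2.0, 3.0],
                            w_core=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                            w_irrelevant=[[0.0, 0.0, 1.0], [0.0, 0.0, 0.5]])
print(decoupling_penalty(core, irrelevant))

feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(supervised_contrastive_loss(feats, labels=[1, 1, 0, 0]))
print(fuse(core, [0.5], [0.25]))
```

In this sketch the contrastive loss is lower when samples with the same label already lie close together, which is the behavior a cross-sample objective is meant to encourage. A full model would add the intra-modal level of the hierarchy and train the projections jointly with the sentiment predictor.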

Keywords

Multimodal sentiment analysis; feature decomposition; contrastive representation learning
