
Open Access

ARTICLE

Quantum-Inspired Complex-Valued Fusion Framework: Optimizing Intra-Modal Semantics and Inter-Modal Fusion in Multimodal Sarcasm Detection

Dong Zhang1, Lianhe Shao2,*, Weijie Xu3, Xihan Wang1,*, Quanli Gao2
1 School of Computer Science, Xi’an Polytechnic University, Xi’an, China
2 School of Cybersecurity, Xi’an Polytechnic University, Xi’an, China
3 State Grid (Xi’an) Environmental Protection Technology Center Co., Ltd., Xi’an, China
* Corresponding Author: Lianhe Shao. Email: email; Xihan Wang. Email: email
(This article belongs to the Special Issue: Deep Learning for Emotion Recognition)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.078074

Received 23 December 2025; Accepted 26 March 2026; Published online 17 April 2026

Abstract

With the spread of multimodal content on social media, accurately identifying sarcastic intent is important for understanding public attitudes and tracking public-opinion trends. Sarcastic expressions, however, are context-dependent, exhibit inconsistencies across modalities, and carry implicitly contradictory semantics, posing challenges for traditional text-only methods. Existing multimodal methods assume symmetric modal interactions by default and struggle to capture the subtlety of sarcasm and inter-modal contradictions, so their detection performance is limited. This paper therefore proposes a quantum-inspired complex-valued fusion framework that optimizes intra-modal semantics and inter-modal fusion for multimodal sarcasm detection. First, the framework constructs a quantum-inspired complex-valued multimodal feature representation: the text, visual, and audio modalities are embedded into a complex-valued Hilbert space, where the amplitude and phase dimensions model feature intensity and directional information, respectively, providing highly expressive base features for fusion. Second, an asymmetric quantum interference fusion mechanism is designed. Building on the principle of quantum interference, it introduces a directional interference term and trainable parameters to capture the asymmetric interaction between modalities, in which text dominates semantic interpretation and vision supplies supporting detail, effectively mining the modal contradictions on which sarcasm depends. Experimental results show that the proposed model's F1-score improves by 3.71% and 2.74% over M2Seq2Seq and SRLM, respectively, on the MUStARD dataset, and by 0.28% and 0.83% over the same baselines on the Memotion dataset. Ablation experiments further verify the effectiveness of the model's key modules.
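The amplitude–phase representation and interference-based fusion described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the encoders, dimensions, and the trainable asymmetry weight `alpha` are all hypothetical, and the fusion shown is the standard quantum-interference expansion |z_a + √α·z_b|² = |z_a|² + α|z_b|² + 2√α·Re(z_a·conj(z_b)), whose cross term plays the role of the directional interference term.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_embed(amplitude, phase):
    """Encode a modality as a complex vector: the amplitude carries
    feature intensity, the phase carries directional information."""
    return amplitude * np.exp(1j * phase)

# Toy per-modality features (dimension 4); real encoders would supply these.
d = 4
z_text  = complex_embed(rng.random(d), rng.uniform(-np.pi, np.pi, d))
z_image = complex_embed(rng.random(d), rng.uniform(-np.pi, np.pi, d))

def interference_fusion(z_a, z_b, alpha):
    """Asymmetric interference-style fusion of two complex embeddings.

    Expands |z_a + sqrt(alpha) * z_b|^2 into the two squared amplitudes
    plus a cross (interference) term 2*sqrt(alpha)*Re(z_a * conj(z_b)).
    The scalar `alpha` (trainable in a real model) weights modality b,
    making the interaction asymmetric: alpha < 1 lets modality a (text)
    dominate while modality b (vision) supplements.
    """
    direct = np.abs(z_a) ** 2 + alpha * np.abs(z_b) ** 2
    interference = 2 * np.sqrt(alpha) * np.real(z_a * np.conj(z_b))
    return direct + interference

alpha = 0.5  # hypothetical asymmetry weight favouring the text modality
fused = interference_fusion(z_text, z_image, alpha)

# Sanity check: the expansion matches |z_text + sqrt(alpha)*z_image|^2.
assert np.allclose(fused, np.abs(z_text + np.sqrt(alpha) * z_image) ** 2)
```

The interference term vanishes when the two modalities are 90° out of phase and is most negative when they are in antiphase, which is one plausible way a phase-aware model can register the modal contradiction that sarcasm relies on.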

Keywords

Sarcasm detection; multimodal analysis; quantum interference