
Open Access

ARTICLE

Quantum-Inspired Complex-Valued Fusion Framework: Optimizing Intra-Modal Semantics and Inter-Modal Fusion in Multimodal Sarcasm Detection

Dong Zhang1, Lianhe Shao2,*, Weijie Xu3, Xihan Wang1,*, Quanli Gao2
1 School of Computer Science, Xi’an Polytechnic University, Xi’an, China
2 School of Cybersecurity, Xi’an Polytechnic University, Xi’an, China
3 State Grid (Xi’an) Environmental Protection Technology Center Co., Ltd., Xi’an, China
* Corresponding Author: Lianhe Shao. Email: email; Xihan Wang. Email: email
(This article belongs to the Special Issue: Deep Learning for Emotion Recognition)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.078074

Received 23 December 2025; Accepted 26 March 2026; Published online 17 April 2026

Abstract

With the spread of multimodal content on social media, accurately identifying sarcastic intent is important for understanding public attitudes and tracking public-opinion trends. Sarcastic expressions, however, are context-dependent, exhibit inconsistencies across modalities, and carry implicitly contradictory semantics, posing challenges for traditional text-only methods. Existing multimodal methods assume symmetric modal interactions by default and struggle to capture the subtlety of sarcasm and inter-modal contradictions, so their detection performance is limited. This paper therefore proposes a quantum-inspired complex-valued fusion framework that optimizes intra-modal semantics and inter-modal fusion for multimodal sarcasm detection. First, the framework constructs a quantum-inspired complex-valued multimodal feature representation: the text, visual, and audio modalities are embedded into a complex-valued Hilbert space, where the amplitude and phase dimensions model feature intensity and directional information, respectively, providing highly expressive base features for fusion. Second, an asymmetric quantum interference fusion mechanism is designed. Building on the principle of quantum interference, it introduces a directional interference term and trainable parameters to capture the asymmetric interaction between modalities, in which text dominates semantic interpretation and vision supplies supporting detail, effectively mining the modal contradictions on which sarcasm depends. Experimental results show that the proposed model's F1-score improves by 3.71% and 2.74% over M2Seq2Seq and SRLM, respectively, on the MUStARD dataset, and by 0.28% and 0.83% over the same baselines on the Memotion dataset. Ablation experiments further verify the effectiveness of the model's key modules.
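The amplitude–phase representation and interference-based fusion described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden toy, not the paper's implementation: the encoders, dimensions, and the trainable asymmetry weight `alpha` are all hypothetical, and the fusion shown is the standard quantum-interference expansion |z_a + √α·z_b|² = |z_a|² + α|z_b|² + 2√α·Re(z_a·conj(z_b)), whose cross term plays the role of the directional interference term.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_embed(amplitude, phase):
    """Encode a modality as a complex vector: the amplitude carries
    feature intensity, the phase carries directional information."""
    return amplitude * np.exp(1j * phase)

# Toy per-modality features (dimension 4); real encoders would supply these.
d = 4
z_text  = complex_embed(rng.random(d), rng.uniform(-np.pi, np.pi, d))
z_image = complex_embed(rng.random(d), rng.uniform(-np.pi, np.pi, d))

def interference_fusion(z_a, z_b, alpha):
    """Asymmetric interference-style fusion of two complex embeddings.

    Expands |z_a + sqrt(alpha) * z_b|^2 into the two squared amplitudes
    plus a cross (interference) term 2*sqrt(alpha)*Re(z_a * conj(z_b)).
    The scalar `alpha` (trainable in a real model) weights modality b,
    making the interaction asymmetric: alpha < 1 lets modality a (text)
    dominate while modality b (vision) supplements.
    """
    direct = np.abs(z_a) ** 2 + alpha * np.abs(z_b) ** 2
    interference = 2 * np.sqrt(alpha) * np.real(z_a * np.conj(z_b))
    return direct + interference

alpha = 0.5  # hypothetical asymmetry weight favouring the text modality
fused = interference_fusion(z_text, z_image, alpha)

# Sanity check: the expansion matches |z_text + sqrt(alpha)*z_image|^2.
assert np.allclose(fused, np.abs(z_text + np.sqrt(alpha) * z_image) ** 2)
```

The interference term vanishes when the two modalities are 90° out of phase and is most negative when they are in antiphase, which is one plausible way a phase-aware model can register the modal contradiction that sarcasm relies on.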

Keywords

Sarcasm detection; multimodal analysis; quantum interference