Open Access

ARTICLE


Generative Multi-Modal Mutual Enhancement Video Semantic Communications

Yuanle Chen1, Haobo Wang1, Chunyu Liu1, Linyi Wang2, Jiaxin Liu1, Wei Wu1,*

1 The College of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
2 The College of Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China

* Corresponding Author: Wei Wu. Email: email

(This article belongs to the Special Issue: Machine Learning Empowered Distributed Computing: Advance in Architecture, Theory and Practice)

Computer Modeling in Engineering & Sciences 2024, 139(3), 2985-3009. https://doi.org/10.32604/cmes.2023.046837

Abstract

Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality video. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector Quantized Generative Adversarial Network (VQGAN), our system leverages mutual enhancement among different modalities by using text as the main carrier of transmission. Semantic information is extracted from the key-frame images and audio of the video, and differential values are computed so that the extracted text conveys accurate semantic information with fewer bits, thereby improving the capacity of the system. Furthermore, a multi-frame semantic detection module is designed to facilitate smooth semantic transitions during video generation. Simulation results demonstrate that the proposed model maintains high robustness in complex noise environments, particularly under low signal-to-noise ratio conditions, improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
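To make the differential-value idea in the abstract concrete, below is a minimal sketch of transmitting only the change between consecutive frame-level captions rather than each full caption. The captions are hard-coded stand-ins for the outputs of the paper's image/audio semantic extractors, and the helper names (`encode_delta`, `decode_delta`) are ours for illustration, not the authors' implementation.

```python
# Hedged sketch: send only the edits between consecutive frame captions,
# so the transmitted text carries the same semantics in fewer bits.
# Captions below are placeholders for the (hypothetical) extractor outputs.
import difflib

def encode_delta(prev: str, curr: str) -> list[tuple[str, int, int, str]]:
    """Encode `curr` as edit operations against `prev`."""
    sm = difflib.SequenceMatcher(a=prev, b=curr)
    # 'equal' spans need no transmission; keep only the changes.
    return [(op, i1, i2, curr[j1:j2])
            for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal"]

def decode_delta(prev: str, ops) -> str:
    """Rebuild the current caption from the previous one plus the edit ops."""
    out, cursor = [], 0
    for op, i1, i2, text in ops:
        out.append(prev[cursor:i1])   # copy the unchanged span
        out.append(text)              # apply the insert/replace text
        cursor = i2                   # skip any deleted/replaced span
    out.append(prev[cursor:])
    return "".join(out)

if __name__ == "__main__":
    captions = [
        "a man walks along a beach at sunset",
        "a man walks along a beach at sunset with a dog",
        "a man throws a ball for a dog on the beach",
    ]
    prev = ""
    for curr in captions:
        ops = encode_delta(prev, curr)
        assert decode_delta(prev, ops) == curr
        sent = sum(len(t) for *_, t in ops)
        print(f"full caption: {len(curr):3d} chars, delta sent: {sent:3d} chars")
        prev = curr
```

In this toy run the second caption costs only the appended phrase rather than the whole sentence, which is the bit-saving effect the abstract attributes to the differential step; the actual system operates on learned semantic representations rather than raw character diffs.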

Keywords


Cite This Article

APA Style
Chen, Y., Wang, H., Liu, C., Wang, L., Liu, J. et al. (2024). Generative multi-modal mutual enhancement video semantic communications. Computer Modeling in Engineering & Sciences, 139(3), 2985-3009. https://doi.org/10.32604/cmes.2023.046837
Vancouver Style
Chen Y, Wang H, Liu C, Wang L, Liu J, Wu W. Generative multi-modal mutual enhancement video semantic communications. Comput Model Eng Sci. 2024;139(3):2985-3009. https://doi.org/10.32604/cmes.2023.046837
IEEE Style
Y. Chen, H. Wang, C. Liu, L. Wang, J. Liu, and W. Wu, "Generative multi-modal mutual enhancement video semantic communications," Comput. Model. Eng. Sci., vol. 139, no. 3, pp. 2985-3009, 2024. https://doi.org/10.32604/cmes.2023.046837



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.