Special Issues

Multimodal Learning in Image Processing

Submission Deadline: 31 July 2024

Guest Editors

Prof. Shuai Liu, Hunan Normal University, China
Prof. Gautam Srivastava, Brandon University, Canada

Summary

Multimodal image segmentation and recognition is a significant and challenging research field. With the rapid development of information technology, multimodal target information is captured by different kinds of sensors, including optical, infrared, and radar devices. Consequently, how to effectively fuse and utilize these multimodal data, with their differing features and information content, has become a key issue.


Multimodal learning, as a powerful framework for data learning and fusion, is able to learn fused features for complex data processing. In multimodal image processing, deep learning methods extract distinct features from multiple sensors, and information fusion methods then combine these features according to their contribution to target recognition. This addresses major challenges of classical methods; however, many issues still await solutions, such as fusion strategies for multimodal data, recognition distortion caused by data imbalance, and one/few-shot models driven by small samples.
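
To make the extract-then-fuse pipeline above concrete, here is a minimal sketch of attention-weighted fusion of two image modalities in PyTorch. The encoders, feature dimension, modality choice (optical plus infrared), and class count are all illustrative assumptions, not a method prescribed by this special issue.

```python
# Minimal sketch: per-modality encoders plus attention-weighted fusion.
# All module names, sizes, and the optical/infrared pairing are assumptions.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Small CNN that maps one image modality to a feature vector."""

    def __init__(self, in_channels: int, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (B, 32, 1, 1)
            nn.Flatten(),
            nn.Linear(32, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class AttentionFusion(nn.Module):
    """Fuses modality features weighted by learned per-modality scores."""

    def __init__(self, feat_dim: int = 128, num_classes: int = 10):
        super().__init__()
        self.optical_enc = ModalityEncoder(3, feat_dim)   # RGB input
        self.infrared_enc = ModalityEncoder(1, feat_dim)  # single-channel IR
        self.score = nn.Linear(feat_dim, 1)               # per-modality score
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, optical: torch.Tensor, infrared: torch.Tensor):
        feats = torch.stack(
            [self.optical_enc(optical), self.infrared_enc(infrared)], dim=1
        )                                                  # (B, 2, feat_dim)
        weights = torch.softmax(self.score(feats), dim=1)  # (B, 2, 1)
        fused = (weights * feats).sum(dim=1)               # (B, feat_dim)
        return self.classifier(fused), weights


# Usage: fuse a batch of paired optical/infrared images.
model = AttentionFusion()
logits, w = model(torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
print(logits.shape, w.squeeze(-1))  # (4, 10) logits, per-modality weights
```

The softmax over per-modality scores lets the model learn, per sample, how much each sensor should contribute to recognition, which is one simple instance of the contribution-aware fusion described above.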


Accordingly, this special issue focuses on the methods and applications of multimodal learning in image processing, aiming to explore innovative methods and technologies that solve existing problems. Respected experts, scholars, and researchers are invited to share their latest research achievements and practical experience in this field, which can promote the development of multimodal image recognition, improve classification and recognition accuracy, and provide reliable solutions for practical applications.


We sincerely invite researchers from academia and industry to submit original research papers, review articles, and technical reports, to jointly explore the methods and applications of multimodal learning in image processing, solve existing problems, and promote further development in this field.


Keywords

Image processing; multimodal fusion; deep learning; classification; recognition

Published Papers


  • Open Access

    ARTICLE

    Attention-Enhanced Voice Portrait Model Using Generative Adversarial Network

    Jingyi Mao, Yuchen Zhou, Yifan Wang, Junyu Li, Ziqing Liu, Fanliang Bu
    CMC-Computers, Materials & Continua, Vol.79, No.1, pp. 837-855, 2024, DOI:10.32604/cmc.2024.048703
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract Voice portrait technology has explored and established the relationship between speakers’ voices and their facial features, aiming to generate corresponding facial characteristics by providing the voice of an unknown speaker. Due to their powerful advantages in image generation, Generative Adversarial Networks (GANs) have now been widely applied across various fields. The existing Voice2Face methods for voice portraits are primarily based on GANs trained on voice-face paired datasets. However, voice portrait models solely constructed on GANs face limitations in image generation quality and struggle to maintain facial similarity. Additionally, the training process is relatively unstable, thereby affecting the overall generative performance…

  • Open Access

    ARTICLE

    BCCLR: A Skeleton-Based Action Recognition with Graph Convolutional Network Combining Behavior Dependence and Context Clues

    Yunhe Wang, Yuxin Xia, Shuai Liu
    CMC-Computers, Materials & Continua, Vol.78, No.3, pp. 4489-4507, 2024, DOI:10.32604/cmc.2024.048813
    (This article belongs to the Special Issue: Multimodal Learning in Image Processing)
    Abstract In recent years, skeleton-based action recognition has made great achievements in Computer Vision. A graph convolutional network (GCN) is effective for action recognition, modelling the human skeleton as a spatio-temporal graph. Most GCNs define the graph topology by the physical relations of the human joints. However, this predefined graph ignores the spatial relationships between non-adjacent joint pairs in special actions and the behavior dependence between joint pairs, resulting in a low recognition rate for specific actions with implicit correlation between joint pairs. In addition, existing methods ignore the trend correlation between adjacent frames within an action, as well as context clues, leading to…
