
Recognition Tasks with Transformers

Submission Deadline: 01 July 2024

Guest Editors

Prof. Huimin Lu, Kyushu Institute of Technology, Japan.
Prof. Jihua Zhu, Xi’an Jiaotong University, China.
Dr. Xing Xu, University of Electronic Science and Technology of China, China.
Dr. Yuchao Zheng, Kyushu Institute of Technology, Japan.


Pattern recognition (PR) is undergoing a major shift driven by rapid advances in transformer-based methods, which can reveal new insights from complex data and enable more efficient, cost-effective solutions in support of human endeavors. Moreover, the inherently simple architecture of transformers allows diverse modalities (such as point clouds, images, videos, text, and speech) to be processed with similar building blocks, which supports the creation of robust and adaptable pattern recognition solutions. These architectures are increasingly applied in fields such as natural language processing, system identification, speech recognition, and image recognition. The rising demand for sophisticated recognition solutions across multiple industries motivates researchers to explore innovative techniques that leverage the potential of transformers.


However, recognition tasks with transformers present various challenges, including extracting robust features, improving model interpretability, ensuring recognition robustness, handling dynamic environments, accommodating variations in scale and viewpoint, and devising real-time, scalable solutions. Addressing these challenges requires innovative approaches and a deep understanding of the underlying principles in order to advance the state of the art in pattern recognition research.


The objective of this special issue is to provide a forum for researchers to share their most recent discoveries and advances in recognition tasks using transformer-based methods. Overcoming the challenges above calls for new methods and theoretical insights that push the boundaries of transformer-based recognition. We invite original research papers that tackle fundamental problems within this domain. We also welcome survey papers that assess the current state of progress and the open challenges in recognition tasks employing transformer architectures. Researchers from computer science, engineering, mathematics, physics, and other related disciplines are encouraged to submit their findings. This interdisciplinary approach is intended to foster the exchange of ideas and stimulate the evolution of recognition technologies, driving progress in pattern recognition and its practical applications. Topics of particular interest for this special issue include, but are not limited to:

• Transformer-based real-time recognition applications in areas like 2D/3D object detection, video analytics, and action identification.

• Transformer-centric approaches for complex vision problems, such as recognition, panoptic segmentation, multi-object tracking, and pose estimation.

• Transformer-centric approaches for fundamental vision challenges, such as image super-resolution, de-blurring, de-raining, de-noising and colorization.

• Advanced transformer architectures designed specifically for the handling of spatial (image) and temporal (video) data.

• Transformer models tailored for the processing of 3D data types, such as volumetric, mesh, and point-cloud data.

• Fine-tuning methodologies for transformer models to improve performance in recognition tasks.

• Optimization of multi-modal recognition through the innovative use of transformers.

• Unsupervised, weakly supervised, and semi-supervised learning with transformer models.

• Construction of interpretability methods based on the transformer attention mechanism.

• Multi-modal transformer models designed to foster better understanding of image-text interactions.


Keywords

Pattern Recognition, Artificial Intelligence, Deep Learning, Transfer Learning, Feature Engineering, Representation Learning, Knowledge Learning, Interpretable Machine Learning, Multimodal Machine Learning, Visual Transformer, Optimization Policy.
