Submission Deadline: 30 June 2026
Prof. Dr. Ahmad Taher Azar
Email: aazar@psu.edu.sa
Affiliation: 1. College of Computer and Information Sciences, Prince Sultan University, Riyadh, 11586, Saudi Arabia;
2. Automated Systems and Computing Lab (ASCL), Prince Sultan University, Riyadh, 11586, Saudi Arabia
Research Interests: artificial intelligence, robotics, control theory & applications, reinforcement learning, computational intelligence

Dr. Weiwei Jiang
Email: jww@bupt.edu.cn
Affiliation: School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing, China
Research Interests: artificial intelligence, machine learning, big data, wireless communication and edge computing

The field of artificial intelligence is undergoing a revolutionary shift with the convergence of large language models (LLMs) and computer vision. This fusion, known as multimodal learning, is pushing the boundaries of how machines perceive, interpret, and interact with the visual world. This special issue, titled "Multimodal Vision with Large Language Models," is dedicated to exploring this transformative synergy. It aims to collate cutting-edge research that moves beyond traditional siloed approaches, instead creating unified models that can jointly reason over visual and textual data to achieve a deeper, more contextual, and semantically grounded understanding.
We seek to investigate the full spectrum of this integration, from novel neural architectures that seamlessly blend visual and linguistic features to innovative training paradigms that leverage the complementary strengths of both modalities. This includes enabling complex capabilities such as generating natural language descriptions from images, answering intricate questions about visual content, following open-ended linguistic instructions to manipulate visual data, and creating realistic imagery from text prompts. This issue will also address the critical challenges inherent in this endeavor, including computational efficiency, the mitigation of hallucinations in model outputs, data scarcity, and the development of robust evaluation metrics.
Topics of Interest:
We invite the submission of high-quality, original research articles and comprehensive reviews on topics including, but not limited to:
· Architectures for Multimodal Fusion: Novel models for integrating visual encoders (e.g., ViT, CNN) with large language models (e.g., GPT, LLaMA).
· Vision-Language Pre-training (VLP): Strategies for large-scale pre-training on aligned image-text data and efficient fine-tuning for downstream tasks.
· Generative Multimodal Models: Advanced text-to-image generation, text-guided image/video editing, and controllable synthesis.
· Interpretable and Explainable Multimodal AI: Techniques to understand and visualize the reasoning processes of complex vision-language models.
· Efficiency and Scalability: Methods for compressing, distilling, and accelerating large-scale multimodal models for practical deployment.
· Reasoning and Knowledge Grounding: Enhancing models with commonsense reasoning, factual knowledge, and spatial understanding for complex visual question answering (VQA) and dialogue.
· Domain-Specific Applications:
  · Healthcare: Automated radiology report generation, medical visual question answering.
  · Autonomous Systems: Enhanced scene understanding and decision-making for robotics and self-driving cars.
  · Accessibility: Advanced tools for image captioning and visual assistance for the visually impaired.
This special issue will serve as a pivotal platform for researchers at the forefront of multimodal AI. It provides an international forum to present groundbreaking work that bridges the computer vision and natural language processing communities. By bringing together diverse expertise, this collection aims to define the state of the art, address fundamental challenges, and chart the future direction of intelligent systems that can truly see and talk about our world. Contributions to this issue will be instrumental in shaping the next wave of AI applications and research.

