Open Access
ARTICLE
A Trajectory-Guided Diffusion Model for Consistent and Realistic Video Synthesis in Autonomous Driving
School of Mechatronics Engineering, Harbin Institute of Technology, Weihai, China
* Corresponding Author: Dafang Wang. Email:
(This article belongs to the Special Issue: Recent Advances in Signal Processing and Computer Vision)
Computer Modeling in Engineering & Sciences 2026, 146(1), 35. https://doi.org/10.32604/cmes.2026.076439
Received 20 November 2025; Accepted 06 January 2026; Issue published 29 January 2026
Abstract
Scalable simulation leveraging real-world data plays an essential role in advancing autonomous driving, owing to its efficiency and applicability in both training and evaluating algorithms. Consequently, there has been increasing attention on generating highly realistic and consistent driving videos, particularly those involving viewpoint changes guided by the control commands or trajectories of ego vehicles. However, current reconstruction approaches, such as Neural Radiance Fields and 3D Gaussian Splatting, frequently suffer from limited generalization and depend on substantial input data. Meanwhile, 2D generative models, though capable of producing unseen scenes, still have room for improvement in terms of coherence and visual realism. To overcome these challenges, we introduce GenScene, a world model that synthesizes front-view driving videos conditioned on trajectories. A new temporal module is presented to improve video consistency by extracting the global context of each frame, computing relationships between frames using these global representations, and fusing the frame contexts accordingly. Moreover, we propose an innovative attention mechanism that computes relations between the pixels within each frame and the pixels in the corresponding window of the initial frame. Extensive experiments show that our approach surpasses various state-of-the-art models in driving video generation, and that the introduced modules contribute significantly to model performance. This work establishes a new paradigm for goal-oriented video synthesis in autonomous driving, facilitating on-demand simulation to expedite algorithm development.
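To make the two described mechanisms concrete, the sketch below illustrates, in PyTorch-style pseudocode, one possible reading of (a) a temporal module that pools a global descriptor per frame, relates frames through those descriptors, and fuses the result back into each frame, and (b) an attention operation that lets every pixel attend to the corresponding local window of the initial frame. This is an illustration only, not the authors' implementation; the module names, tensor layouts, and window size are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalContextTemporalFusion(nn.Module):
    """Illustrative temporal module (assumed design): pool a global context
    vector per frame, relate frames via attention over those vectors, and
    fuse the resulting temporal context back into each frame's features."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.proj = nn.Linear(channels, channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        ctx = x.mean(dim=(-2, -1))                  # global context per frame: (b, t, c)
        fused, _ = self.attn(ctx, ctx, ctx)         # frame-to-frame relationships: (b, t, c)
        fused = self.proj(fused).view(b, t, c, 1, 1)
        return x + fused                            # broadcast fused context onto each frame


def first_frame_window_attention(x: torch.Tensor, window: int = 8) -> torch.Tensor:
    """Illustrative attention (assumed form): pixels of every frame attend to
    pixels in the corresponding spatial window of the initial frame."""
    # x: (batch, frames, channels, height, width); window must divide h and w here.
    b, t, c, h, w = x.shape
    ws = window

    def to_windows(feat: torch.Tensor) -> torch.Tensor:
        # (n, c, h, w) -> (n * num_windows, ws * ws, c)
        n = feat.shape[0]
        feat = feat.view(n, c, h // ws, ws, w // ws, ws)
        feat = feat.permute(0, 2, 4, 3, 5, 1)
        return feat.reshape(-1, ws * ws, c)

    q = to_windows(x.reshape(b * t, c, h, w))
    first = x[:, :1].expand(b, t, c, h, w)          # initial frame repeated over time
    kv = to_windows(first.reshape(b * t, c, h, w))
    out = F.scaled_dot_product_attention(q, kv, kv)  # window-wise cross attention
    out = out.reshape(b * t, h // ws, w // ws, ws, ws, c)
    out = out.permute(0, 5, 1, 3, 2, 4).reshape(b, t, c, h, w)
    return out
```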
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

