Open Access
ARTICLE
Multimodal Trajectory Generation for Robotic Motion Planning Using Transformer-Based Fusion and Adversarial Learning
1 College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, AlKharj, 16273, Saudi Arabia
2 Department of Computer Engineering and Networks, College of Computer and Information Sciences, Jouf University, Sakaka, 72388, Saudi Arabia
3 Faculty of Computers & Information Technology, Computer Science Department, University of Tabuk, Tabuk, 71491, Saudi Arabia
4 Faculty of Computing and Information, Al-Baha University, Alaqiq 65779-7738, Saudi Arabia
5 REGIM-Lab: Research Groups in Intelligent Machines, National School of Engineers of Sfax (ENIS), University of Sfax, Sfax, 3038, Tunisia
6 Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, 21955, Saudi Arabia
7 Department of Information Management and Business Systems, Faculty of Management, Comenius University Bratislava, Odbojárov 10, Bratislava, 82005, Slovakia
* Corresponding Authors: Najib Ben Aoun. Email: ; Vincent Karovič. Email:
(This article belongs to the Special Issue: Applied Artificial Intelligence: Advanced Solutions for Engineering Real-World Challenges)
Computer Modeling in Engineering & Sciences 2026, 146(2), 29 https://doi.org/10.32604/cmes.2026.074687
Received 16 October 2025; Accepted 05 January 2026; Issue published 26 February 2026
Abstract
In Human–Robot Interaction (HRI), generating robot trajectories that accurately reflect user intentions while remaining physically realistic is challenging, especially in unstructured environments. In this study, we develop a multimodal framework that integrates symbolic task reasoning with continuous trajectory generation, using transformer models and adversarial training to map high-level intent to robotic motion. Signals from multiple sources, including voice characteristics, hand and body keypoints, visual observations, and recorded paths, are fused into a shared representation that supports interpretable reasoning while enabling smooth, realistic motion generation. Based on this design, two learning strategies are investigated. In the first, grammar-constrained Linear Temporal Logic (LTL) expressions are generated from multimodal human inputs and subsequently decoded into robot trajectories. In the second, trajectories are generated directly from symbolic intent and linguistic features, bypassing the intermediate logical representation. Transformer encoders fuse the multimodal inputs, and autoregressive transformer decoders generate the motion sequences. Smoothness and velocity constraints imposed during training promote physical feasibility, and an adversarial discriminator guides the generated trajectories toward the distribution of real robot motion, improving their realism and stability. Experiments on the NATSGLD dataset show that the complete system trains stably and performs well. In normalised coordinates, the logic-based pipeline achieves an Average Displacement Error (ADE) of 0.040 and a Final Displacement Error (FDE) of 0.036, while the adversarial generator substantially improves on this, reducing ADE to 0.021 and FDE to 0.018.
Visual examination confirms that the generated trajectories closely align with observed motion patterns while preserving smooth temporal dynamics.
Keywords
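For readers unfamiliar with the metrics quoted above: ADE averages the per-timestep Euclidean distance between a generated and a reference trajectory, while FDE measures the distance at the final timestep only. A minimal sketch of the standard computation (illustrative only, not the authors' evaluation code):

```python
import numpy as np

def ade_fde(pred, gt):
    """Average and Final Displacement Error between two (T, D) trajectories.

    ADE: mean Euclidean distance over all T timesteps.
    FDE: Euclidean distance at the final timestep.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    dists = np.linalg.norm(pred - gt, axis=-1)  # per-timestep L2 distance
    return dists.mean(), dists[-1]

# Toy 2-D trajectories in normalised coordinates (hypothetical data)
pred = [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2]]
gt   = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.3]]
ade, fde = ade_fde(pred, gt)
```

Lower values of both metrics indicate closer agreement with the reference motion, which is why the drop from 0.040/0.036 to 0.021/0.018 represents a substantial improvement.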
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.