Open Access
ARTICLE
EdgeST-Fusion: A Cross-Modal Federated Learning and Graph Transformer Framework for Multimodal Spatiotemporal Data Analytics in Smart City Consumer Electronics
Faculty of Computers and Information Technology, Department of Computer Engineering, University of Tabuk, Tabuk, Saudi Arabia
* Corresponding Author: Mohammed M. Alenazi. Email:
(This article belongs to the Special Issue: Integrating Computing Technology of Cloud-Fog-Edge Environments and its Application)
Computers, Materials & Continua 2026, 87(2), 59 https://doi.org/10.32604/cmc.2026.075966
Received 11 November 2025; Accepted 05 January 2026; Issue published 12 March 2026
Abstract
Multimodal spatiotemporal data from smart city consumer electronics present critical challenges, including cross-modal temporal misalignment, unreliable data quality, limited joint modeling of spatial and temporal dependencies, and weak resilience to adversarial updates. To address these limitations, EdgeST-Fusion is introduced as a cross-modal federated graph transformer framework for context-aware smart city analytics. The architecture integrates cross-modal embedding networks for modality alignment, graph transformer encoders for spatial dependency modeling, temporal self-attention for dynamic pattern learning, and adaptive anomaly detection to ensure data quality and security during aggregation. A privacy-preserving federated learning protocol with differential privacy guarantees enables collaborative model training without centralizing sensitive data. The framework employs data-quality-aware weighted aggregation to enhance robustness against noisy and malicious client updates. Experimental evaluation on the GeoLife, PeMS-Bay, and SmartHome+ datasets demonstrates that EdgeST-Fusion achieves a 21.8% improvement in prediction accuracy, a 35.7% reduction in communication overhead, and a 29.4% improvement in security resilience compared to recent baselines. Real-world deployment across three smart city testbeds validates practical viability, with 90.0% average accuracy and sub-250 ms inference latency. The proposed framework remains feasible for deployment on heterogeneous and resource-constrained consumer electronics devices while maintaining strong privacy guarantees and scalability for large-scale urban environments.
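The abstract does not specify how data-quality-aware weighted aggregation is carried out. As a rough illustration only, the sketch below shows one common way such a step could look: each client update is clipped to a maximum L2 norm, the clipped updates are averaged with weights proportional to a per-client quality score, and optional Gaussian noise is added to the result. The function names, the `quality_scores` input, and the `max_norm` and `noise_std` parameters are all assumptions made for this example and are not taken from the article; calibrating `noise_std` to a concrete differential privacy budget is outside the scope of the sketch.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm == 0.0 or norm <= max_norm:
        return list(update)
    scale = max_norm / norm
    return [x * scale for x in update]


def aggregate(updates, quality_scores, max_norm=1.0, noise_std=0.0, rng=None):
    """Quality-weighted mean of clipped client updates (hypothetical sketch).

    updates:        list of equal-length lists of floats, one per client
    quality_scores: non-negative floats, one per client (assumed to be given)
    noise_std:      std of Gaussian noise added per coordinate (placeholder,
                    not calibrated to any privacy budget)
    """
    if not updates or len(updates) != len(quality_scores):
        raise ValueError("need one quality score per client update")
    total = sum(quality_scores)
    if total <= 0:
        raise ValueError("quality scores must sum to a positive value")

    # Normalize quality scores into aggregation weights that sum to 1.
    weights = [q / total for q in quality_scores]
    clipped = [clip_update(u, max_norm) for u in updates]

    dim = len(clipped[0])
    mean = [sum(w * u[i] for w, u in zip(weights, clipped)) for i in range(dim)]

    if noise_std > 0.0:
        rng = rng or random.Random()
        mean = [m + rng.gauss(0.0, noise_std) for m in mean]
    return mean


if __name__ == "__main__":
    client_updates = [[3.0, 4.0], [0.0, 1.0]]
    scores = [1.0, 3.0]  # the second client is treated as more reliable
    print(aggregate(client_updates, scores, max_norm=1.0))
```

In this toy run the first update is clipped from norm 5 to norm 1 and receives a quarter of the total weight, so a low-quality or oversized update has limited influence on the aggregate.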
Copyright © 2026 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

