Open Access

ARTICLE


Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories

Meixi Chu1, Xinyu Jiang1,*, Yushu Tao2

1 School of Engineering, The University of Sydney, Sydney, 2006, Australia
2 School of Information Science and Engineering, Northeastern University, Shenyang, 110819, China

* Corresponding Author: Xinyu Jiang. Email: email

(This article belongs to the Special Issue: Intelligent Computation and Large Machine Learning Models for Edge Intelligence in Industrial Internet of Things)

Computers, Materials & Continua 2026, 86(2), 1-23. https://doi.org/10.32604/cmc.2025.071533

Abstract

Industrial operators need reliable communication in high-noise, safety-critical environments where speech or touch input is often impractical. Existing gesture systems either miss real-time deadlines on resource-constrained hardware or lose accuracy under occlusion, vibration, and lighting changes. We introduce Industrial EdgeSign, a dual-path framework that combines hardware-aware neural architecture search (NAS) with large multimodal model (LMM) guided semantics to deliver robust, low-latency gesture recognition on edge devices. The searched model uses a truncated ResNet50 front end, a dimensional-reduction network that preserves spatiotemporal structure for tubelet-based attention, and localized Transformer layers tuned for on-device inference. To reduce reliance on gloss annotations and mitigate domain shift, we distill semantics from factory-tuned vision-language models and pre-train with masked language modeling and video-text contrastive objectives, aligning visual features with a shared text space. On ML2HP and SHREC’17, the NAS-derived architecture attains 94.7% accuracy with 86 ms inference latency and about 5.9 W power on Jetson Nano. Under occlusion, lighting shifts, and motion blur, accuracy remains above 82%. For safety-critical commands, the emergency-stop gesture achieves 72 ms 99th percentile latency with 99.7% fail-safe triggering. Ablation studies confirm the contribution of the spatiotemporal tubelet extractor and text-side pre-training, and we observe gains in translation quality (BLEU-4 22.33). These results show that Industrial EdgeSign provides accurate, resource-aware, and safety-aligned gesture recognition suitable for deployment in smart factory settings.
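To make the described pipeline concrete, the following is a minimal PyTorch sketch of the recognition path only: a truncated ResNet50 front end, a dimensionality-reduction projection that preserves the per-frame spatial grid, tubelet tokens, and a small Transformer head. The truncation point, layer widths, tubelet length, and class count are illustrative assumptions, not the NAS-searched values reported in the paper, and the LMM-guided text-side pre-training is not shown.

# Minimal sketch (PyTorch), with illustrative sizes rather than the searched architecture.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class GestureRecognizer(nn.Module):
    def __init__(self, num_classes=27, d_model=256, tubelet_t=4):
        super().__init__()
        backbone = resnet50(weights=None)
        # "Truncated" ResNet50: keep the stem and first three stages (1024-channel output).
        self.frontend = nn.Sequential(*list(backbone.children())[:-3])
        # Dimensionality reduction that keeps the spatial grid of each frame.
        self.reduce = nn.Conv2d(1024, d_model, kernel_size=1)
        self.tubelet_t = tubelet_t  # number of frames grouped into one tubelet token
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=512, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, clip):                                   # clip: (B, T, 3, H, W)
        b, t, _, _, _ = clip.shape
        feats = self.reduce(self.frontend(clip.flatten(0, 1))) # (B*T, D, h', w')
        d, hp, wp = feats.shape[1:]
        feats = feats.view(b, t, d, hp, wp)
        # Tubelet tokens: average each group of tubelet_t frames per spatial cell.
        feats = feats.view(b, t // self.tubelet_t, self.tubelet_t, d, hp, wp).mean(2)
        tokens = feats.permute(0, 1, 3, 4, 2).reshape(b, -1, d) # (B, N_tokens, D)
        pooled = self.encoder(tokens).mean(dim=1)               # pool over tokens
        return self.head(pooled)

# Example: one 16-frame 224x224 clip.
logits = GestureRecognizer()(torch.randn(1, 16, 3, 224, 224))

In this sketch the tubelet grouping is a simple temporal average per spatial cell; the paper's spatiotemporal tubelet extractor and localized attention layers are the NAS-tuned components this placeholder stands in for.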

Keywords

Hand gesture recognition; spatio-temporal feature extraction; transformer; industrial Internet; edge intelligence

Cite This Article

APA Style
Chu, M., Jiang, X., & Tao, Y. (2026). Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories. Computers, Materials & Continua, 86(2), 1–23. https://doi.org/10.32604/cmc.2025.071533
Vancouver Style
Chu M, Jiang X, Tao Y. Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories. Comput Mater Contin. 2026;86(2):1–23. https://doi.org/10.32604/cmc.2025.071533
IEEE Style
M. Chu, X. Jiang, and Y. Tao, “Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories,” Comput. Mater. Contin., vol. 86, no. 2, pp. 1–23, 2026. https://doi.org/10.32604/cmc.2025.071533



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.