
Open Access

ARTICLE

Industrial EdgeSign: NAS-Optimized Real-Time Hand Gesture Recognition for Operator Communication in Smart Factories

Meixi Chu1, Xinyu Jiang1,*, Yushu Tao2
1 School of Engineering, The University of Sydney, Sydney, 2006, Australia
2 School of Information Science and Engineering, Northeastern University, Shenyang, 110819, China
* Corresponding Author: Xinyu Jiang. Email: email
(This article belongs to the Special Issue: Intelligent Computation and Large Machine Learning Models for Edge Intelligence in industrial Internet of Things)

Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.071533

Received 06 August 2025; Accepted 13 October 2025; Published online 13 November 2025

Abstract

Industrial operators need reliable communication in high-noise, safety-critical environments where speech or touch input is often impractical. Existing gesture systems either miss real-time deadlines on resource-constrained hardware or lose accuracy under occlusion, vibration, and lighting changes. We introduce Industrial EdgeSign, a dual-path framework that combines hardware-aware neural architecture search (NAS) with semantics guided by a large multimodal model (LMM) to deliver robust, low-latency gesture recognition on edge devices. The searched model uses a truncated ResNet50 front end, a dimensionality-reduction network that preserves spatiotemporal structure for tubelet-based attention, and localized Transformer layers tuned for on-device inference. To reduce reliance on gloss annotations and mitigate domain shift, we distill semantics from factory-tuned vision-language models and pre-train with masked language modeling and video-text contrastive objectives, aligning visual features with a shared text space. On ML2HP and SHREC’17, the NAS-derived architecture attains 94.7% accuracy with 86 ms inference latency and a power draw of about 5.9 W on a Jetson Nano. Under occlusion, lighting shifts, and motion blur, accuracy remains above 82%. For safety-critical commands, the emergency-stop gesture achieves a 99th-percentile latency of 72 ms with 99.7% fail-safe triggering. Ablation studies confirm the contributions of the spatiotemporal tubelet extractor and of text-side pre-training, and we also observe gains in translation quality (BLEU-4 of 22.33). These results show that Industrial EdgeSign provides accurate, resource-aware, and safety-aligned gesture recognition suitable for deployment in smart factory settings.
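The abstract names a video-text contrastive pre-training objective but does not give its exact form. As an illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of paired, pooled video and text embeddings, with a temperature of 0.07. The function names (`video_text_contrastive_loss`, `l2_normalize`, `log_softmax_at`) and the temperature value are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (a zero vector is left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax_at(logits: Vector, target: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at index `target`."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[target] - log_sum_exp


def video_text_contrastive_loss(
    video: List[Vector], text: List[Vector], temperature: float = 0.07
) -> float:
    """Symmetric InfoNCE: video_i should match text_i and no other text in the batch.

    Assumption: each clip and each sentence is already pooled into one embedding
    of the same dimension; this is an illustrative sketch, not the authors' code.
    """
    if len(video) != len(text) or not video:
        raise ValueError("need a non-empty batch of paired embeddings")
    v = [l2_normalize(x) for x in video]
    t = [l2_normalize(x) for x in text]
    n = len(v)
    # sim[i][j] = cosine(video_i, text_j) / temperature
    sim = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Video-to-text direction: row i should peak at column i.
    loss_v2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-video direction: column j should peak at row j.
    loss_t2v = -sum(
        log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_v2t + loss_t2v)


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.0, 1.0]]
    aligned = video_text_contrastive_loss(clips, [[1.0, 0.0], [0.0, 1.0]])
    swapped = video_text_contrastive_loss(clips, [[0.0, 1.0], [1.0, 0.0]])
    print(f"aligned pairs: {aligned:.4f}, swapped pairs: {swapped:.4f}")
```

Minimizing this loss pulls each clip toward its own description and pushes it away from the other descriptions in the batch. This is one common way to place visual features in a shared text space. The paper may use a different formulation, for example a different temperature, pooling, or negative sampling scheme.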

Keywords

Hand gesture recognition; spatio-temporal feature extraction; transformer; industrial Internet; edge intelligence