TY  - EJOU
AU  - Ullah, Aman Aman
AU  - Wu, Yanfeng
AU  - Najam, Shaheryar
AU  - Almujally, Nouf Abdullah
AU  - Jalal, Ahmad
AU  - Liu, Hui
TI  - Transformer-Driven Multimodal for Human-Object Detection and Recognition for Intelligent Robotic Surveillance
T2  - Computers, Materials & Continua
PY  - 2026
VL  - 87
IS  - 1
SN  - 1546-2226
AB  - Human object detection and recognition is essential for elderly monitoring and assisted living; however, models relying solely on pose or scene context often struggle in cluttered or visually ambiguous settings. To address this, we present SCENET-3D, a transformer-driven multimodal framework that unifies human-centric skeleton features with scene-object semantics for intelligent robotic vision through a three-stage pipeline. In the first stage, scene analysis, rich geometric and texture descriptors are extracted from RGB frames, including surface-normal histograms, angles between neighboring normals, Zernike moments, directional standard deviation, and Gabor-filter responses. In the second stage, scene-object analysis, non-human objects are segmented and represented using local feature descriptors and complementary surface-normal information. In the third stage, human-pose estimation, silhouettes are processed through an enhanced MoveNet to obtain 2D anatomical keypoints, which are fused with depth information and converted into RGB-based point clouds to construct pseudo-3D skeletons. Features from all three stages are fused and fed into a transformer encoder with multi-head attention to resolve visually similar activities. Experiments on UCLA (95.8%), ETRI-Activity3D (89.4%), and CAD-120 (91.2%) demonstrate that combining pseudo-3D skeletons with rich scene-object fusion significantly improves generalizable activity recognition, enabling safer elderly care, natural human–robot interaction, and robust context-aware robotic perception in real-world environments.
KW  - Human object detection; elderly care; RGB-based pose estimation; scene context analysis; object recognition; Gabor features; point cloud reconstruction
DO  - 10.32604/cmc.2025.072508