
Open Access

ARTICLE

TQU-GraspingObject: 3D Common Objects Detection, Recognition, and Localization on Point Cloud for Hand Grasping in Sharing Environments

Thi-Loan Nguyen1,2,*, Huy-Nam Chu3, The-Thanh Hua3, Trung-Nghia Phung2, Van-Hung Le3,*
1 Institute of Information Technology, Hanoi Pedagogical University 2, Phu Tho Province, Vietnam
2 University of Information Technology and Communication, Thai Nguyen University, Thai Nguyen Province, Vietnam
3 Information Technology Department, Tan Trao University, Tuyen Quang Province, Vietnam
* Corresponding Author: Thi-Loan Nguyen. Email: email; Van-Hung Le. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.076732

Received 25 November 2025; Accepted 13 January 2026; Published online 13 February 2026

Abstract

To support grasping objects on a tabletop by blind people or a robotic arm, it is necessary to address fundamental computer vision tasks: detecting, recognizing, and locating objects in space, and determining the grasping position. These results can then be used to guide the visually impaired or to execute grasping tasks with a robotic arm. In this paper, we collected, annotated, and published the benchmark TQU-GraspingObject dataset for testing, validating, and evaluating deep learning (DL) models that detect, recognize, and localize graspable objects in 2D and 3D space, especially on 3D point cloud data. The dataset was collected in a shared room, with common everyday objects placed on a tabletop in jumbled positions, and captured with an Intel RealSense D435 (IR-D435) camera. It includes more than 63k RGB-D pairs and related data: segmented 3D object point clouds, coordinate-system normalization matrices, normalized 3D object point clouds, and hand poses for grasping each object. We also conducted experiments on four high-performing DL networks: SSD-MobileNetV3, ResNet50-Transformer, ResNet101-Transformer, and YOLOv12. The results show that YOLOv12 performs best at detecting and recognizing objects in images. All data, annotations, toolkits, source code, point cloud data, and results are publicly available on our project website: https://github.com/HuaTThanhIT2327Tqu/datasetv2.
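The 3D object point clouds in the dataset are derived from the RGB-D pairs. As a rough illustration (not the authors' toolkit), a depth frame from a camera such as the IR-D435 can be back-projected through a standard pinhole model; the intrinsics (`fx`, `fy`, `cx`, `cy`) and depth scale below are placeholder values, not taken from the paper:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a depth image (H x W, raw units) into an N x 3 point cloud.

    Assumes a pinhole camera model; fx, fy, cx, cy are the depth-stream
    intrinsics. depth_scale converts raw depth units to metres (0.001 is
    typical for a 16-bit millimetre depth map).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64) * depth_scale   # raw units -> metres
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]              # drop pixels with no depth

# Tiny synthetic example: a 2x2 depth map, 1000 mm everywhere except one hole.
depth = np.array([[1000, 1000], [1000, 0]], dtype=np.uint16)
pts = depth_to_point_cloud(depth, fx=600.0, fy=600.0, cx=1.0, cy=1.0)
print(pts.shape)  # (3, 3): three valid pixels, each an (x, y, z) point
```

The segmented and normalized point clouds in the dataset would be obtained by further cropping such a cloud to each object and applying the provided coordinate-system normalization matrix.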

Keywords

Object grasping for the blind/robot arm; TQU-GraspingObject benchmark dataset; 3D point cloud data; deep learning (DL); object detection/recognition; Intel RealSense D435 (IR-D435)