Open Access

ARTICLE


A YOLOv11-Based Deep Learning Framework for Multi-Class Human Action Recognition

Nayeemul Islam Nayeem1, Shirin Mahbuba1, Sanjida Islam Disha1, Md Rifat Hossain Buiyan1, Shakila Rahman1,*, M. Abdullah-Al-Wadud2, Jia Uddin3,*

1 Department of Computer Science, American International University-Bangladesh, Dhaka, 1229, Bangladesh
2 Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, 11543, Saudi Arabia
3 Artificial Intelligence and Big Data Department, Woosong University, Daejeon, 34606, Republic of Korea

* Corresponding Authors: Shakila Rahman. Email: email; Jia Uddin. Email: email

(This article belongs to the Special Issue: Advances in Deep Learning and Neural Networks: Architectures, Applications, and Challenges)

Computers, Materials & Continua 2025, 85(1), 1541-1557. https://doi.org/10.32604/cmc.2025.065061

Abstract

Human activity recognition is a significant area of research in artificial intelligence for surveillance, healthcare, sports, and human-computer interaction applications. This article benchmarks the performance of a You Only Look Once version 11-based (YOLOv11-based) architecture for multi-class human activity recognition. The dataset consists of 14,186 images across 19 activity classes, ranging from dynamic activities such as running and swimming to static activities such as sitting and sleeping. Preprocessing included resizing all images to 512 × 512 pixels, annotating them in YOLO’s bounding-box format, and applying data augmentation methods such as flipping, rotation, and cropping to enhance model generalization. The proposed model was trained for 100 epochs with adaptive learning rate methods and hyperparameter optimization, achieving a mAP@0.5 of 74.93% and a mAP@0.5:0.95 of 64.11%, outperforming previous versions of YOLO (v10, v9, and v8) and general-purpose architectures such as ResNet50 and EfficientNet. It exhibited improved precision and recall across all activity classes, with high precision values of 0.76 for running, 0.79 for swimming, 0.80 for sitting, and 0.81 for sleeping, and proved suitable for real-time deployment with an inference time of 8.9 ms per image while remaining computationally light. The improvements of the proposed YOLOv11 are attributed to architectural advancements such as a richer feature extraction process, better attention modules, and an anchor-free detection mechanism. While YOLOv10 was highly stable in static activity recognition, YOLOv9 performed well in dynamic environments but suffered from overfitting, and YOLOv8, while a decent baseline, failed to differentiate between overlapping static activities. The experimental results establish the proposed YOLOv11 as the most appropriate model, providing an ideal balance between accuracy, computational efficiency, and robustness for real-world deployment. Nevertheless, certain issues remain to be addressed, particularly in discriminating between visually similar activities and in the reliance on publicly available datasets. Future research will incorporate 3D data and multimodal sensor inputs, such as depth and motion information, to enhance recognition accuracy and generalizability in challenging real-world environments.
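For readers who want a sense of what the described pipeline looks like in practice, the sketch below uses the Ultralytics YOLO Python API to train a YOLO11 model at 512 × 512 for 100 epochs and time single-image inference. This is an illustrative assumption, not the authors' published code: the checkpoint name (yolo11n.pt), the dataset config activities.yaml, and the image path are hypothetical placeholders. Note that in YOLO's annotation format, each image is paired with a text file whose lines read class_id x_center y_center width height, with coordinates normalized to [0, 1].

```python
# pip install ultralytics
from ultralytics import YOLO

# Load a pretrained YOLO11 checkpoint. The nano variant is a hypothetical
# choice; the abstract does not state which model scale was used.
model = YOLO("yolo11n.pt")

# Train for 100 epochs at 512x512, mirroring the setup described above.
# "activities.yaml" is a placeholder dataset config that would list the
# image paths and the 19 activity class names.
model.train(data="activities.yaml", epochs=100, imgsz=512)

# Run inference on a single image (hypothetical path) and report latency.
results = model("person_running.jpg")
print(results[0].speed)  # per-image preprocess/inference/postprocess times in ms
```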

Keywords

Human activity recognition; YOLOv11; deep learning; real-time detection; anchor-free detection; attention mechanisms; object detection; image classification; multi-class recognition; surveillance applications

Cite This Article

APA Style
Islam Nayeem, N., Mahbuba, S., Disha, S.I., Buiyan, M.R.H., Rahman, S. et al. (2025). A YOLOv11-Based Deep Learning Framework for Multi-Class Human Action Recognition. Computers, Materials & Continua, 85(1), 1541–1557. https://doi.org/10.32604/cmc.2025.065061
Vancouver Style
Islam Nayeem N, Mahbuba S, Disha SI, Buiyan MRH, Rahman S, Abdullah-Al-Wadud M, et al. A YOLOv11-Based Deep Learning Framework for Multi-Class Human Action Recognition. Comput Mater Contin. 2025;85(1):1541–1557. https://doi.org/10.32604/cmc.2025.065061
IEEE Style
N. Islam Nayeem et al., “A YOLOv11-Based Deep Learning Framework for Multi-Class Human Action Recognition,” Comput. Mater. Contin., vol. 85, no. 1, pp. 1541–1557, 2025. https://doi.org/10.32604/cmc.2025.065061



Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.