Efficient Arabic Essay Scoring with Hybrid Models: Feature Selection, Data Optimization, and Performance Trade-Offs

Mohamed Ezz; Meshrif Alruily; Ayman Mostafa; Alaa Alaerjan; Bader Aldughayfiq; Hisham Allahem; Abdulaziz Shehab

doi:10.32604/cmc.2025.063189

Open Access

ARTICLE

Mohamed Ezz¹, Meshrif Alruily¹,*, Ayman Mohamed Mostafa²,*, Alaa S. Alaerjan¹, Bader Aldughayfiq², Hisham Allahem², Abdulaziz Shehab²

¹ Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, 72388, Saudi Arabia
² Information Systems Department, College of Computer and Information Sciences, Jouf University, Sakaka, 72388, Saudi Arabia

* Corresponding Authors: Meshrif Alruily; Ayman Mohamed Mostafa

Computers, Materials & Continua 2026, 86(1), 1-28. https://doi.org/10.32604/cmc.2025.063189

Received 08 January 2025; Accepted 29 April 2025; Issue published 10 November 2025

Abstract

Automated essay scoring (AES) systems have gained significant importance in educational settings because they offer a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language’s complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset categorized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection and training data size on model performance. Experiment 1 established a baseline using a non-machine-learning approach that selected the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 refined the feature selection process, demonstrating that 19 correlated features yielded optimal results and improved R² to 88.95%. Experiment 4 introduced a data-efficient training approach in which the training portion was increased from 5% to 50% of the data. Using just 10% of the data achieved near-peak performance, with an R² of 85.49%, indicating an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, because it addresses linguistic challenges while using data efficiently.
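As a rough illustration of two ideas named in the abstract, the sketch below ranks features by the absolute value of their Pearson correlation with the human score (one plausible reading of "top-N correlated features") and computes accuracy within a 0.5-point tolerance. The feature names, the toy values, and the helper functions are hypothetical and are not taken from the paper; the authors' actual selection and scoring procedures may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant feature carries no signal; treat its correlation as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def top_n_features(features, scores, n):
    """Names of the n features with the largest |correlation| to the scores."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], scores)),
        reverse=True,
    )
    return ranked[:n]


def tolerance_accuracy(y_true, y_pred, tol=0.5):
    """Fraction of predictions within `tol` points of the human score."""
    hits = sum(abs(t - p) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy data (assumed): five essays, three hypothetical similarity features.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
similarity_features = {
    "cosine_tfidf": [0.1, 0.2, 0.3, 0.4, 0.5],   # vector-based (assumed name)
    "jaccard": [0.5, 0.1, 0.4, 0.2, 0.3],        # text-based (assumed name)
    "embedding_sim": [0.2, 0.1, 0.4, 0.3, 0.5],  # embedding-based (assumed name)
}
predicted_scores = [1.2, 2.6, 3.5, 3.0, 5.0]

selected = top_n_features(similarity_features, human_scores, n=2)
accuracy = tolerance_accuracy(human_scores, predicted_scores, tol=0.5)
```

On this toy data the two strongest features are `cosine_tfidf` and `embedding_sim`, and three of the five predictions fall within the tolerance. In a fuller pipeline, the selected features would feed a regressor such as a Random Forest evaluated with 5-fold cross-validation, as the abstract describes.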

Keywords

Automated essay scoring; text-based features; vector-based features; embedding-based features; feature selection; optimal data efficiency

Cite This Article

APA Style

Ezz, M., Alruily, M., Mostafa, A.M., Alaerjan, A.S., Aldughayfiq, B. et al. (2026). Efficient Arabic Essay Scoring with Hybrid Models: Feature Selection, Data Optimization, and Performance Trade-Offs. Computers, Materials & Continua, 86(1), 1–28. https://doi.org/10.32604/cmc.2025.063189

Vancouver Style

Ezz M, Alruily M, Mostafa AM, Alaerjan AS, Aldughayfiq B, Allahem H, et al. Efficient Arabic Essay Scoring with Hybrid Models: Feature Selection, Data Optimization, and Performance Trade-Offs. Comput Mater Contin. 2026;86(1):1–28. https://doi.org/10.32604/cmc.2025.063189

IEEE Style

M. Ezz et al., “Efficient Arabic Essay Scoring with Hybrid Models: Feature Selection, Data Optimization, and Performance Trade-Offs,” Comput. Mater. Contin., vol. 86, no. 1, pp. 1–28, 2026. https://doi.org/10.32604/cmc.2025.063189

Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
