Open Access

ARTICLE

Efficient Arabic Essay Scoring with Hybrid Models: Feature Selection, Data Optimization, and Performance Trade-Offs

Mohamed Ezz1, Meshrif Alruily1,*, Ayman Mohamed Mostafa2,*, Alaa S. Alaerjan1, Bader Aldughayfiq2, Hisham Allahem2, Abdulaziz Shehab2

1 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka, 72388, Saudi Arabia
2 Information Systems Department, College of Computer and Information Sciences, Jouf University, Sakaka, 72388, Saudi Arabia

* Corresponding Authors: Meshrif Alruily. Email: email; Ayman Mohamed Mostafa. Email: email

Computers, Materials & Continua 2026, 86(1), 1-28. https://doi.org/10.32604/cmc.2025.063189

Abstract

Automated essay scoring (AES) systems have gained significant importance in educational settings, offering a scalable, efficient, and objective method for evaluating student essays. However, developing AES systems for Arabic poses distinct challenges due to the language’s complex morphology, diglossia, and the scarcity of annotated datasets. This paper presents a hybrid approach to Arabic AES that combines text-based, vector-based, and embedding-based similarity measures to improve scoring accuracy while minimizing the training data required. Using a large Arabic essay dataset organized into thematic groups, the study conducted four experiments to evaluate the impact of feature selection and training data size on model performance. Experiment 1 established a baseline with a non-machine-learning approach that selects the top-N correlated features to predict essay scores. The subsequent experiments employed 5-fold cross-validation. Experiment 2 showed that combining embedding-based, text-based, and vector-based features in a Random Forest (RF) model achieved an R² of 88.92% and an accuracy of 83.3% within a 0.5-point tolerance. Experiment 3 refined the feature selection process, showing that 19 correlated features yielded the best results and raised R² to 88.95%. Experiment 4 introduced a data-efficient training approach in which the training portion was increased from 5% to 50% of the data. Using just 10% of the data achieved near-peak performance, with an R² of 85.49%, which represents an effective trade-off between performance and computational cost. These findings highlight the potential of the hybrid approach for developing scalable Arabic AES systems, especially in low-resource environments, addressing linguistic challenges while ensuring efficient data usage.
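
The two metrics reported in the abstract, R² and accuracy within a 0.5-point tolerance, can be illustrated with a minimal sketch. The function names, the sample scores, and the choice to treat the tolerance as inclusive (an absolute error of exactly 0.5 counts as correct) are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true scores are identical")
    return 1.0 - ss_res / ss_tot


def accuracy_within_tolerance(
    y_true: Sequence[float], y_pred: Sequence[float], tolerance: float = 0.5
) -> float:
    """Fraction of predictions whose absolute error is at most `tolerance`.

    Assumption: the boundary is inclusive (error == tolerance is a hit).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tolerance)
    return hits / len(y_true)


# Hypothetical human scores and model predictions, for illustration only.
human_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
model_scores = [1.5, 2.0, 2.0, 4.0, 5.5]

r2 = r_squared(human_scores, model_scores)
acc = accuracy_within_tolerance(human_scores, model_scores, tolerance=0.5)
print(f"R^2 = {r2:.2%}, accuracy within 0.5 = {acc:.1%}")
```

On this toy data, four of the five predictions fall within 0.5 points of the human score. Reporting both metrics is useful because R² summarizes how much score variance the model explains, while the tolerance accuracy counts how many individual essays are scored acceptably close to the human rating.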

Keywords

Automated essay scoring; text-based features; vector-based features; embedding-based features; feature selection; optimal data efficiency




Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.