Open Access

ARTICLE

Novel Machine Learning–Based Approach for Arabic Text Classification Using Stylistic and Semantic Features

Fethi Fkih1,2,*, Mohammed Alsuhaibani1, Delel Rhouma1,2, Ali Mustafa Qamar1

1 Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
2 MARS Research Lab LR17ES05, University of Sousse, Sousse, Tunisia

* Corresponding Author: Fethi Fkih.

Computers, Materials & Continua 2023, 75(3), 5871-5886. https://doi.org/10.32604/cmc.2023.035910

Abstract

Text classification is an essential task for many applications in the Natural Language Processing domain. It can be applied in many fields, such as Information Retrieval, Knowledge Extraction, and Knowledge Modeling. Despite the importance of this task, Arabic Text Classification tools still suffer from many problems and remain unable to cope with the increasing volume of Arabic content that circulates on the web or resides in large databases. This paper introduces a novel machine learning-based approach that exclusively uses hybrid (stylistic and semantic) features. First, we clean the Arabic documents and translate them into English using translation tools. Then, the semantic features are automatically extracted from the translated documents using an existing database of English topics. In addition, the model automatically extracts a set of stylistic features from the textual content, such as word and character frequencies and punctuation. We thus obtain three types of features: semantic, stylistic, and hybrid. Using a different type of feature each time, we performed an in-depth comparison of nine well-known Machine Learning models on a standard Arabic corpus to evaluate our approach. The results show that the Neural Network outperforms the other models and performs well with hybrid features (F1-score = 0.88).
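The stylistic and hybrid feature types mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the chosen punctuation set, and the `hybrid_features` helper are assumptions made for illustration, and the semantic vector is a hypothetical list of topic scores that would come from the translation and topic-lookup step.

```python
import string
from collections import Counter

# Punctuation marks counted as stylistic signals. Adding the Arabic comma,
# semicolon and question mark to the ASCII set is an illustrative choice.
PUNCTUATION = set(string.punctuation) | {"،", "؛", "؟"}


def stylistic_features(text):
    """Return a small dict of stylistic features for one document."""
    words = text.split()
    chars = [c for c in text if not c.isspace()]
    punct = Counter(c for c in chars if c in PUNCTUATION)
    n_words = len(words)
    n_chars = len(chars)
    return {
        "n_words": n_words,
        "n_chars": n_chars,
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "punct_ratio": sum(punct.values()) / n_chars if n_chars else 0.0,
        "n_commas": punct[","] + punct["،"],
        "n_question_marks": punct["?"] + punct["؟"],
    }


def hybrid_features(stylistic, semantic):
    """Concatenate stylistic values (in sorted key order) with a semantic vector.

    `semantic` is a hypothetical list of topic scores; how it is computed
    is outside the scope of this sketch.
    """
    return [stylistic[k] for k in sorted(stylistic)] + list(semantic)


if __name__ == "__main__":
    doc = "مرحبا، كيف الحال؟"
    style = stylistic_features(doc)
    print(style)
    print(hybrid_features(style, [0.2, 0.8]))
```

The resulting vectors could then be passed to any standard classifier. Sorting the keys before concatenation keeps the column order stable across documents.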

Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.