Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model

Badriyya Al-onazi; Saud Alotaib; Saeed Alshahrani; Najm Alotaibi; Mrim Alnfiai; Ahmed Salama; Manar Hamza

doi:10.32604/cmc.2023.033564

icon Open Access

ARTICLE

Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model

Badriyya B. Al-onazi¹, Saud S. Alotaib², Saeed Masoud Alshahrani^3,*, Najm Alotaibi⁴, Mrim M. Alnfiai⁵, Ahmed S. Salama⁶, Manar Ahmed Hamza⁷

1 Department of Language Preparation, Arabic Language Teaching Institute, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
2 Department of Information Systems, College of Computing and Information System, Umm Al-Qura University, Saudi Arabia
3 Department of Computer Science, College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia
4 Prince Saud AlFaisal Institute for Diplomatic Studies, Saudi Arabia
5 Department of Information Technology, College of Computers and Information Technology, Taif University, Taif P.O. Box 11099, Taif, 21944, Saudi Arabia
6 Department of Electrical Engineering, Faculty of Engineering & Technology, Future University in Egypt, New Cairo, 11845, Egypt
7 Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, AlKharj, Saudi Arabia

* Corresponding Author: Saeed Masoud Alshahrani. Email: email

Computers, Materials & Continua 2023, 74(3), 5447-5465. https://doi.org/10.32604/cmc.2023.033564

Received 20 June 2022; Accepted 09 September 2022; Issue published 28 December 2022

Abstract

The text classification process has been extensively investigated in various languages, especially English. Text classification models are vital in several Natural Language Processing (NLP) applications. The Arabic language has a lot of significance. For instance, it is the fourth mostly-used language on the internet and the sixth official language of the United Nations. However, there are few studies on the text classification process in Arabic. A few text classification studies have been published earlier in the Arabic language. In general, researchers face two challenges in the Arabic text classification process: low accuracy and high dimensionality of the features. In this study, an Automated Arabic Text Classification using Hyperparameter Tuned Hybrid Deep Learning (AATC-HTHDL) model is proposed. The major goal of the proposed AATC-HTHDL method is to identify different class labels for the Arabic text. The first step in the proposed model is to pre-process the input data to transform it into a useful format. The Term Frequency-Inverse Document Frequency (TF-IDF) model is applied to extract the feature vectors. Next, the Convolutional Neural Network with Recurrent Neural Network (CRNN) model is utilized to classify the Arabic text. In the final stage, the Crow Search Algorithm (CSA) is applied to fine-tune the CRNN model’s hyperparameters, showing the work’s novelty. The proposed AATC-HTHDL model was experimentally validated under different parameters and the outcomes established the supremacy of the proposed AATC-HTHDL model over other approaches.

Keywords

Hybrid deep learning; natural language processing; arabic language; text classification; parameter tuning

1 Introduction

Arabic is one of the six prevalently-spoken global languages and its speakers are spread across the nations, especially in the Middle East region. Further, a certain number of Islamic nations to use Arabic as their main language since it is the language of the Qur’an [1,2]. The Arabic language has a total of 28 letters including elmodod and hamzas. Every note contains some notations that denote the position of a letter in a word, i.e., in the beginning, middle or at the end of a word [3,4]. There are no upper-or lower cases in the Arabic language, which is a peculiar characteristic compared to the other languages [5]. The geomorphology of the Arabic language is a complex phenomenon, while the complexity of the language is attributed to the fact that it contains radical letters, suffixes and prefixes.

Further, the letters in the Arabic language do not have consecutive procedures as in other languages. This is because the term assemblies of the letters are dissimilar and hinge on location in a sentence to convey a meaning [6,7]. As mentioned above, the categorization of the texts is extensive in the Arabic language. In contrast, various research works have been conducted earlier to categorize the texts in other languages, such as English language. However, the studies concerning the categorization of the texts in Arabic are limited [8]. Several methods and systems have been developed for classifying English texts which provide brilliant outcomes with high precision, thanks to the nature of the words, language and the letters in the English language [9].

Arabic Natural Language Processing (NLP) is a challenging process due to the challenging morphology and structure of the language. Therefore, the number of studies on Arabic NLP is less than the English NLP [10]. In general, several methods are utilized to improve the applications of NLP, like statistical methods, word embedding, graph representation and Deep Learning (DL) [11]. Text classification is a categorization or labelling procedure focusing on tagging the writing processes and distinguishing its kind or category [12]. Text is classified either as a single-label text or a multi-label text under which many difficult and different categories of labels exist. Several challenges exist in the classification of Arabic text and it cannot be handled using the methods generally used for English NLP. This is because Arabic is an amusing yet difficult language with delicate geomorphology [13]. Various studies indicate that the data is an important complexity in the Arabic text classification process. If a text contains additional structures, it increases the runtime too. So, the preferred solution to overcome this problem is to reduce the number of features. Therefore, feature selection techniques are proposed to select the appropriate features representing the language [14].

1.1 Prior Works on Arabic Text Classification

Boukil et al. [15] introduced a state-of-the-art technique for classifying Arabic texts. In this study, an Arabic stemming approach was introduced for the extraction, selection and reduction of the required characteristics. Next, the Term Frequency-Inverse Document Frequency (TF-IDF) approach was incorporated to provide weightage for the features. In the categorization stage, the DL algorithm was utilized, which is powerful in other fields such as pattern recognition and image processing. Though it is rarely used in text mining approaches, this study used Convolutional Neural Network. Aljedani et al. [16] examined a hierarchical multi-level categorization model with respect to Arabic. In this study, a Hierarchical Multi-level Arabic Text Categorization (HMATC) mechanism was developed based on Machine Learning (ML) method. Also, the study assessed the influence of the feature set dimensions and Feature Selection (FS) methods on the classification accuracy of the method. The Hierarchy of Multi-label categorization (HOMER) approach was improved by examining distinct sets of multi-label categories, clustering algorithms and dissimilar cluster counts to enhance the outcomes of the hierarchical classification.

Elnagar et al. [17] presented a novel unbiased, and rich dataset for single-label (SANAD) and multi-label (NADiA) Arabic text classification process. These datasets were made easily accessible to the researchers in Arabic computation linguistics. Furthermore, the study presented a wide range of DL analytical approaches for the Arabic text classification and estimated the model’s efficacy. The proposed model had a distinctive feature compared to the existing models, i.e., no pre-processing stage was required in the proposed model, and it was completely based on the DL method. In literature [18], a better methodology was proposed for the Arabic text classification process in which the Chi-square FS method was applied to improve the accuracy of the classification process. In addition, the study compared the chi-square values with three other conventional FS metrics as Chi-square, mutual data and information gain. Sainte et al. [19] could not execute the Arabic text classification process comprehensively due to the difficulty of the Arabic language. A new firefly algorithm-related FS technique was developed in this study. This technique was effectively employed in dissimilar combinatorial challenges. But the model did not include the FS model to perform Arabic Text categorization.

1.2 Paper Contribution

The current study has developed an Automated Arabic Text Classification using Hyperparameter-Tuned Hybrid Deep Learning (AATC-HTHDL) model. The major goal of the developed AATC-HTHDL algorithm is to identify different class labels for the Arabic text. It includes distinct levels of operations. At first, the data is pre-processed to transform the input data into a useful format. Secondly, the TF-IDF model is applied to extract the feature vectors. Next, the Convolutional Neural Network with Recurrent Neural Network (CRNN) method is utilized for the classification of the Arabic text. Finally, the Crow Search Algorithm (CSA) is used for fine tuning the hyperparameters involved in the CRNN model. The results from the experimental analysis conducted upon the proposed AATC-HTHDL model were investigated under distinct aspects.

2 Materials and Methods

The proposed AATC-HTHDL model effectively identifies different class labels for the Arabic text. It encompasses different stages of operations. At first, data pre-processing is performed to transform the input data into a useful format. Secondly, the TF-IDF model is applied to extract the feature vectors. Next, the CRNN model is utilized for Arabic text classification. Finally, the CSA is used to fine-tune the hyperparameters involved in the CRNN model. Fig. 1 depicts the block diagram of the AATC-HTHDL approach.

images

Figure 1: Block diagram of the AATC-HTHDL approach

2.1 Stage I: Data Pre-Processing

Primarily, the data is pre-processed to transform the input data into a useful format. In general, the tokenization procedure considers the dataset and divides it into distinct tokens. The tokens of single-or 2-characters length, numerical values and the non-Arabic characters are avoided from the dataset so that it does not affect the performance of the classifier [20]. Regular expressions help perform the tokenization process, whereas stop words are nothing but general functional words, for instance, prepositions, conjunctions and so on. Mostly, these words appear in a text with less impact on the classification procedure. A list of Arabic stop-words is collected, and such words are removed from the text. The NLP application developers eliminate such stop-words from the search engine indices so that the size of the indices gets extensively reduced. This phenomenon enhances recall and the precision results. Then, stemming, a mapping procedure is performed on the derivative words into a base format, i.e., stem that has been shared. Stemming employs the morphological heuristics to remove the affixes from the words before indexing it. For instance, the Arabic words images , images and images share the same root images .

2.2 Stage II: TF-IDF Model

In this stage, the TF-IDF model is applied to extract the feature vectors. Generally, the TF-IDF model operates by defining a similar frequency of the words in a document, in comparison with inverse ratio of the words over an entire corpus [21]. Indeed, this computation defines the significance of a word, provided in a specific document. A conventional word, in a few documents or a single document, is appropriate to have a high TF-IDF number than the conventional words like prepositions and articles. Assume a group of documents D and a word w, while deD denotes an individual document. Here, wd i.e., the weight of the word w is calculated by employing the Eqs. (1) and (2).

wd=TF∗IDF(1)

wd=fw,d∗log(|D|/fw,D)(2)

In this expression, fw,d indicates the number of times, the word w emerges in the d document, |D| indicates the size of the corpus and fw,D indicates the document count in which w emerges in D.

Given that |D|∼fw,D, viz., the size of the corpus tentatively corresponds to the frequency of w.

1<log(|D|/fw,D)<cte(3)

The above equation is satisfied by a small constant like cte. Here, wd remains small in comparison with fw,d, which is still positive. This recommends that w is roughly communal over an entire corpus and simultaneously holds a certain set of consequences all over the element, D. For example, at some circumstances, the extreme elementary words like prepositions, articles and pronouns do not possess any applicable significance. Consequently, these words secure low TF-IDF scores that make it fundamentally unrelated during the search.

At last, visualize that fw,d is larger and fw,D is smaller. Here, the log(|D|/fw,D) is larger and wd is smaller. These circumstances heavily create the interest, because the words with maximum wd recommend that a w word is significant in the d document, but is not communal in the D corpus. This word w claims to be a significant discriminative power.

2.3 Stage III: CRNN Based Classification

For effectual classification of the Arabic text, the CRNN model is exploited in this study. The Convolutional Neural Network (CNN) model yields better performance in the development of features from spatial datasets like images. However, the CNN model cannot handle consecutive datasets. On the contrary, the Recurrent Neural Network (RNN)-based model possesses stronger abilities in terms of modelling the consecutive datasets like text [22]. Based on the simulation from features, the authors propose to combine RNN and CNN as a novel mechanism and term it as CRNN. When grouping RNN and CNN, the CNN method is placed in front of the RNN method which results in the creation of the CRNN method.

y=F(X,θ) of CRNN is denoted by:

y=Softmax(FC(LSTM(Pool(Conv(X,θC0NV),θPool),θLSTM),θFC))(4)

Here, the additional grouping is performed to place the RNN in front of the CNN and it results in the formation of the RCNN mechanism. Y=F(X,θ) of RCNN is denoted as follows.

y=Softmax(FC(Pool(Conv(LSTM(X,θLSTM),θC0NV),θPool),θFC))(5)

Here, the input dataset x denotes the short and sequential sentences. A novel and a better CNN-RNN mechanism is developed in this study. In the presented method, the CNN approach is placed in front of the RNN approach. A strong pooling layer moves towards the back of the Long Short Term Memory (LSTM)-RNN layer. In other terms, the LSTM layer processes the textual features that are attained by the convolutional layer and extracts the time-based features of the input dataset. Then, with the help of the maximal pooling layer, the text and its temporal characteristics are grouped, highlighted and compressed. The principle of the presented mechanism is that the problem is consecutive yet shorter. Hence, a better CRNN method places the pooling layer with prominent features behind the LSTM layer. It also highlights the textual features as well as the temporal features.

y=F(X,θ) of CRNN is denoted by:

y=Softmax (FC(Pool(LSTM(Conv(X,θC0NV),θLSTM),θPool),θFC))(6)

2.4 Stage IV: CSA Based Parameter Tuning

In this final stage, the CSA method is used to fine-tune the hyperparameters of the CRNN method. CSA is a population-based stochastic searching technique [23]. The presented method is a recently-established optimization approach that is used to resolve complex engineering-based optimization problems. It is inspired from the smart performance of the crows. The principles of the CSA model are listed herewith.

a. Crow lives in a flock.

b. Crow remembers the location of the hiding position

c. Crow follows one another to commit thievery.

d. Crow protects its caches from getting stolen through possibility.

In line with the abovementioned assumptions, the central method of the CSA has three fundamental stages such as initialization, generation of a novel location and updating the crow’s memory. In the beginning, an initialized population of the crows, characterized by n dimension, is arbitrarily produced. During the t iteration, the location of the crow is quantified as xi,t=[x1i,t, x2i,t,……,xni,t] and is considered that these crows possess better knowledge and memory, mi,t=[mm1i,t, m2iJt,⋯⋯′mniJt]. To produce a novel location, the ith crow arbitrarily selects a j-th crow from the population and tries to chase it so as to find the location of its hiding position (mj). In such cases, two states may occur based on a variable called Awareness Possibility (AP).

a. State 1: The crow j does not know that the crow i is chasing it i.e., it. Consequently, the crow i determines the hidden location of the crow j.

b. State 2: The crow j knows that it is being chased upon. Therefore, to defend the cache from being pilfered, the crow j fools the crow i by flying to other locations within the search space.

Based on the states mentioned above, the location of the crows is upgraded as follows.

xi,iter+1={xi,iter+1+ri×fli, iter×(mi,iter−xi,iter)rj≥APj,iterA random positon of search space otherwise(7)

In Eq. (7), rj represents a uniform distribution fuzzy value within [0,1] and APj,iter signifies the awareness possibility of the crow j at t iteration. At last, the crow upgrades its memory with the help of the following expression.

mi,iter+1={xi,iter+1,if f(xi,iter)is better than f(mi,iter)mi,iter,otherwise(8)

In Eq. (8), f(−) represents the objective function. Once the fitness function of the novel location of a crow is superior to the fitness function of the existing location, the crow upgrades its memory about the novel location. The abovementioned procedure is reiterated until a satisfactory condition i.e., (iter-max) is achieved. At last, the optimal solution of the memory is attained as the best possible solution. Fig. 2 demonstrates the flowchart of the CSA method.

images

Figure 2: Flowchart of the CSA method

The CSA method derives a Fitness Function (FF) to achieve the enhanced classification outcomes. It uses a positive numeral to indicate the superior outcomes of the candidate solution. In this work, the reduced classification error rate is considered as the fitness function and is provided in Eq. (9). The finest solution contains the least error rate whereas the poor solution possesses the highest error rate.

fitness(xi)==number of misclassified samplesTotal number of samples∗100(9)

3 Experimental Validation

In this section, a detailed analysis was conducted upon the proposed AATC-HTHDL model using two datasets and the results are discussed herewith. The details related to the dataset-1 are shown in Table 1. The dataset holds 5,690 samples under four class labels.

images

Fig. 3 depicts the confusion matrices formed by the proposed AATC-HTHDL model on the applied dataset-1. The figure reports that the proposed AATC-HTHDL model proficiently recognized the samples under four distinct class labels.

images

Figure 3: Confusion matrices of the AATC-HTHDL approach under dataset-1 (a) 80% of TR data, (b) 20% of TS data, (c) 70% of TR data, and (d) 30% of TS data

Table 2 shows the detailed classification output achieved by the proposed AATC-HTHDL model on 80% of Training (TR) data and 20% of Testing (TS) datasets. The obtained values denote that the proposed AATC-HTHDL model produced an improved performance. With 80% of TR data, the presented AATC-HTHDL model achieved an average accuy of 98.75%, precn of 97.27%, recal of 97.18%, Fscore of 97.23% and a Gmean of 98.15%. Also, with 20% of TS data, the proposed AATC-HTHDL method accomplished an average accuy of 98.77%, precn of 96.69%, recal of 97.22%, Fscore of 97.06% and a Gmean of 98.19%.

images

Table 3 shows the detailed classification outcomes achieved by the proposed AATC-HTHDL model on 70% of TR data and 30% of TS datasets. The obtained values indicate that the proposed AATC-HTHDL model achieved an improved performance. With 70% of TR data, the proposed AATC-HTHDL model provided an average accuy of 97.30%, precn of 94.13%, recal of 93.79%, Fscore of 93.95% and a Gmean of 95.89%. Also, with 30% of TS data, the AATC-HTHDL technique offered an average accuy of 97.80%, precn of 94.85%, recal of 94.70%, Fscore of 94.77% and a Gmean of 96.56%.

images

Both Training Accuracy (TA) and Validation Accuracy (VA) values, achieved by the proposed AATC-HTHDL methodology on dataset-1, are established in Fig. 4. The experimental results imply that the proposed AATC-HTHDL algorithm accomplished the highest TA and VA values whereas the VA values were higher than the TA values.

images

Figure 4: TA and VA analyses results of the AATC-HTHDL approach on dataset-1

Both Training Loss (TL) and Validation Loss (VL) values, obtained by the proposed AATC-HTHDL approach on dataset-1, are illustrated in Fig. 5. The experimental results infer that the proposed AATC-HTHDL technique achieved the minimal TL and VL values whereas the VL values were lesser compared to TL values.

images

Figure 5: TL and VL analyses results of the AATC-HTHDL approach on dataset-1

The details related to the dataset-2 are given in Table 4. The dataset has 1,445 samples under eight class labels.

images

Fig. 6 exhibits the confusion matrices generated by the proposed AATC-HTHDL approach on the applied dataset i.e., dataset-2. The obtained values denote that the proposed AATC-HTHDL technique proficiently recognized the samples under eight dissimilar class labels.

images images

Figure 6: Confusion matrices of the AATC-HTHDL approach under dataset-2 (a) 80% of TR data, (b) 20% of TS data, (c) 70% of TR data, and (d) 30% of TS data

Table 5 depicts the detailed classification outcomes of the proposed AATC-HTHDL method on 80% of TR data and 20% of TS datasets. The figure reports that the presented AATC-HTHDL method achieved a better performance. With 80% of TR data, the AATC-HTHDL approach achieved an average accuy of 98.69%, precn of 92.92%, recal of 91.73%, Fscore of 92.28% and a Gmean of 95.35%. Also, with 20% of TS dataset, the proposed AATC-HTHDL technique accomplished an average accuy of 98.92%, precn of 95.11%, recal of 93.48%, Fscore of 94.21% and a Gmean of 96.34%.

images

Table 6 shows the detailed classification outcomes of the proposed AATC-HTHDL method on 70% of TR dataset and 30% of TS dataset. The values obtained denote that the proposed AATC-HTHDL approach achieved a better performance. With 70% of TR data, the proposed AATC-HTHDL technique accomplished an average accuy of 98.37%, precn of 92.78%, recal of 89.42%, Fscore of 90.82% and a Gmean of 93.99%. Also, with 30% of TS data, the proposed AATC-HTHDL method yielded an average accuy of 98.57%, precn of 93.58%, recal of 91.87%, Fscore of 92.59% and a Gmean of 95.38%.

images

Both TA and VA values, attained by the proposed AATC-HTHDL approach on dataset-2, are established in Fig. 7. The experimental results imply that the proposed AATC-HTHDL algorithm accomplished the highest TA and VA values whereas the VA values were higher than the TA values.

images

Figure 7: TA and VA analyses results of the AATC-HTHDL method on dataset-2

Both TL and VL values, achieved by the proposed AATC-HTHDL approach on dataset-2, are illustrated in Fig. 8. The experimental results infer that the proposed AATC-HTHDL technique achieved the minimal TL and VL values while the VL values were lesser than the TL values.

images

Figure 8: TL and VL analyses results of the AATC-HTHDL approach on dataset-2

In order to validate the improved performance of the proposed AATC-HTHDL model, a comparative analysis was conducted between the proposed model and the rest of the existing models such as Logistic Regression (LR), Support Vector Machine (SVM), Generative Adversarial Network (GAN), Artificial Neural Network (ANN) and Ant Colony Optimization (ACO) on dataset-1 and the results are portrayed in Table 7 and Fig. 9 [24]. The results demonstrate that the proposed AATC-HTHDL model achieved an excellent performance. For instance, in terms of accuy, the proposed AATC-HTHDL model achieved a maximum accuy of 98.77%, whereas the LR, SVM, GAN, ANN and the ACO models accomplished the least accuy values such as 97.53%, 94.29%, 96.22%, 92.26% and 91.79% respectively. Next, in terms of Fscore, the proposed AATC-HTHDL method yielded a maximum Fscore of 97.06%, whereas the LR, SVM, GAN, ANN and the ACO approaches provided the least Fscore values such as 90.58%, 92.55%, 96.17%, 92.36% and 95.74% correspondingly.

images

Figure 9: Comparative analysis results of the AATC-HTHDL approach upon dataset-1

In order to validate the enhanced outcomes of the proposed AATC-HTHDL approach, a comparative analysis was conducted between the proposed model and the existing models on dataset-2 and the results are demonstrated in Table 8 and Fig. 10. The results illustrate that the presented AATC-HTHDL approach achieved an excellent performance. For instance, in terms of accuy, the AATC-HTHDL system achieved a maximum accuy of 98.92%, while the LR, SVM, GAN, ANN and the ACO techniques scored the least accuy values such as 96.46%, 94.99%, 97.62%, 95.30% and 94.78% correspondingly. Next, in terms of Fscore, the presented AATC-HTHDL methodology reported a maximum Fscore of 94.21%, whereas other models such as LR, SVM, GAN, ANN and ACO accomplished less Fscore values such as 89.37%, 91.33%, 92.86%, 92.05% and 93.26% correspondingly.

images

Figure 10: Comparative analysis results of the AATC-HTHDL approach on dataset-2

Based on the analytical results and the discussions made above, it can be inferred that the proposed AATC-HTHDL model achieved a better performance compared to other models.

4 Conclusion

In the current study, the AATC-HTHDL model has been developed to identify different class labels for the Arabic text. It encompasses different stages of operations. At first, the data is pre-processed to transform the input data into a useful format. Secondly, the TF-IDF model is applied to extract the feature vectors. Next, the CRNN model is utilized for the Arabic text classification process. Finally, the CSA is used for fine-tuning the hyperparameters involved in the CRNN model. The proposed AATC-HTHDL model was experimentally validated and the outcomes were investigated under distinct aspects. The comparison study outcomes highlight the promising performance of the presented AATC-HTHDL model over other approaches. In the future, feature selection and feature reduction methodologies can be applied to improve the classification performance of the projected technique.

Funding Statement: Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R263), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (22UQU4210118DSR31). The author would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.

References

1. A. M. F. A. Sbou, “A survey of arabic text classification models,” International Journal of Electrical and Computer Engineering, vol. 8, no. 6, pp. 4352, 2018. [Google Scholar]

2. A. Wahdan, S. A. L. Hantoobi, S. A. Salloum and K. Shaalan, “A systematic review of text classification research based on deep learning models in arabic language,” International Journal of Electrical and Computer Engineering, vol. 10, no. 6, pp. 6629, 2020. [Google Scholar]

3. F. N. Al-Wesabi, “Proposing high-smart approach for content authentication and tampering detection of arabic text transmitted via internet,” IEICE Transactions on Information and Systems, vol. E103.D, no. 10, pp. 2104–2112, 2020. [Google Scholar]

4. H. Chantar, M. Mafarja, H. Alsawalqah, A. A. Heidari, I. Aljarah et al., “Feature selection using binary grey wolf optimizer with elite-based crossover for arabic text classification,” Neural Computing and Applications, vol. 32, no. 16, pp. 12201–12220, 2020. [Google Scholar]

5. F. N. Al-Wesabi, “A hybrid intelligent approach for content authentication and tampering detection of arabic text transmitted via internet,” Computers, Materials & Continua, vol. 66, no. 1, pp. 195–211, 2021. [Google Scholar]

6. A. E. Kah and I. Zeroual, “The effects of pre-processing techniques on arabic text classification,” International Journal of Advanced Trends in Computer Science and Engineering, vol. 10, no. 1, pp. 41–48, 2021. [Google Scholar]

7. F. N. Al-Wesabi, A. Abdelmaboud, A. A. Zain, M. M. Almazah and A. Zahary, “Tampering detection approach of arabic-text based on contents interrelationship,” Intelligent Automation & Soft Computing, vol. 27, no. 2, pp. 483–498, 2021. [Google Scholar]

8. F. N. Al-Wesabi, “Entropy-based watermarking approach for sensitive tamper detection of arabic text,” Computers, Materials & Continua, vol. 67, no. 3, pp. 3635–3648, 2021. [Google Scholar]

9. M. A. Ahmed, R. A. Hasan, A. H. Ali and M. A. Mohammed, “The classification of the modern arabic poetry using machine learning,” TELKOMNIKA, vol. 17, no. 5, pp. 2667, 2019. [Google Scholar]

10. F. S. Al-Anzi and D. AbuZeina, “Beyond vector space model for hierarchical arabic text classification: A markov chain approach,” Information Processing & Management, vol. 54, no. 1, pp. 105–115, 2018. [Google Scholar]

11. X. Luo, “Efficient English text classification using selected machine learning techniques,” Alexandria Engineering Journal, vol. 60, no. 3, pp. 3401–3409, 2021. [Google Scholar]

12. A. Adel, N. Omar, M. Albared and A. Al-Shabi, “Feature selection method based on statistics of compound words for arabic text classification,” The International Arab Journal of Information Technology, vol. 16, no. 2, pp. 178–185, 2019. [Google Scholar]

13. H. N. Alshaer, M. A. Otair, L. Abualigah, M. Alshinwan and A. M. Khasawneh, “Feature selection method using improved CHI square on arabic text classifiers: Analysis and application,” Multimedia Tools and Applications, vol. 80, no. 7, pp. 10373–10390, 2021. [Google Scholar]

14. R. Janani and S. Vijayarani, “Automatic text classification using machine learning and optimization algorithms,” Soft Computing, vol. 25, no. 2, pp. 1129–1145, 2020. [Google Scholar]

15. S. Boukil, M. Biniz, F. E. Adnani, L. Cherrat and A. E. E. Moutaouakkil, “Arabic text classification using deep learning technics,” International Journal of Grid and Distributed Computing, vol. 11, no. 9, pp. 103–114, 2018. [Google Scholar]

16. N. Aljedani, R. Alotaibi and M. Taileb, “HMATC: Hierarchical multi-label arabic text classification model using machine learning,” Egyptian Informatics Journal, vol. 22, no. 3, pp. 225–237, 2021. [Google Scholar]

17. A. Elnagar, R. A. Debsi and O. Einea, “Arabic text classification using deep learning models,” Information Processing & Management, vol. 57, no. 1, pp. 102121, 2020. [Google Scholar]

18. S. Bahassine, A. Madani, M. A. Sarem and M. Kissi, “Feature selection using an improved Chi-square for arabic text classification,” Journal of King Saud University-Computer and Information Sciences, vol. 32, no. 2, pp. 225–231, 2020. [Google Scholar]

19. S. L. M. Sainte and N. Alalyani, “Firefly algorithm based feature selection for arabic text classification,” Journal of King Saud University-Computer and Information Sciences, vol. 32, no. 3, pp. 320–328, 2020. [Google Scholar]

20. E. Al-Thwaib, B. H. Hammo and S. Yagi, “An academic arabic corpus for plagiarism detection: Design, construction and experimentation,” International Journal of Educational Technology in Higher Education, vol. 17, no. 1, pp. 1, 2020. [Google Scholar]

21. S. Douzi, F. AlShahwan, M. Lemoudden and B. Ouahidi, “Hybrid email spam detection model using artificial intelligence,” International Journal of Machine Learning and Computing, vol. 10, no. 2, pp. 316–322, 2020. [Google Scholar]

22. S. Yusuf, A. Alshdadi, M. Alassafi, R. AlGhamdi and A. Samad, “Predicting catastrophic temperature changes based on past events via a CNN-LSTM regression mechanism,” Neural Computing and Applications, vol. 33, no. 15, pp. 9775–9790, 2021. [Google Scholar]

23. T. Gadekallu, M. Alazab, R. Kaluri, P. K. R. Maddikunta, S. Bhattacharya et al., “Hand gesture classification using a novel CNN-crow search algorithm,” Complex & Intelligent Systems, vol. 7, pp. 1855–1868, 2021. [Google Scholar]

24. K. Sundus, F. Al-Haj and B. Hammo, “A deep learning approach for arabic text classification,” in 2019 2nd Int. Conf. on new Trends in Computing Sciences (ICTCS), Amman, Jordan, pp. 1–7, 2019. [Google Scholar]

Cite This Article

APA Style

Al-onazi, B.B., Alotaib, S.S., Alshahrani, S.M., Alotaibi, N., Alnfiai, M.M. et al. (2023). Automated arabic text classification using hyperparameter tuned hybrid deep learning model. Computers, Materials & Continua, 74(3), 5447-5465. https://doi.org/10.32604/cmc.2023.033564

Vancouver Style

Al-onazi BB, Alotaib SS, Alshahrani SM, Alotaibi N, Alnfiai MM, Salama AS, et al. Automated arabic text classification using hyperparameter tuned hybrid deep learning model. Comput Mater Contin. 2023;74(3):5447-5465 https://doi.org/10.32604/cmc.2023.033564

IEEE Style

B.B. Al-onazi et al., "Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model," Comput. Mater. Contin., vol. 74, no. 3, pp. 5447-5465. 2023. https://doi.org/10.32604/cmc.2023.033564

BibTex EndNote RIS

This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Table of Content

Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model

Abstract

Keywords

References

Cite This Article

902

607

0

Related articles

Further Information

Guidelines

Follow Us

Join Us

Share Link