Emotional Analysis of Arabic Saudi Dialect Tweets Using a Supervised Learning Approach

Social media sites produce large amounts of data and offer companies a competitive advantage when that data can be processed and exploited, because it provides a deeper understanding of clients and their needs. This understanding helps companies make sound decisions based on data obtained from social media websites. Thus, sentiment analysis has become a key tool for understanding that data. Sentiment analysis is a research area that focuses on analyzing people's emotions and opinions to identify the polarity (e.g., positive or negative) of a given text. Emotional analysis (EA) emerged from the need to analyze emotions and opinions more deeply. It categorizes words into emotional categories, such as anger, disgust, fear, joy, sadness and surprise, rather than simply positive or negative. Studies in the EA field for the Arabic language are limited, and our research is a contribution to this area. We built a system that classifies the emotions of Arabic tweets (mainly Saudi-based tweets) under the appropriate emotional categories using a supervised machine learning (ML) approach. The six basic emotion categories are anger, disgust, fear, joy, sadness and surprise. The multinomial naïve Bayes (MNB), support vector machine (SVM) and logistic regression classifiers were used as the classification methods. A comprehensive comparison between these classifiers was performed in terms of accuracy, precision, recall and F-measure. Saudi tweets were collected and used as the dataset. A corpus of Saudi dialect tweets was created from this dataset as part of this study. The experimental results indicate that SVM and logistic regression achieved the best results, with an overall accuracy of 73.39%.


Introduction
Social media websites have developed significantly in recent years and have become an important source of information. These sites provide an opportunity for people around the world to communicate and share information. The vast majority of data available from the large real-world social network can [...]
Recent research has been motivated by the possibility of detecting emotions of various classes (rather than simply positive or negative sentiments) in texts. There has been limited research focusing on EA in the Arabic language, and especially in the Saudi dialect. However, it is hypothesized that the EA of Arabic tweets will have a significant effect on various Saudi Arabian economic, scientific, and social sectors. For business, emotional data is a strategic marketing tool that helps to get to know the customer and develop a quality product. For health care, it can be used to understand patients' emotions in order to improve the patient experience. For government, it can provide an overview of people's emotions regarding certain topics. These examples of its broad application demonstrate why EA is so powerful [6].
Its versatility and potential benefits have made EA classification very attractive to researchers. In the literature, a variety of methods have been proposed to address the problem of classifying emotions in different languages. However, EA classification for the Arabic language is still a new problem, one that remains open to more thorough investigation. To the best of our knowledge, no existing work has been performed in this field with Arabic textual content-based information retrieved from Twitter (mainly Saudi-based tweets). In this study, we will build a system to detect the emotional underpinnings of Saudi dialect tweets and classify them into appropriate emotional categories using a supervised machine learning (ML) approach. Our classification was influenced by the aforementioned six basic emotion categories: anger, disgust, fear, joy, sadness, and surprise. Furthermore, a new dataset of Saudi dialect tweets will be collected to expand the corpus of these tweets. This corpus will then be subjected to the proposed emotional classification model.
Classifying text as positive or negative is the focus of sentiment analysis (SA). EA is a further progression of SA that identifies the specific emotion expressed in the text. Therefore, in certain contexts, EA is more suitable than polarity-based SA. In sentiment classification, the categories (such as positive and negative) are poles, whereas in emotion classification the categories are broader and may include happiness, relief, pleasantness, fear, sadness, and anger.
The paper is organized as follows: In Section 2, we present an overview of previous works related to solving the problem of the classification of emotions. Section 3 provides a detailed description of our approach, and Section 4 details the results. Section 5 presents the discussion. Finally, the conclusion and future work are explained in Section 6.

Related Work
Currently, the automatic detection of feelings and emotions is utilized by numerous applications in various fields, including security informatics, e-learning, humor detection, and targeted advertising, among others. Many applications center around social media. Twitter is currently an abundant source of individual opinions on a variety of topics, so abundant, in fact, that researchers are interested in developing an automated EA tailored specifically to Twitter. However, most current research focuses on English [3]. Also, two different methods could be used to solve the EA problem. The first is the lexicon-based method, whereas the second is the ML method. Therefore, in our review of the literature, we divided previous related studies into two main sections: EA in text and EA in emoticons.

Lexicon-based Approach
The focal point in a study proposed by Al'abed et al. [7] was to address the EA problem for Arabic text composed primarily in modern standard Arabic with the occasional use of dialectical Arabic (DA). They created a lexicon-based approach for the EA of Arabic text. For this purpose, they consulted an existing emotion lexicon called the National Research Council word-emotion association lexicon (also called EmoLex) and established their own lexicon-based tool. This lexicon was initially developed for the English language and contained 14,182 terms. It was later translated into over 20 additional languages, including Arabic, by Google Translate using Google's in-house dataset. As a lexicon-based approach, the tool for the emotion detection of Arabic text must begin with a lexicon (dictionary) of terms, relevant to at least one emotion. EmoLex contained various terms representing no feelings. Finally, the result revealed that Al'abed et al.'s lexicon-based approach was effective, with an accuracy rate of 89.7% [7]. According to Wani et al. [8], within the constraints of the client information accessible on Facebook, in recognizing the feelings of netizens in conflict and non-conflict areas, individuals living in serene areas are less negative than those living in conflict areas. Plutchik's eight fundamental emotions method was applied to a set of Facebook posts with the help of two users' source accounts, with one user living in Kashmir (conflict area) and the other in Delhi (non-conflict area). In addition, a new emotion dictionary called MoodBook was created to decide the emotional state of a user based on the EmoLex and Empath lexicons. There is another approach proposed by El Ghoary et al. [9] for six basic emotions using children's stories as their dataset. The authors found that approximately 65% of the six emotions could be detected in Arabic emotional sentences [9].

Machine Learning Approach
The objective behind a study proposed by Alsmearat et al. [10] was to determine whether female writers are more emotional than male writers. They concentrated on Arabic articles to study gender identification. They employed NB, decision tree (DT), SVMs, and K-nearest neighbor classifiers in their approach. For the feature extraction and selection, they used the bag of words (BOW) model and term frequency-inverse document frequency (tf-idf) as a weighting technique. The results showed no evidence for identifying gender through emotional text [10].
Tocoglu et al. [11] used a dataset of Turkish texts called "TREMO," and classified emotional data into six categories: happiness, fear, anger, sadness, disgust, and surprise. Their dataset was compiled by surveying 4,709 individuals and was composed of 27,350 entries. Five thousand individuals participated in the survey, which involved analyzing the six emotions. This approach employed complement naïve Bayes (CNB), random forest (RF), DT C4.5 (J48), and an updated version of SVM. For feature extraction and selection, they used mutual information (MI) and tf-idf as a weighting scheme in the vector space model. According to their experiments, SVM was the preferred classifier, and the results indicated that the proven dataset was more effective than the trained, non-proven one [11].
Yang et al. [12] used SVM and conditional random field (CRF) as their methodologies to analyze a corporate web blog. Emotions were examined based on written words, which also indicated the emotional level. For feature selection, they used emotion keywords as features. The research revealed that CRF surpassed SVM in terms of accuracy [12].
Vo et al. [13] concentrated on the earthquakes in Tokyo, Japan and individuals' reactions on social media in the aftermath of these natural disasters, especially the mental aspects. The dataset consisted of [...]
Balabantaray et al. [14] focused on the use of Twitter for EA. They employed the SVM algorithm to classify the emotions in users' posts: happiness, sadness, anger, disgust, surprise, and fear. They used the Stanford Penn bank Part-of-Speech Tagger (POS-Tagger), emoticons, and WordNet-Affect emotion lexicons for feature extraction and selection. The SVM showed an accuracy of 73.24% [14].
Nagarsekar et al. [15] performed in-depth analyses. Two different ML algorithms, namely MNB and SVM, were applied to three different collections of data from Twitter. Afterward, the results were examined. For feature extraction, they used the bag-of-features framework. The results revealed that the MNB classifier achieved more accurate results than SVM with an accuracy of 82.73% [15].
Using data from a variety of sources, including blogs, works of fiction, and news headlines, Chaffar et al. [16] identified the following six emotions: anger, disgust, fear, happiness, sadness, and surprise. They compared the performance of three classifiers (DT (J48), NB, and SVM) to identify the best classification model for the EA of text. BOW and N-grams were used as the feature selection techniques, and SVM outperformed the other classifiers [16].
Abdul-Mageed et al. [17] used a deep learning algorithm to identify the 24 fine-grained types of emotions proposed by Robert Plutchik. They compiled a large dataset of English tweets called "EmoNet" and then developed deep learning models using a gated recurrent neural network (GRNN), a modern variation of recurrent neural networks used specifically for modeling sequential information. Next, they expanded the classification to Plutchik's eight basic emotions: joy, trust, fear, surprise, sadness, anticipation, anger, and disgust. The results revealed an average accuracy of 87.58% for the 24 types of emotions. In addition, GRNNs using Plutchik's eight basic emotions achieved 95.68% accuracy, outperforming those that used the 24 types of emotion.

Emotional Analysis in Emoticons
Emoticons, short for "emotion icons," are images consisting of symbols, including punctuation marks. They are used in instant messages, emails, and other written forms to express a specific emotion [18]. Hussien et al. [19] focused on the problem of emotion detection using emoticons in Arabic tweets using a supervised approach in which the classifier was first trained using a labeled dataset, and the training dataset was manually annotated. The objective was to propose an automated annotation approach to the training data based on the use of emojis. A dataset of emojis in Arabic tweets was compiled, and four basic emotional categories were applied: joy, anger, disgust, and sadness. The dataset contained 134,194 Arabic tweets that focused on the four categories. SVM and MNB were used as the two ML classifiers for these experiments. They used BOW and tf-idf as feature extraction and selection techniques, respectively. The results demonstrated that the proposed automated annotation approach was superior to the manual labeling approach.
Taking into account these related studies, it is evident that the majority of recent research has focused on languages other than Arabic, and few studies have applied the ML approach to EA of Arabic tweet texts. We believe that ours is the first study to analyze emotions using the ML approach with data composed entirely of Arabic (Saudi dialect) tweets.

Material and Methods
The proposed system includes five stages: data collection, data preprocessing, feature extraction, feature selection, and classification and evaluation. Fig. 1 depicts an overview of the proposed system.

Data Collection
We could access the data (tweets) available on Twitter through the Twitter API, which is how Twitter provides access to data stored on the site. Two datasets were used in this study; both were made up of public Twitter data extracted through the Twitter API. The first dataset was collected with the help of the Saudi Telecom Company (STC) in order to obtain a large number of tweets. To create our second dataset, a pool of tweets was gathered from a hashtag keyword search, utilizing the Twitter Streaming API. A list of words and expressions for each emotional class was aggregated, and the keyword-based search was applied to create a dataset with ready-to-use classes. Also, the data was collected randomly and in Arabic. To retrieve Arabic tweets, "lang:ar" was used. We filtered the tweets by user location to identify Saudi tweets, and various Saudi regions were considered. The collected dataset contained significant noise. For that reason, a data preprocessing step was required.

Data Description
The total number of tweets in the first dataset was 13,425. After the data preprocessing step, a number of tweets from the pool were randomly chosen for each emotion class, to form the dataset. There were 619 tweets in the anger class, 218 tweets in the disgust class, 191 tweets in the fear class, 971 tweets in the joy class, 522 tweets in the sadness class, and 331 tweets in the surprise class.
The dataset categories needed to be balanced to prevent any bias induced by skewed data. The balancing dataset was obtained using hashtags of emotion-associated words, as well as Arabic words as keywords to extract tweets. Multiple queries used groups of Arabic keywords in the search query to collect tweets, as described in Tab. 1. This approach was effective for the fear and disgust classes, as it was difficult to find tweets expressing fear and disgust by sampling randomly. Using this approach, 8,536 tweets were collected. The previous dataset was then balanced by adding 159 tweets in the anger class, 621 tweets in the disgust class, 543 tweets in the fear class, 272 tweets in the sadness class and 702 tweets in the surprise class; these additions served as a second dataset. The same data preprocessing steps described in Section 4.2 were also applied to these tweets. We combined our collected dataset (keywords dataset) with the previous dataset (collected by STC) to generate our new Saudi tweets corpus, which resulted in a corpus of 5,149 tweets out of 21,961 tweets. The collected tweets were filtered using Saudi Arabia's geo-location in the period between Dec. 15, 2018 and Mar. 13, 2019.
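The per-class totals implied by the two datasets can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the text; the variable names are illustrative, and the zero entry for joy is an assumption based on the fact that no joy tweets are listed among the additions.

```python
# Per-class counts quoted in the text for the first (STC) dataset.
stc_counts = {"anger": 619, "disgust": 218, "fear": 191,
              "joy": 971, "sadness": 522, "surprise": 331}

# Tweets added from the keyword-based dataset.
# Assumption: joy received no additions, since none are mentioned.
keyword_additions = {"anger": 159, "disgust": 621, "fear": 543,
                     "joy": 0, "sadness": 272, "surprise": 702}

# Combined corpus before annotation.
corpus_counts = {label: stc_counts[label] + keyword_additions[label]
                 for label in stc_counts}
total = sum(corpus_counts.values())

for label, count in corpus_counts.items():
    print(f"{label:<9}{count:>6}{count / total:>8.1%}")
print(f"{'total':<9}{total:>6}")
```

The total matches the 5,149 tweets reported for the corpus. The class sizes are closer to one another than in the first dataset alone, although they are not exactly equal.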
During the labeling stage, retweeted tweets; tweets including URLs; tweets including non-Arabic words; advertisement tweets; tweets including Quranic verses, prayers or prophet sayings; tweets including more than one clear emotion; and tweets including any other dialect (e.g., Egyptian) were all excluded from the dataset. The total number of collected tweets is listed in Tab. 2, which presents the data distribution (classes) and number of tweets in each class.
The words were not checked grammatically, and each group of letters that did not contain any white space characters was considered a word. The final Saudi tweet corpus contained a total of 5,149 tweets, which were manually labeled with the following emotion classes: anger, disgust, fear, joy, sadness and surprise.

Data Preprocessing
The processing of data retrieved from Twitter involved cleaning and preparing the text for the classification process. Data preprocessing involved several steps, arranged as follows.
Table 1: Arabic keywords used in the search query to collect tweets
(IASC, 2021, vol. 29, no. 1)

Preprocessing on Text
All tweets are limited to 280 characters in length. They may contain considerable noise and information that is irrelevant to the EA task, such as URLs, ads, links, email addresses, pictures or other media, and many words that do not affect the general meaning of the text or sentence. In addition, a Twitter user may mention another user by utilizing the format (@<username>) in a tweet. Retaining such information in the tweets would further complicate the classification problem, so removing it is part of preprocessing. It is important to clean and process the data before starting the classification, as this improves the performance of the classifier and accelerates the classification process, making the outcome of the analysis more accurate. The tweets were processed using the Python programming language, which can handle Arabic text because it supports UTF-8 Unicode, together with the natural language toolkit (NLTK), which provides some NLP techniques for analyzing Arabic text. The overall process of data preprocessing and cleaning involved removing the following: URLs; hashtag symbols (#); Twitter shortcuts, such as <@username>, Retweet (RT), and reply; retweets and duplicate tweets; numbers, because they do not affect the direction of emotion; unwanted punctuation and special characters; media and emoticons; non-Arabic words; repeated letters (for example, "هلااا" would become "هلا"); Arabic diacritics (Harakat); Arabic stop words; and dates and months.
For all tweets, the previous steps were executed.
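The cleaning steps can be sketched with Python's standard `re` module. This is a minimal illustration under stated assumptions, not the authors' code: the regular expressions, the Unicode ranges chosen for Arabic letters and diacritics, the function names, and the `stop_words` parameter are all hypothetical choices made for this example.

```python
import re

URL_RE = re.compile(r"(?:https?://|www\.)\S+")
MENTION_RE = re.compile(r"@\w+")
# Tatweel (U+0640) and Arabic diacritics (U+064B..U+0652, U+0670): deleted outright.
DIACRITICS_RE = re.compile(r"[\u0640\u064B-\u0652\u0670]")
# Anything that is not a basic Arabic letter or whitespace is replaced by a space.
# Assumption: "basic Arabic letter" means U+0621..U+063A and U+0641..U+064A.
NON_ARABIC_RE = re.compile(r"[^\u0621-\u063A\u0641-\u064A\s]")
# Any character repeated consecutively is collapsed to a single occurrence.
REPEAT_RE = re.compile(r"(.)\1+")


def clean_tweet(text, stop_words=frozenset()):
    """Apply the cleaning steps to one tweet and return the cleaned string."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DIACRITICS_RE.sub("", text)
    # Drops digits, punctuation, '#', emoji, and Latin words such as "RT".
    text = NON_ARABIC_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)
    tokens = [tok for tok in text.split() if tok not in stop_words]
    return " ".join(tokens)


def clean_corpus(tweets, stop_words=frozenset()):
    """Clean every tweet, dropping empty results and exact duplicates."""
    seen, cleaned = set(), []
    for tweet in tweets:
        text = clean_tweet(tweet, stop_words)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Collapsing every run of repeated characters is the simplest rule to implement, but it also shortens words that legitimately contain a doubled letter, which is a known cost of this kind of rule.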
Next, additional processing steps were performed on the tweets through NLTK for NLP tasks. These tasks were normalization and tokenization. Normalization means converting a list of words to a more uniform sequence, for example, changing "آ", "أ" and "إ" into "ا", and "أنا" into "انا". After the normalization step, "الله" would be "اله", but we accepted the loss of a small number of tweets removed by this rule, despite the effect their retention would have on the outcome.
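Normalization and tokenization can be illustrated in a few lines. The paper performs these steps with NLTK; the sketch below swaps in the standard library so that it stays self-contained, and the function names are hypothetical. Whitespace tokenization is assumed because the paper treats every group of letters without white space as a word.

```python
import re

# Alef with madda, alef with hamza above, alef with hamza below.
ALEF_VARIANTS_RE = re.compile("[\u0622\u0623\u0625]")


def normalize(text):
    """Map the alef variants to the bare alef (U+0627)."""
    return ALEF_VARIANTS_RE.sub("\u0627", text)


def tokenize(text):
    """Whitespace tokenization: each run of non-space characters is one word."""
    return text.split()


tokens = tokenize(normalize("أنا إنسان"))
```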

Data Annotation Process
The collected corpus was then manually annotated by human experts. The corpus had to be annotated by people who had mastered the Saudi dialect before it could be used for training the proposed classifiers. The annotators usually added their own comments and notes to facilitate machine learning.
The process of extracting tweets from the datasets (the STC dataset and keywords dataset) was conducted in two phases. First, we completed a manual inspection of the datasets and found that we could not choose tweets randomly. We needed to extract those that included emotions so as not to confuse the annotators. After manually inspecting 21,961 tweets, 5,149 of them were found to be eligible for inclusion in the dataset. In this study, we proposed a six-way classification of emotion: anger, disgust, fear, joy, sadness, and surprise. As such, the same labels were used for annotation. We also included one more label, "none," to be used in situations in which the annotator could not identify the emotion of the tweet.
The annotation process was carried out by two annotators who were Saudi native speakers. To annotate this dataset, the annotators used Ekman's list of emotion classes (anger, disgust, fear, joy, sadness, and surprise) and the "none" term. If the two annotators disagreed on a tweet, it was flagged and a third annotator decided the final emotion. We found that 89% of the tweets were classified by the two annotators in agreement, whereas 11% of them were classified by the third annotator. Furthermore, 41 of the tweets were classified as "none" and were excluded.
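The adjudication rule described above is easy to express in code. The following sketch assumes the three annotators' labels are available as parallel lists; the function names and data layout are illustrative and do not come from the paper.

```python
def adjudicate(first, second, third):
    """Keep the label when two annotators agree, otherwise use the third.

    Returns the final labels and the fraction of items on which the
    first two annotators agreed.
    """
    final, agreed = [], 0
    for a, b, c in zip(first, second, third):
        if a == b:
            final.append(a)
            agreed += 1
        else:
            final.append(c)
    agreement_rate = agreed / len(first) if first else 0.0
    return final, agreement_rate


def drop_none(tweets, labels):
    """Exclude tweets whose final label is "none"."""
    return [(tweet, label) for tweet, label in zip(tweets, labels)
            if label != "none"]
```

The agreement rate computed here is raw percentage agreement, the same kind of figure as the 89% reported above; it does not correct for chance agreement.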
After completing the previous tasks, the dataset contained 5,108 tweets. The number of tweets in each emotion class after the annotation process is illustrated in Tab. 3 and by the bar graph in Fig. 2. The Arabic (Saudi dialect) tweet corpus was then clean and ready for the next step: feature extraction and selection.

Feature Extraction
An N-gram model is employed for feature extraction [20]. It considers the tokens as sequences of words with fixed lengths that preserve some ordering to represent the text as a vector of features. The purpose of N-grams is to consider tokens as pairs, triplets, or other combinations. Unigram (1-gram) represents tokens (one word), bigram (2-gram) signifies token pairs, and so on. The statistics of the Saudi tweet corpus are illustrated in Tab. 4.
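As a small illustration of the N-gram idea, the function below builds word N-grams from a token list. It is a generic sketch with a hypothetical function name, not the extraction code used in the study.

```python
def ngrams(tokens, n):
    """Return the word n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["w1", "w2", "w3", "w4"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

A tweet with fewer than N tokens yields no N-grams at all, which is one reason higher-order N-grams become sparse on short texts.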
Word frequencies can be calculated over N-grams, and the importance of a word in a document is measured by the term frequency-inverse document frequency (tf-idf). This is a numerical statistic that describes the importance of a word to a tweet within a corpus of tweets, and it is used as a weighting factor. Its value increases proportionally to the number of times a word appears in a document. A term that appears frequently in tweets belonging to a certain category is more likely to be associated with that category. Term frequency refers to the number of times a term appears in a document, whereas inverse document frequency determines how much information the word conveys. The tf-idf weights were calculated to generate the feature scores.
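The weighting can be made concrete with a short example. The paper does not state which tf-idf variant was used, so the sketch below assumes the textbook form, raw term count multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. Library implementations often add smoothing and normalization on top of this.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


docs = [["a", "b", "a"], ["b", "c"]]
weights = tfidf(docs)
```

In this toy corpus the term "b" occurs in every document, so its weight is zero, while "a" is weighted up both for occurring twice and for being specific to one document.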

Feature Selection
Calculating word frequency did not impact the emotion classification process. Therefore, we must reduce these features and select only the significant ones to improve the predictive accuracy. The chi-square statistical (X2) technique was employed in our study to identify the most significant features and remove the others. This technique was chosen for its superiority over other algorithms, such as information gain and MI [21]. The chi-square technique was performed with the Python programming language. When the relevant training dataset was fed into the feature selection algorithm, the chi-square (X2) was calculated for each feature. These features were ranked by their chi-square (X2) scores, and the top-ranking k features were selected (k is the method parameter that indicates how many features to select).
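The ranking step can be sketched as follows. The paper does not specify the exact chi-square formulation, so this example assumes one common variant: for each term, the observed number of documents containing it in each class is compared with the number expected if the term were independent of the class. Function names and the tie-breaking rule are illustrative.

```python
from collections import Counter, defaultdict


def chi_square_scores(docs, labels):
    """Score each term by sum over classes of (observed - expected)^2 / expected."""
    n_docs = len(docs)
    class_sizes = Counter(labels)
    df = Counter()                      # documents containing the term
    df_by_class = defaultdict(Counter)  # same count, split by class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term] += 1
            df_by_class[label][term] += 1
    scores = {}
    for term, term_df in df.items():
        score = 0.0
        for label, size in class_sizes.items():
            expected = term_df * size / n_docs
            observed = df_by_class[label][term]
            score += (observed - expected) ** 2 / expected
        scores[term] = score
    return scores


def select_k_best(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]
```

Terms that are spread evenly across classes score zero and are the first to be discarded, while terms concentrated in a single class rise to the top of the ranking.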

Experimental Setup and Results
The experiments were performed on our Saudi tweet corpus. The aim was to classify tweets into one of the six basic emotion classes: anger, disgust, fear, joy, sadness, and surprise. In the beginning, the classifiers were trained on the labeled dataset; then, the performance of these classifiers was measured and evaluated based on the following metrics: accuracy, precision, recall, and F-measure. The accuracy was prioritized over the other metrics as it was the percentage of tweets that were correctly categorized out of all of the tweets. MNB, SVM, and logistic regression classifiers were employed in our experiments. These classifiers were selected because of their ability to classify texts [22]. The Python programming language was used to implement the classification models.
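The four metrics can be computed directly from the true and predicted labels. The sketch below reports precision, recall, and F-measure per class; how the paper averages them across classes is not stated, so no averaging is applied here, and the function name is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, and F-measure."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    per_class = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_class
```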
Because the effect of the features in the training model is commonly found in a combination of these features, we started with all of the features and then completed a backward selection. This process allowed us to perform experiments to determine the impact of the features. Each set of features was tested, and the change in the classifier's performance was observed. First, all the features were ranked and only the most significant features were selected. The rest were removed using chi-square (X2). We tested eight sets of features {all features; 800; 1,200; 2,500; 3,600; 5,000; 7,000; 8,500}. Next, the classification process was evaluated and compared using the percentage split technique, which we used to split the data into 70% training and 30% testing. In all of the experiments, the tf-idf algorithm was employed as the weighting factor. Because the feature space was large, the chi-square technique was used to select the most significant features from the feature set.
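A percentage split of this kind can be sketched as below. Whether the study shuffled or stratified the data before splitting is not stated, so the shuffle, the fixed seed, and the rounding rule are assumptions made for illustration.

```python
import random


def percentage_split(items, labels, train_fraction=0.7, seed=0):
    """Shuffle the data once, then cut it into training and test portions."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)  # assumption: random, unstratified split
    cut = round(train_fraction * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    x_train = [items[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [items[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, y_train, x_test, y_test
```

Under this rounding rule, a corpus of 5,108 tweets would be divided into 3,576 training tweets and 1,532 test tweets.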

Multinomial Naïve Bayes (MNB) Classifier Experiments
The accuracy results of the MNB classifier, based on the number of features used in the classification process, are presented in Tabs. 5-7 for unigrams, bigrams, and trigrams, respectively. Using the tf-idf weighting algorithm produced better accuracy than using all distinct words (17,278 features) for all of the unigram cases with the MNB classifier. For this classifier, the best accuracy result was 73.32%, which was obtained with a 3,600-feature unigram. Tab. 8 lists the precision, recall, and F-measure results for this model's best result.
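For readers unfamiliar with the classifier, the following is a compact multinomial naïve Bayes sketch in plain Python. It is a teaching example under stated assumptions rather than the implementation used in the experiments: it works on raw term counts instead of tf-idf weights, uses Laplace smoothing with a hypothetical `alpha` parameter, and the class name `TinyMultinomialNB` is invented for this illustration.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Multinomial naive Bayes over token lists, with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {term for doc in docs for term in doc}
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        self.n_docs = len(docs)
        return self

    def predict_one(self, tokens):
        best_label, best_score = None, -math.inf
        for label in self.classes:
            total = sum(self.term_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.class_counts[label] / self.n_docs)
            for term in tokens:
                if term in self.vocab:  # unseen terms are ignored
                    count = self.term_counts[label][term]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from forcing a class probability to zero.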

Support Vector Machine (SVM) Classifier Experiments
The accuracy values of the SVM classifier are presented in Tabs. 9-11 for unigrams, bigrams, and trigrams, respectively. As displayed in the tables, the best accuracy result of the SVM classifier was achieved with an 8,500-feature unigram. The highest accuracy accomplished was 73.39%. Details of other metrics are listed in Tab. 12.

Logistic Regression Classifier Experiments
The values of the accuracy of the Logistic Regression classifier are displayed in Tabs. 13-15 for unigrams, bigrams, and trigrams respectively. It can be seen that the best accuracy value of 73.39% was obtained with 5,000 and 8,500-feature unigrams. Results of other metrics are shown in Tabs. 16 and 17.

Discussion
It is important to notice that, compared with the previous results, the highest accuracy achieved for each classifier was 73.32% for MNB, 73.39% for SVM, and 73.39% for logistic regression. Fig. 3 presents the accuracy of the three proposed classifiers. As illustrated by the bar graph, the logistic regression and SVM classifiers have similar performance and, in terms of accuracy, outperformed the MNB classifier in most cases. They obtained the best accuracies with the 8,500-feature unigram, tf-idf, and chi-square for the SVM classifier, and with the 5,000- and 8,500-feature unigrams, tf-idf, and chi-square for the logistic regression classifier. 73.39% is regarded as a good accuracy result for emotion classification with six emotion classes [23].
As demonstrated in Tabs. 7, 11, and 15, the unigram outperformed the other N-gram models in all cases in terms of accuracy. Because tweets are short in length, the N-grams become more unique with an increase in N. Therefore, the performance of all classifiers with bigrams and trigrams declined.
Furthermore, the evaluation metrics (precision, recall, and F-measure) for all of the classifiers are presented in Fig. 4. As illustrated in the figure, SVM and logistic regression have higher precision values [...]
In addition, the feature set that included {800} features led to lower accuracy than when using all of the features. On the other hand, the feature sets that included {1,200; 2,500; 3,600; 5,000; 7,000; and 8,500} increased the classifier performance and enhanced emotion classification in most cases. Fig. 5 shows the highest and lowest performing features for each classifier. It can be observed from Fig. 5 that increasing the number of features did not always lead to an increase in accuracy, as was seen with the MNB classifier. Moreover, the extraction of useful and significant features is a crucial step. The chi-square was proven to obtain the most significant features that best represented the class distinctions. The number of features that should be used depends on the dataset size.
The confusion matrices for the best cases of the SVM and logistic regression classifiers are listed in Tabs. 18-20, respectively. According to the confusion matrix of the SVM classifier in Tab. 18, the highest confusion occurs between the surprise and anger classes; 58 of the surprise class items are labeled as the anger class. There is no confusion between joy and disgust classes, between sadness and fear, or between surprise and fear. Fig. 6 presents the SVM confusion matrix more clearly.
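A confusion matrix of this kind, together with a lookup of its largest off-diagonal cell, can be built as follows. The sketch assumes rows are true classes and columns are predicted classes; the orientation of the paper's tables is not stated here, and the function names are illustrative.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def most_confused(matrix, labels):
    """Return (true_label, predicted_label, count) for the largest off-diagonal cell."""
    best = None
    for i, true_label in enumerate(labels):
        for j, pred_label in enumerate(labels):
            if i != j and (best is None or matrix[i][j] > best[2]):
                best = (true_label, pred_label, matrix[i][j])
    return best
```

Off-diagonal cells equal to zero correspond to the class pairs described in the text as showing no confusion.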
As can be seen from the confusion matrix of logistic regression with 5,000 features in Tab. 19, the highest confusion occurs between sadness and joy: 78 of the sadness class items were classified as the joy class. It can also be noticed that there is no confusion between fear and surprise, between joy and disgust, or between sadness and fear.
Based on the confusion matrix of logistic regression with 8,500 features presented in Tab. 20 and Fig. 7, there is no confusion between anger and disgust, between joy and disgust or between sadness and fear. However, 70 of the sadness class items were labeled as the joy class.

Conclusion
The main goal of our study is to develop an efficient and accurate model for analyzing the emotions of Arabic tweets by making it capable of handling the Saudi DA using supervised ML algorithms. In addition, a new dataset of Arabic (Saudi dialect) tweets was collected to build our corpus of Saudi dialect tweets. This new dataset is utilized for the emotion classification in this study. Due to the complexity of the Arabic language, a data preprocessing step was required. Different classification methods (MNB, SVM and logistic regression) were employed to classify the Saudi tweets into six emotion classes: anger, disgust, fear, joy, sadness and surprise. Both SVM and logistic regression produced the highest accuracy of 73.39%.
For future work, we would aim to use deep learning with word embeddings since word embeddings can be used as pre-trained vector representations of words and are useful when training any corpus without the need for human annotation. In addition, an enhanced technique for handling repeated characters is needed. Furthermore, we plan to increase the emotion classes and use Plutchik's wheel of emotions [24], which includes eight emotion classes rather than six. Finally, we would aim to increase the dataset size to contain more diverse data from different dialects, as the current dataset only contains Saudi dialect tweets.
Acknowledgement: The authors would like to acknowledge the Researchers Supporting Project Number (RSP-2020/287), King Saud University, Riyadh, Saudi Arabia for their support in this work.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.