Detecting and Analysing Fake Opinions Using Artificial Intelligence Algorithms

In e-commerce and on social media, identifying fake opinions has become a tremendous challenge. Such opinions are widely generated on the internet by fake reviewers, also called fraudsters. They write deceptive reviews that purport to reflect actual user experience, either to promote some products or to defame others, and they also target the reputations of e-businesses. Their aim is to mislead customers into making wrong purchase decisions by selecting undesirable products. Such reviewers are often paid by rival e-business companies to compose positive reviews of their products and/or negative reviews of other companies' products. The main objective of this paper is to detect, analyse and quantify the difference between fake and truthful product reviews. To do this, the methodology comprises seven phases: collecting online product reviews, analysing features with Linguistic Inquiry and Word Count (LIWC), preprocessing the data to clean and normalise them, embedding words with Word2Vec, classification with deep learning algorithms, performance evaluation, and reporting of results. Two deep learning neural network models were evaluated on a standard Yelp product review dataset: bidirectional long short-term memory (BiLSTM) and a convolutional neural network (CNN). Comparing the performance of the two models showed that the BiLSTM model provided higher accuracy for detecting fake reviews than the CNN model.


Introduction
Fake opinion detection is a subfield of natural language processing (NLP) that aims to analyse deceptive product reviews on e-business platforms. A deceptive product review is a fabricated opinion, intentionally written by a fraudulent person to seem trustworthy [1]. Consumers and e-companies often use online product reviews for procurement and organisational decisions because they contain a wealth of knowledge. This knowledge is also a valuable resource for public opinion; it can affect decisions over a wide spectrum of everyday and professional pursuits. Positive reviews can lead to significant financial gains and improve a company's reputation, while negative reviews can cause financial loss and defame an e-business [2]. As a result, fraudsters have significant incentives to manipulate an opinion mining system by writing bogus reviews to support or disparage certain products or businesses [3]. These fraudsters are also known as opinion scammers, and their behaviour is referred to as opinion spoofing or opinion spamming. Spam reviews have become a pervasive problem in e-commerce, with numerous high-profile incidents reported in the media [4]. Developing an intelligent opinion mining system is essential for any e-commerce company to identify indications of deceptive reviews and help consumers avoid bad choices during the purchasing process [5]. Electronic products are the most frequently evaluated on online e-commerce websites [6]. Such products often involve a significant investment and are considered precious and valuable items; therefore, they are frequently researched on e-commerce websites. As noted in [7], the decision to buy an electronic product can depend on choices motivated by internet reviews, which determine 24% of purchases in this category. Word-of-mouth information is the second most popular source of opinion after using search engines to research these products.
Moreover, shoppers commonly choose to buy electronic products because such products change continually as smart products are introduced and existing ones are improved [8]. Consequently, people frequently rely on online reviews to avoid making faulty buying decisions [9]. Fake opinions contain dishonest or inaccurate information. They are used mainly to misinform consumers into making wrong purchase decisions, thus affecting product revenues. Spam product reviews are of three types: deceitful (fake) reviews, reviews of a brand only, and non-reviews. 1) Deceitful (fake) reviews are written purposefully to mislead readers or opinion mining systems. They include undeserved positive reviews to promote the online trade of specific products and negative reviews to defame worthy products; this type of spam product review is also called hype spam. 2) Reviews of a brand only: these opinions target the manufacturer's brand instead of the product itself. 3) Non-reviews, which have two sub-kinds: (a) advertisements and (b) unrelated reviews that contain no opinions, such as questions, answers or undefined text [10].
In this analysis, we focus on linguistic features of the review and behavioural features of the reviewer to identify, analyse and classify fake and truthful product reviews. The objective of this study is to demonstrate that word embeddings combined with deep learning algorithms are a promising method for identifying fake opinions about online electronic consumer products. The remainder of this paper is organised as follows: Section 2 introduces related work, Section 3 details the framework of the proposed methodology, Section 4 presents a word cloud of the most frequent words in the dataset, Section 5 explores a comparative analysis of the obtained results, and Section 6 gives conclusions and future work.

Related Work
Identifying fake reviews has become a prevalent research area in the last decade. Many studies have proposed methodologies for identifying fake/spam/deceptive reviews because of their substantial influence on customers and e-commerce businesses. Different methods have been presented, depending on the form of data being analysed: annotated data with supervised learning techniques (classification), unlabelled data with unsupervised learning (clustering), and partially annotated data with semi-supervised learning. The first study to analyse opinion spam was reported by Jindal et al. [11], who used duplicate or near-duplicate reviews from the Amazon website in their experimental work. Based on review, reviewer and product features, they applied a supervised logistic regression algorithm to classify reviews as fake or trusted, attaining 78% accuracy.
Ahmed et al. [12] used N-grams as features of review contents and a linear support vector machine (L-SVM) to detect fake reviews. They used a benchmark dataset of fake hotel reviews collected from the TripAdvisor.com website. For feature transformation and representation, they used the term frequency-inverse document frequency (TF-IDF) method. Their experiments achieved an F1 score of 90%. Ott et al. [13] tried to delineate a general rule for classifying unreliable reviews. In their approach, they used multi-label cross-domain datasets containing 800 hotel reviews assembled via the Amazon Mechanical Turk (AMT) website and 400 deceptive doctor reviews collected from expert domains. For features, they adopted unigrams, LIWC and part-of-speech (POS) tags in their experiments. For classification, they applied an SVM and the sparse additive generative (SAGE) model, an aggregate of generalised additive topic models. Their results were 78% and 81% accuracy, respectively.
Savage et al. [14] proposed a method for characterising and spotting opinion spammers based on abnormalities in the product ratings given by specific persons. The authors focused on variations between deviated ratings and the evaluation ratings of a large number of honest reviewers' opinions. They computed spamicity and honesty for each reviewer using binomial regression to detect reviewers with an abnormal distribution of ratings that deviated from public opinion. Narayan et al. [15] applied PU-learning using several models, applying six different classifiers to discover fake reviews: decision tree, random forest, naive Bayes, logistic regression, SVM and k-nearest neighbour. They found that the logistic regression classifier obtained the best performance of the six. Rosso et al. [16] evaluated classical PU-learning as well as an adjusted PU-learning method. Applying the modified method, the authors found it feasible to discover fewer samples from the unlabelled dataset. In each phase, only unseen negative cases (produced by earlier iterations) were recognised, and the classifier was trained only on the novel negative samples. Therefore, in every iteration, negative instances were diminished and the remaining instances were accurately classified as fraudulent or legitimate. They adopted naive Bayes (NB) and SVM classifiers with N-gram features. Lau et al. [17] proposed a probabilistic language model for detecting fraudulent reviews, adopting a latent Dirichlet allocation (LDA) technique called the Topic Spam model, which classified reviews by computing the probability of word frequencies in spam and non-spam contents.
Based on deep learning, Goswami et al. [18] proposed a model to examine the effects of reviewers' behavioural associations to identify fake online reviews. The dataset they used was from the Yelp website. The data were collected and preprocessed for analysis. They then extracted social and behavioural indicators of customers and fed them to a backpropagation neural network to categorise reviews as trusted or deceptive, achieving a detection rate of 95% accuracy. Ren et al. [19] combined two neural network models constructed to detect opinion spam using in-domain and cross-domain reviews: a gated recurrent neural network (GRNN) and a convolutional neural network (GRNN-CNN). The datasets related to the doctor, restaurant and hotel domains, with sizes of 432, 720 and 1280 reviews, respectively. Merging all these reviews, they applied their approach to classify reviews as truthful or fake. Their experimental results attained an accuracy of 83%.
Zeng et al. [20] used an RNN-BiLSTM technique to recognise counterfeit reviews. They split the review contents into three portions: first sentences, middle sentences and last sentences. They found that the middle sentences held more fake clues than the other portions, and their results reached 85% accuracy. The analysis in [21] focused on singleton review spam detection. The authors observed that the arrival pattern of truthful reviewers was stable and uncorrelated with their rating pattern, while spammers demonstrated contradictory behaviour that could be distinguished through unusually correlated temporal patterns. However, the singleton review spam detection problem was deemed tricky because of this irregularly linked pattern recognition. They reached 75.86% recall and 61.11% precision. Alsubari et al. [22] proposed machine learning models using supervised learning techniques such as AdaBoost, random forest and decision tree. Further, the authors used the Linguistic Inquiry and Word Count (LIWC) tool to extract and calculate deep linguistic features from product reviews to distinguish between truthful and fake reviews. Their models were evaluated on a standard benchmark Yelp product review dataset. To select an important set of features, they used information gain (IG). The AdaBoost model provided the best performance for detecting fake reviews, reaching 97% accuracy.

Materials and Methods
This section presents the methodology used in this study. The methodology has seven phases: product review dataset, feature analysis, preprocessing, word embedding, CNN/BiLSTM classification, performance metrics and results. Fig. 1 depicts the proposed methodology.

Product Reviews Dataset
The demand for e-commerce has grown rapidly in recent years because of the ease of internet access and innovative technologies. Various factors determine the fame and standing of e-commerce businesses, including credibility, product quality and customer recommendation systems. Product reviews are opinions about products written by shoppers or customers. They are considered among the most important factors for improving the credibility, standards and assessment of an e-commerce store. Using product reviews, owners of e-commerce businesses can detect and identify imperfections in their products and analyse the feelings and satisfaction of consumers [23]. The dataset used in this study comprised benchmark Yelp product reviews collected by [24]. It comprised 9456 electronic product reviews gathered from four American cities: Los Angeles, Miami, New York and San Francisco. The dataset was labelled using the labelling algorithm that the Yelp.com website utilises to filter fake reviews [25].

Analyzing the Features of the Dataset
To analyse and find the difference between fake and truthful reviews, all reviews in the dataset were processed with the Linguistic Inquiry and Word Count (LIWC) dictionary [26], which is widely used in text mining for natural language processing (NLP) tasks. Each review record in the dataset had meta-features such as rating value, review text, reviewer name, product ID and class label. Furthermore, we extracted additional significant linguistic features from the review content using the LIWC dictionary: authenticity of the review text, analytical thinking of the reviewer, word count, negative words, positive words, and negation words. Tab. 1 shows the statistical averages of the feature set for 4790 fake reviews and 4666 truthful reviews.
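The counting behind these linguistic features can be sketched as follows. This is a minimal illustration, not the LIWC tool itself: the three mini-lexicons below are hypothetical stand-ins for LIWC's much larger word categories.

```python
import re

# Illustrative mini-lexicons; the real LIWC dictionary is far larger.
POSITIVE = {"great", "excellent", "love", "perfect", "amazing"}
NEGATIVE = {"bad", "terrible", "broken", "waste", "poor"}
NEGATIONS = {"not", "no", "never", "cannot", "don't", "didn't"}

def linguistic_features(review: str) -> dict:
    """Count LIWC-style surface features for one review text."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return {
        "word_count": len(tokens),
        "positive_words": sum(t in POSITIVE for t in tokens),
        "negative_words": sum(t in NEGATIVE for t in tokens),
        "negation_words": sum(t in NEGATIONS for t in tokens),
    }

feats = linguistic_features("I don't love it; the screen is broken and terrible.")
```

Averaging such per-review counts over the fake and truthful subsets yields the kind of statistics reported in Tab. 1.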

Preprocessing
Before transforming and vectorising the sentences of the reviews, preprocessing steps were applied to clean the data and remove noise. The goal of text preprocessing is to convert the review texts into a form that deep learning algorithms can understand and analyse. The preprocessing steps are as follows: a) Removing punctuation: deleting punctuation marks from the reviews. b) Removing stopwords: cleaning articles and similar function words from the text; for example, 'the', 'a' and 'an' are removed. c) Stripping useless words and characters from the dataset. d) Word stemming: reducing each word of a sentence to its root; for instance, 'desired' becomes 'desire'. e) Tokenising: splitting whole sentences in the text into separate words, keywords, phrases and tokens. f) Padding sequences: deep learning neural networks require input data of equal sequence length, so we implemented a pre-padding method, adding zeros at the beginning of each vector representation.
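The steps above can be sketched in a few lines of plain Python. This is a simplified illustration under assumed parameters (an eight-token sequence length, a tiny stopword list, and a crude suffix-stripping stemmer standing in for a real stemming library):

```python
import re
import string

STOPWORDS = {"the", "a", "an", "is", "and", "of", "to"}  # illustrative subset

def preprocess(review: str, max_len: int = 8) -> list:
    """Clean one review and return a fixed-length token sequence (pre-padded)."""
    text = review.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))  # a) punctuation
    tokens = text.split()                                             # e) tokenise
    tokens = [t for t in tokens if t not in STOPWORDS]                # b) stopwords
    tokens = [re.sub(r"(ing|ed|s)$", "", t) for t in tokens]          # d) crude stemming
    # f) pre-padding: prepend zeros so every sequence has length max_len
    pad = ["0"] * max(0, max_len - len(tokens))
    return (pad + tokens)[-max_len:]

seq = preprocess("The battery is amazing and charges fast!")
```

In the actual pipeline the padded token sequences are mapped to integer indices before being fed to the embedding layer.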

Word Embedding
Word2Vec is a technique used for obtaining word embedding representations of text [27]. It can learn the semantics between word representation vectors in textual sentences [28]. The method was created by Mikolov et al. [29] in 2013 to map the words of a text based on their associations and meanings. Word2Vec has two types of approach: continuous bag-of-words (CBOW) and skip-gram. Using this technique, the relationship between words is calculated through a cosine similarity function. Furthermore, it maps each word of a review text into a real-valued vector called a word embedding.
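The cosine similarity function mentioned above can be illustrated directly. The embeddings below are hand-written toy vectors for illustration only; real Word2Vec vectors have 100-300 dimensions and are learned with CBOW or skip-gram rather than specified by hand:

```python
import math

# Toy 4-dimensional embeddings (hypothetical values for illustration).
embeddings = {
    "phone":  [0.9, 0.1, 0.3, 0.0],
    "mobile": [0.8, 0.2, 0.4, 0.1],
    "pizza":  [0.0, 0.9, 0.0, 0.8],
}

def cosine(u, v):
    """Cosine similarity, the measure used to relate word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

sim_related = cosine(embeddings["phone"], embeddings["mobile"])
sim_unrelated = cosine(embeddings["phone"], embeddings["pizza"])
```

Semantically related words end up with nearby vectors, so `sim_related` exceeds `sim_unrelated`; this geometry is what the downstream classifiers exploit.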

Classification
In this section, deep learning neural network models are presented for analysing, detecting and classifying reviews as fake or truthful based on n-gram features of the review text. These models are:

Bidirectional Long Short-Term Memory (BiLSTM) Model
Recurrent neural networks (RNNs) are deep learning techniques with a recurrent hidden structure in which the hidden state is influenced by the earlier states at each time step. Hence, RNNs can model contextual data effectively and process variable-length input sequences. LSTM is a form of RNN architecture that has recently enhanced RNN design. LSTM solved the vanishing gradient problem by substituting the self-connected hidden layers with memory units, which store long-range information about the input data during processing [30]. In an LSTM, data handling takes place in only one direction, the forward direction; this disregards backward context and reduces system performance. To mitigate this shortcoming, bidirectional recurrent neural networks handle the input data in both directions and concatenate the outputs of the forward and backward passes. Fig. 2 describes the structure of the BiLSTM model for text classification to detect fake reviews. In this model, the embedding of sentence words was performed with the Word2Vec method explained in the previous section.
Every LSTM unit has four gates: input i_t, forget f_t, cell state c_t and output gate o_t, which are formulated as follows [31]:

i_t = σ(W_i x_t + U_i h_{t-1} + b_i)
f_t = σ(W_f x_t + U_f h_{t-1} + b_f)
o_t = σ(W_o x_t + U_o h_{t-1} + b_o)
c̃_t = tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

Figure 2: Structure of the bidirectional long short-term memory model, where h_t denotes the output of the LSTM cell and H_t is the bidirectional concatenation of the forward (h_t→) and backward (h_t←) LSTM layer outputs at the current time step t.
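The gate computations of a single LSTM unit, and the forward/backward concatenation that makes the model bidirectional, can be traced with scalar toy weights. This is a didactic sketch, not the trained model: real LSTM weights are matrices learned by backpropagation, and the weight values below are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step with scalar input/state (toy dimensions).
    W, U, b are dicts keyed by gate name: 'i', 'f', 'o', 'c'."""
    i_t = sigmoid(W["i"] * x_t + U["i"] * h_prev + b["i"])      # input gate
    f_t = sigmoid(W["f"] * x_t + U["f"] * h_prev + b["f"])      # forget gate
    o_t = sigmoid(W["o"] * x_t + U["o"] * h_prev + b["o"])      # output gate
    c_hat = math.tanh(W["c"] * x_t + U["c"] * h_prev + b["c"])  # candidate state
    c_t = f_t * c_prev + i_t * c_hat                            # new cell state
    h_t = o_t * math.tanh(c_t)                                  # new hidden state
    return h_t, c_t

# Arbitrary illustrative weights.
W = {k: 0.5 for k in "ifoc"}
U = {k: 0.1 for k in "ifoc"}
b = {k: 0.0 for k in "ifoc"}

seq = [1.0, -0.5, 0.2]  # a toy embedded "sentence" of three scalar inputs
h = c = 0.0
for x in seq:           # forward pass
    h, c = lstm_step(x, h, c, W, U, b)
hb = cb = 0.0
for x in reversed(seq): # backward pass
    hb, cb = lstm_step(x, hb, cb, W, U, b)
H_t = (h, hb)           # BiLSTM output: concatenation of both directions
```

Because the two passes read the sequence in opposite orders, `h` and `hb` differ, which is exactly the extra context the bidirectional model feeds to the classification layer.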

The CNN Model
Convolutional neural networks (CNNs) are a kind of feed-forward artificial neural network broadly applied to compose semantic patterns [32] by capturing n-gram information automatically. In this model, a Word2Vec-based embedding layer is configured through three parameters: vocabulary size, input sequence length and embedding dimension. The vocabulary size represents the most frequent words in the dataset and is set to 40,000 words. The word embedding gives a D-dimensional vector for each word of a review, and the maximum input sequence length is the maximum length of the review text. The convolution layer performs an operation that applies filters to the input embedding matrix E(w) ∈ R^(V×D), where V is the vocabulary size and D is the word dimension. We set 100 filters with a window size of 3. The max pooling layer takes the input sequences from the convolutional layer, performs downsampling to reduce the spatial dimensionality of the given sequences, and selects the maximum feature value from the pool of each filter kernel. The CNN model learns word embeddings of sentences and forwards their real-valued vectors to the classification layer, a sigmoid activation layer that calculates the probability that each input review text is fake or truthful. Fig. 3 illustrates the architecture of the CNN model.
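The convolution-plus-max-pooling operation described above can be demonstrated for a single filter on a toy review. The embedding values and the kernel weights are hypothetical; the actual model uses learned D-dimensional embeddings and 100 filters of window size 3, as stated in the text.

```python
def conv_maxpool(embeddings, kernel, window=3):
    """Slide one filter of the given window size over a sequence of word
    vectors, then take the maximum activation (max-over-time pooling)."""
    feats = []
    for i in range(len(embeddings) - window + 1):
        # Flatten the window of word vectors and take the dot product.
        patch = [v for vec in embeddings[i:i + window] for v in vec]
        feats.append(sum(p * k for p, k in zip(patch, kernel)))
    return max(feats)

# Toy review of 5 words with 2-dimensional embeddings.
review = [[0.1, 0.2], [0.4, 0.0], [0.3, 0.5], [0.0, 0.1], [0.2, 0.2]]
kernel = [0.5, -0.1, 0.2, 0.3, 0.1, 0.4]  # one filter: window 3 x dim 2
feature = conv_maxpool(review, kernel)
```

Each filter thus reduces a whole review to one n-gram feature value; 100 filters produce a 100-dimensional vector that is passed to the sigmoid classification layer.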

Experimental Results
We performed several experiments on the standard Yelp dataset, which includes 9456 electronic product reviews. The dataset was divided into 70% for training, 10% for validation and 20% for testing. This subsection presents the experimental results of the CNN- and BiLSTM-based deep learning models for detecting fake reviews using learned word embedding representations. Both models were trained and validated for five epochs with a batch size of 300 review texts. Comparing the performance of the CNN and BiLSTM techniques, the results showed that the BiLSTM model was more accurate than the CNN model, while the CNN model required less data processing time. Fig. 6 displays the classification results for both models.
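The 70/10/20 split can be sketched as follows; the shuffling seed and the use of index ranges in place of the actual review records are assumptions for illustration.

```python
import random

def split_dataset(reviews, train=0.7, val=0.1, test=0.2, seed=42):
    """Shuffle and split reviews into 70% train / 10% validation / 20% test."""
    assert abs(train + val + test - 1.0) < 1e-9
    data = list(reviews)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    n = len(data)
    n_train = int(n * train)
    n_val = int(n * val)
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

# With the 9456 reviews of the Yelp dataset this yields 6619 / 945 / 1892.
train_set, val_set, test_set = split_dataset(range(9456))
```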

Metrics for Model Performance
To evaluate the performance of the proposed CNN and BiLSTM models for detecting and predicting fake and truthful reviews, various performance metrics can be calculated from the confusion matrices based on the rates of false-positive and false-negative items. Assessment metrics such as specificity, precision, recall, F1-score and accuracy were computed from the two confusion matrices, as shown in Figs. 4 and 5.
where TP designates the number of reviews correctly classified as truthful, FP indicates the number of fake reviews incorrectly classified as truthful, TN represents the number of reviews correctly classified as fake, and FN is the number of truthful reviews incorrectly classified as fake.
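The metrics follow directly from these four counts. The counts passed in below are hypothetical illustration values, not the paper's experimental results:

```python
def metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts,
    treating 'truthful' as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Hypothetical counts for illustration only.
m = metrics(tp=850, fp=50, tn=900, fn=92)
```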
In Fig. 7a, the Y-axis shows the accuracy of the BiLSTM model for fake review detection and the X-axis represents the epochs, i.e., the number of iterations of training and testing the model on the dataset; in Fig. 7b, the Y-axis shows the training loss and the X-axis gives the number of epochs.
In Fig. 8a, the Y-axis shows the accuracy of the CNN model for fake review detection and the X-axis represents the epochs; in Fig. 8b, the Y-axis shows the training loss and the X-axis gives the number of epochs.

Word Cloud
A word cloud is a technique widely used to visualise textual data; it allows researchers to see at a single glance the words with the highest frequency in a given body of text. Fig. 9 visualises the most repeated words in the dataset.
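The frequency counting underlying a word cloud amounts to tallying tokens; word-cloud libraries then scale each word's font size by its count. A minimal sketch on a hypothetical three-review corpus (the actual figure is drawn from all 9456 Yelp reviews):

```python
import re
from collections import Counter

# Toy corpus; illustrative reviews only.
reviews = [
    "Great phone, great battery life.",
    "The battery died fast, terrible phone.",
    "Great screen but the phone overheats.",
]
tokens = re.findall(r"[a-z]+", " ".join(reviews).lower())
freq = Counter(t for t in tokens if t not in {"the", "but", "a"})
top = freq.most_common(3)  # the words that would appear largest in the cloud
```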

Comparative Analysis
This section presents a comparative analysis between the results obtained by the proposed deep learning models and the machine learning models presented previously. Furthermore, it compares the transformation methods adopted to map the words of a text into numerical form. With TF-IDF, each word in the review text is converted into a single scalar weight, whereas with the Word2Vec method it is transformed into an N-dimensional vector. Tab. 2 compares the proposed models with some existing work based on the accuracy metric and the same dataset used in this study.
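The contrast between the two representations can be made concrete. The three two-word documents and the toy Word2Vec vector below are hypothetical; the point is only that TF-IDF yields one scalar per word per document, while Word2Vec yields a dense N-dimensional vector per word.

```python
import math

# A toy corpus of three short reviews.
docs = [["good", "phone"], ["bad", "phone"], ["good", "screen"]]

def tfidf(term, doc, docs):
    """TF-IDF: one scalar weight per word per document."""
    tf = doc.count(term) / len(doc)              # term frequency
    df = sum(term in d for d in docs)            # document frequency
    idf = math.log(len(docs) / df)               # inverse document frequency
    return tf * idf

w_tfidf = tfidf("good", docs[0], docs)           # a single number
w2v = {"good": [0.2, 0.7, 0.1]}                  # toy 3-dimensional Word2Vec vector
```

The dense Word2Vec vectors preserve semantic relations between words, which is why the deep learning models built on them outperform the TF-IDF-based baselines in Tab. 2.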

Conclusions
Fake reviews are deceptive opinions posted by fraudulent reviewers on the websites of e-business companies to mislead customers. The aim is to have customers make wrong purchases by selecting an inferior product. In this paper, we used standard Yelp product reviews to conduct empirical tests of the performance of two neural network models, CNN and BiLSTM, for identifying fake opinions. Based on learned word embeddings of the review texts, the proposed models were used to classify product reviews as fake or truthful. Comparing the results obtained from several experiments, we found that the BiLSTM model provides higher performance than the CNN model. In text classification tasks, neural networks can appropriately capture global semantic information through sentence vectors; as a result, deep learning-based models outperform the baseline models in terms of accuracy. The experimental results also showed that the CNN model was superior to the BiLSTM model in data processing time. In future work, we will attempt to develop a hybrid neural network model using behavioural and linguistic features for detecting fake reviews.