Current work on spam detection in product reviews tends to ignore the temporal relevance among reviews within the same user or product entity, resulting in poor detection performance. To address this issue, this paper proposes a spam detection method that jointly learns comprehensive temporal features from both behavioral and text features in user and product entities. We first extract the behavioral features of a single review, then employ a convolutional neural network (CNN) to learn its text features. We next combine the behavioral and text features of each review and train a long short-term memory (LSTM) model to learn the temporal features of every review in the user and product entities. Finally, we train a classifier on all of the learned temporal features to predict whether a particular review is spam. Experimental results demonstrate that the proposed method can effectively extract temporal features from historical activities and can further jointly analyze activity trajectories from multiple entities, thereby significantly improving spam detection accuracy.
With the development of mobile communication technology and the widespread use of smartphones, it is now common for people to share reviews of the products or services they have purchased. As a result, many customers consider the user ratings of a product before making a purchasing decision. For merchants, the quality of online evaluations is consequently closely related to their profit level [
With this in mind, researchers have conducted extensive studies in the challenging field of spam detection. Many of these take advantage of the effectiveness of machine learning, which has been widely used in dealing with optimization problems [
Previous studies have shown that many fake reviews cannot be identified even by humans due to their highly imperceptible nature. Thus, text features alone do not perform well on real data; however, the behaviors of spammers contain a large number of clues that reveal suspicious patterns. As a result, many researchers have explored the extraction of behavioral features from the historical traces left by reviewers, such as registration time, posting a review, and giving a score to a product. From these traces, certain behavioral features can be extracted, including the maximum number of reviews posted by a reviewer in a day, the proportion of positive reviews among all reviews made by the user, the rank of these reviews among all reviews of the product, and the average product score. The work in [
Unfortunately, behavioral feature extraction is heavily reliant on expert knowledge and is not always supported by rich trace information; the use of behavioral features can thus improve spam detection performance only to a certain extent. Therefore, many researchers employ both text features and behavioral features when performing spam detection tasks, and consequently achieve better results. For example, the work in [
The aforementioned works, using either text features or behavioral features, tackle the spam detection problem as a binary (i.e., either fake review or normal review) classification problem. A classifier is trained on features extracted from current published reviews in order to predict whether a new review is fake or normal. However, most of these works do not consider the temporal features of the reviews; these are features related to the time interval between the related reviews of the same entity. Here, ‘related reviews’ are reviews posted by the same user, or for the same product, within a certain period of time. If the opinion in these related reviews changes abnormally, these reviews may contain fake opinions. Accordingly, the leveraging of temporal features may significantly contribute to spam detection. Moreover, existing studies also extract features from the relationships between users and reviews, or between products and reviews, independently. It would be preferable to adopt a holistic approach that considers the relationship among users, reviews and products to extract behavioral and text features.
Accordingly, in this paper we propose a multi-entity temporal feature-based spam detection model (MTFSD). After obtaining the text and behavioral features of reviews, MTFSD uses LSTM to automatically learn temporal features for each review by considering the latest reviews in the user or product entity. Finally, MTFSD merges these learned features to mine the relationships between different entities so as to distinguish between normal and fake reviews. Experimental results show that MTFSD achieves better spam detection performance compared with other methods.
In summary, the main contributions of this paper include the following:
- The proposed method effectively learns dynamic temporal features via LSTM; these temporal features capture both the relevance among related reviews and the internal characteristics of a review.
- The proposed method provides an effective way to learn comprehensive fusion features from the perspective of multiple entities in order to detect fake reviews; these comprehensive fusion features enable significantly better fake review detection performance.
With the development of internet technology and social networks, network and information security have attracted widespread attention [
The study of text features can be divided into two categories: those based on grammatical analysis and those based on semantic analysis. From the grammatical analysis perspective, text features are primarily extracted from the frequency statistics of words or phrases, which are typically processed using a bag-of-words (BOW) model or an N-gram model. The BOW model regards the content of each review as a ‘bag’ and assumes that each word in the bag is independent, ignoring word order, grammar, and syntax; the reviews are then classified according to the words contained in the bag. For its part, the N-gram model slides a window of size n over the text content, forming word fragments of length n; each of these word fragments is called a ‘gram’. The frequencies of all grams are then counted and filtered to obtain the vector feature space of the text. Unigram, bigram, and trigram methods are commonly used in bag-of-words features [
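The gram-extraction step described above can be sketched as follows (a minimal illustration; the function name and sample sentence are our own):

```python
from collections import Counter

def extract_ngrams(tokens, n):
    """Slide a window of size n over the token sequence to form grams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the room was clean and quiet".split()
unigrams = Counter(extract_ngrams(tokens, 1))  # word frequency statistics
bigrams = Counter(extract_ngrams(tokens, 2))   # e.g. "room was", "was clean"
```

In practice the counted grams would then be filtered (e.g. by minimum frequency) before forming the feature vector space.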
To overcome the disadvantages associated with BOW features, subsequent works have focused on extracting text features from the part-of-speech feature analysis, which is achieved by tagging the part-of-speech of the text and counting the frequency of occurrence. While various part-of-speech features were employed in some early works [
Grammatical analysis is a simple and powerful tool, not only for spam detection but also for most other text classification tasks; however, it relies heavily on expert knowledge. Moreover, its performance depends largely on the dataset employed, and the semantics of the text cannot be intuitively understood through grammatical analysis alone. The work in [
Behavioral information can also provide clues for use in identifying review spam. Through data screening, the work in [
Most of the above works analyze users (who write reviews), reviews, and products separately. The work in [
However, not all reviews contain rich behavioral information; in fact, reviews with sparse behavioral information are relatively common in real datasets. Therefore, fusing text features and behavioral features to detect fake reviews has become the prevailing trend among researchers. For example, the work in [
In this subsection, we first define three important entities in a review system: namely, the review, user and product.
Examining the definitions of these three entities, it can be seen that an individual review is associated with all three from different perspectives. Within the user and product entities, a review has strong relevance to other reviews posted within the same time period. Regarding the review entity, the text and behavior of a review contain clues that indicate whether or not the review is fake; however, reviews in the review entity are treated independently of each other, and their mutual relevance is commonly ignored. Regarding the user and product entities, the relevance of reviews and the behavior of users and products may reflect spam patterns, which can be captured by temporal features. Therefore, extracting temporal features of both text and behavior from the user and product entities is a subject we deem worthy of significant attention. Here, we use the LSTM model to automatically learn deeper temporal features from the extracted CNN-based text features and hand-crafted behavioral features of reviews in the user and product entities. Finally, we integrate the features derived from the different entities and input them into a classifier to determine whether or not a review is fake.
The framework of the proposed method is outlined in
It is widely understood that users mostly write reviews subjectively; therefore, reviews are affected not only by the product itself, but also by the users’ opinions and emotions. To effectively learn the temporal features of a review in the user and product entity, other reviews in the same entity should be employed in order to capture their relevance with the learned review.
In fact, for the review to be detected in the review entity, we opt to consider only the most recent reviews made by the same user. Examining a user's most recent reviews rather than all of their reviews not only facilitates analysis of the user's recent status, but also prevents the user's status from too long ago from influencing the current review.
On the other hand, reviews for the same product in a given period always tend to cluster around a similar type of opinion (positive or negative). When a product is attacked by a spammer, the reviews it receives during the period of attack will fluctuate significantly. Therefore, the proposed method takes the latest several reviews into account in order to learn the temporal features of a review in the product entity.
Based on the analysis above, the proposed method initially organizes the reviews in the user and product entities according to the time at which they were posted. As a result, each review within the same entity can be assigned a definite position.
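This organization step can be sketched as follows (a minimal sketch assuming dictionary-shaped records with the dataset's `reviewerID`, `hotelID`, and `date` attributes; the function name is our own):

```python
from collections import defaultdict

def build_entity_sequences(reviews, key):
    """Group reviews by entity id, then sort each group by posting time so
    that every review has a definite position within its entity."""
    groups = defaultdict(list)
    for r in reviews:
        groups[r[key]].append(r)
    for entity_id in groups:
        groups[entity_id].sort(key=lambda r: r["date"])
    return groups

reviews = [
    {"reviewerID": "u1", "hotelID": "h1", "date": "2012-03-02"},
    {"reviewerID": "u1", "hotelID": "h2", "date": "2012-01-15"},
]
user_seqs = build_entity_sequences(reviews, "reviewerID")
# user_seqs["u1"][0] is the user's earliest review
```

The same call with `key="hotelID"` yields the product-entity sequences.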
In the user entity, the latest
In the product entity, moreover, the latest
For convenience, we add
For each single review in the review, user, and product entity, we next perform feature extraction from two aspects (text and behavior) to facilitate the subsequent operations.
In previous studies, behavioral features have been considered more indicative than text features of whether a review was real or fake. With the help of previous works and expert knowledge, the proposed method extracts the behavioral features of a single review from the three entities; this process is described in more detail in the next subsection, Behavioral Feature Extraction.
Existing studies have demonstrated that CNN can be effectively applied to text classification tasks [
To comprehensively represent a single review as a vector for use in distinguishing fake reviews from normal ones, it is necessary to integrate all the extracted features for each review. Different feature types reflect different aspects of review characteristics, as outlined in [
Generally speaking, a review includes the review text along with other attributes such as author, product, rating, etc. Thus, the feature representation of a given review should include not only the text features extracted from the review text, but also the behavioral features extracted from other attributes.
Accordingly, we concatenate the CNN-based text features and behavioral features of each review to form the joint feature representation used for subsequent operations.
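A minimal sketch of this fusion step (the vector dimensions are assumptions: 100 matches the convolutional output size in the experimental setup, and 5 matches the five review-level behavioral features):

```python
import numpy as np

def fuse_features(text_vec, behavior_vec):
    """Concatenate CNN-based text features with hand-crafted behavioral
    features into one joint representation of a review."""
    return np.concatenate([text_vec, behavior_vec])

text_vec = np.random.rand(100)    # assumed CNN text-feature dimension
behavior_vec = np.random.rand(5)  # assumed behavioral-feature dimension
joint = fuse_features(text_vec, behavior_vec)  # shape (105,)
```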
The reviews in the user entity are often written for different products. Their statistical characteristics always reflect changes in the user’s personal sentiment and writing style. By contrast, those in the product entity are mostly written by different users, and are closely related to the real-time status of the product.
In the user and product entities, each review will most likely be influenced by the previous reviews in its user-related and product-related review sets, which can be regarded as the states of these entities at different times. In addition, the closer together the posting/receiving times of reviews in the same entity, the more relevant they are to each other. As existing studies [
By considering the feature vector of each review as a moment on the time axis, each entity can be represented as a time series comprising several moments. Thus, with the help of LSTM, a deeper temporal feature representation of a review in each entity can be automatically learned from the fused features of its user-related and product-related reviews.
Through Feature Extraction, the review to be detected in the review entity is represented as a joint feature vector made up of its behavioral and text features. After the Temporal Feature Learning step, the review is additionally represented as two learned temporal feature vectors (one each for the user and product entities). Finally, the three feature vectors from the different entities are merged and input into a classifier, which determines whether or not the review is fake.
Regarding the attributes of a review in the review entity, researchers have analyzed a large body of data and consequently identified many clues that can indicate the presence of a fake review. For example, the first review of a product usually attracts the most attention, as people tend to focus on the top reviews with the highest ratings. Accordingly, spammers often try to ensure that their reviews are placed as high on the list as possible; thus, how high a given review appears on the list can serve as a clue as to whether it is fake.
Five behavioral features obtained from the attributes of reviews in the review entity are employed in the proposed method: namely, the order of the review (Rank), the absolute value of the score deviation rate (RD), the extremeness of the score (EXT), the score deviation rate with threshold (DEV), and whether the review is a singleton (ISR). Further details are presented in
Features | Meaning |
---|---|
Rank | The order of the review [ |
RD | The absolute value of score deviation rate [ |
EXT | Extremeness of the score: 1 if the score is 4 or 5; 0 otherwise. |
DEV | Score deviation rate with threshold of |
ISR | If the user posts only one review, |
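As a concrete illustration, four of the five review-level features above can be sketched as follows (a minimal sketch with hypothetical field names; DEV is omitted because its threshold is dataset-specific, and the exact normalization of RD may differ from this simple absolute deviation):

```python
def review_behavior_features(review, product_reviews, user_review_count):
    """Sketch of Rank, RD, EXT, and ISR for one review."""
    ordered = sorted(product_reviews, key=lambda r: r["date"])
    rank = ordered.index(review) + 1                # Rank: position among the product's reviews
    avg = sum(r["rating"] for r in product_reviews) / len(product_reviews)
    rd = abs(review["rating"] - avg)                # RD: absolute deviation from the average score
    ext = 1 if review["rating"] in (4, 5) else 0    # EXT: extreme (4-5 star) score
    isr = 1 if user_review_count == 1 else 0        # ISR: the user's only review
    return rank, rd, ext, isr

reviews = [
    {"date": "2012-01-01", "rating": 5},
    {"date": "2012-02-01", "rating": 1},
]
rank, rd, ext, isr = review_behavior_features(reviews[1], reviews, 1)
```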
In addition to the behavioral information contained in reviews, spammers also tend to leave some traces of their corresponding behavior trajectories when they post fake reviews to attack other products. For example, according to the statistics in [
The following six features based on user behaviors listed in
Features | Meaning |
---|---|
uMNR | Maximum number of reviews that a user posted within a day [ |
uPR | The ratio of positive reviews (4–5 star) in all reviews posted by this user [ |
uNR | The ratio of negative reviews (1–2 star) in all reviews posted by this user [ |
uERD | Distribution entropy of user evaluation scores [ |
uavgRD | Average deviation rate [ |
uBST | Burstiness [ |
Features | Meaning |
---|---|
pMNR | Maximum number of reviews that a product received within a day [ |
pPR | The ratio of positive reviews (4–5 star) in all of the product’s reviews [ |
pNR | The ratio of negative reviews (1–2 star) in all of the product’s reviews [
pavgRD | Average deviation rate [ |
pERD | Distribution entropy of the average evaluation score obtained [ |
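The ratio- and entropy-based features in the two tables above (uPR/pPR, uNR/pNR, and the score-distribution entropies uERD/pERD) can be sketched as follows (a minimal sketch; the function name is our own, and entropy is computed here in base 2 as an assumption):

```python
import math
from collections import Counter

def entity_rating_features(ratings):
    """Positive/negative ratios and score-distribution entropy for one entity."""
    n = len(ratings)
    pr = sum(1 for s in ratings if s >= 4) / n      # positive (4-5 star) ratio
    nr = sum(1 for s in ratings if s <= 2) / n      # negative (1-2 star) ratio
    counts = Counter(ratings)
    erd = -sum((c / n) * math.log2(c / n) for c in counts.values())  # entropy
    return pr, nr, erd

pr, nr, erd = entity_rating_features([5, 5, 1, 3])
```

The same computation applies to a user's reviews or a product's reviews, yielding the u-prefixed and p-prefixed variants respectively.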
In summary, for each review
The acquisition of text features depends on the text content of the review itself. For each review, the proposed method uses CNN to learn the features of the global semantic information from the review’s text content. This learning process is illustrated in
Suppose that the review
where
Each convolution kernel with a fixed
where
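The convolution-and-pooling pipeline described above can be sketched with Keras as follows (a minimal sketch: the sequence length, embedding size, and kernel size are assumed values, while the 100 filters match the experimental setup reported later):

```python
import numpy as np
import tensorflow as tf

max_len, embed_dim, n_filters = 200, 300, 100  # assumed sizes; 100 filters per the experiments

# Sketch: word embeddings -> 1-D convolution -> global max pooling -> text feature vector.
text_encoder = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=n_filters, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
])

# A batch of two (already embedded) reviews yields two 100-dimensional text features.
sample = np.zeros((2, max_len, embed_dim), dtype="float32")
text_features = text_encoder(sample)
```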
Most reviews written by spammers are based on templates or existing reviews and are only slightly modified; these highly similar reviews are posted continuously. By contrast, non-spammers usually write reviews with reference to the features of a specific product, and the differences among these normal reviews tend to be large. When a merchant hires spammers either to post positive reviews for their own products or to leave malicious fake reviews for competitors’ products, the spammers leave traces in the process that provide clues for the detection of fake reviews. This paper analyzes the historical traces of reviews from two aspects, namely users and products, and automatically learns temporal features from reviews in the user and product entities through LSTM.
By performing feature extraction and feature fusion, the extracted text feature vector
Similarly, for each review in the user-related review set
Consequently,
where
The
here, the input gate
moreover, the current neuron state
additionally, the output gate
finally, the neuron outputs
where
As a result, by employing LSTM and the input of
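The temporal feature learning step can be sketched with Keras as follows (a minimal sketch: the sequence length, joint feature size, and hidden size are assumed hyperparameters, not values from the paper):

```python
import numpy as np
import tensorflow as tf

k, feature_dim = 10, 105  # assumed: k latest reviews per entity, joint feature size

# Sketch: the LSTM reads the time-ordered joint feature vectors of the k latest
# reviews in an entity; its final hidden state is the learned temporal feature.
temporal_encoder = tf.keras.Sequential([
    tf.keras.layers.LSTM(units=64),  # hidden size is an assumed hyperparameter
])

sample = np.zeros((2, k, feature_dim), dtype="float32")
temporal_features = temporal_encoder(sample)
```

One such encoder would be applied to the user-related review sequence and another to the product-related sequence, yielding the two temporal feature vectors of a review.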
Through the above operations, four types of features—namely the text features
Finally, a classification model is constructed using softmax for
where
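The final classification step can be sketched as follows (a minimal sketch: the hidden width and merged feature size are assumptions, while the two-way softmax output reflects the binary fake/normal decision described above):

```python
import numpy as np
import tensorflow as tf

# Sketch: merged multi-entity features -> dense layer -> 2-way softmax (fake vs. normal).
classifier = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),    # assumed hidden width
    tf.keras.layers.Dense(2, activation="softmax"),
])

# merged = concatenation of the review-entity joint features with the user- and
# product-entity temporal features; 233 is an assumed total dimension.
merged = np.zeros((3, 233), dtype="float32")
probs = classifier(merged)  # each row is a probability distribution over {fake, normal}
```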
To verify the effectiveness of the proposed method, we select the Yelp dataset for our experiments. The Yelp dataset is a publicly available commercial website dataset that offers a good balance between commercial authenticity and ground truth, and has thus been widely used in prior work. In this paper, we focus on the hotels in the Yelp dataset for our experiments.
The Yelp Hotel dataset contains 6,883,290 reviews for 3,680,118 hotels from 66,599 reviewers. Of these, 5,679 reviews are labeled: 803 are fake and 4,876 are normal. Each review contains the posting time (date), content (reviewContent), the ID of the reviewer (reviewerID), the rating provided by the reviewer (rating), the hotel ID (hotelID), and the label (flagged), along with some other attributes. Moreover, the dataset records the username, registered address, registration time, and number of reviews posted by each user. Relevant information about each hotel (such as its name, registration date, registered address, price, and telephone number) is also recorded.
We mainly use the date, reviewContent, reviewerID, rating, hotelID, and flagged attributes of a given review to extract features. Here, date and rating are employed to extract the behavioral information of a single review. Moreover, the combination of reviewerID and date can be used to construct the reviewer’s historical behavior trajectory, which in turn serves as the basis for extracting the temporal feature from the user entity. Similarly, hotelID associates reviews with different hotels and, combined with the date attribute, makes it possible to extract temporal features from the product entity.
We use Precision (P), Recall (R), F1-Score (F1) and Accuracy (A) as the metrics for evaluating the spam detection performance. These are defined as follows:
Here,
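These four metrics follow their standard definitions and can be computed as follows (a minimal sketch; the function name and example counts are our own, with the fake class taken as positive):

```python
def detection_metrics(tp, fp, fn, tn):
    """Precision, Recall, F1-Score, and Accuracy from confusion-matrix counts
    (positive class = fake review)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

p, r, f1, a = detection_metrics(8, 2, 2, 8)
```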
We use a pre-trained word2vec model, which is the model introduced in [
In the process of learning various features, the number of convolution filters is set to 100; moreover, the number of feature dimensions finally output by the convolutional layer
We developed the proposed method using the TensorFlow framework with three important libraries, NumPy 1.18.5, Keras 2.3.1, and tensorflow-gpu 1.14.0, in the Python programming language. The implemented model is trained on a computer running the Windows operating system with 32 GB of memory, an RTX 2080 Super GPU, and an Intel Core i7-9700K processor.
To validate the effectiveness of the proposed MTFSD, we compare the performance of four similar fake review detection methods with that of our method.
SPEAGLE+, proposed by Rayana et al. [
MK, proposed by Mukherjee et al. [
W_BF+Bigram, proposed by Wang et al. [
Method | P | R | F1 | A |
---|---|---|---|---|
SPEAGLE | 26.5% | 56.0% | 36.0% | 80.4% |
MK_BF | 41.4% | 74.6% | 55.6% | 82.4% |
MK_BF + Bigram | 46.5% | 82.5% | 59.4% | 84.9% |
W_BF + Bigram | 48.2% | 61.5% | 85.9% | |
MTFSD | 70.0% | 58.3% | 63.6% | 91.8% |
In addition to the above comparison experiments, we conduct further experiments to validate the impact of the different types of features we employ, particularly the temporal features and multi-entity fused features.
We first construct a spam detection model Te + Be, which uses only
Secondly, an MU detection model is constructed by combining
Compared to the MU model, the proposed MTFSD uses LSTM on
The spam detection results of the three models described above are listed in
Method | P | R | F1 | A |
---|---|---|---|---|
Te+Be | 91.7% | 27.8% | 42.7% | 89.9% |
MU | 64.3% | 39.7% | 49.1% | 90.5% |
MTFSD | 70.0% | 58.3% | 63.6% | 91.8% |
In this paper, an LSTM-based spam detection model is proposed that can effectively extract the temporal features of different entities and conduct fusion analysis of these features. The model obtains a temporal embedding representation of multiple entities by learning correlation features from the perspectives of users and products based on posting time, then uses a classifier to complete the spam detection task. Experimental results demonstrate that our proposed method effectively improves the accuracy of spam detection. At present, the extraction of behavioral features relies solely on expert knowledge; it would therefore be fruitful to apply machine learning techniques to automate the feature extraction process in future work.