Nowadays, as the amount of textual data grows exponentially, sentiment analysis has become one of the most significant tasks in natural language processing (NLP) and is attracting increasing attention. Traditional Chinese sentiment analysis algorithms cannot make full use of the word-order information in context and are inefficient at sentiment inference. In this paper, we systematically review classic and representative works in sentiment analysis and propose a simple but efficient optimization. First, FastText is trained to obtain a basic classification model, which generates pre-trained word vectors as a by-product. Second, a Bidirectional Long Short-Term Memory network (Bi-LSTM) is trained on the generated word vectors and then fused with FastText to perform comprehensive sentiment analysis. By combining FastText and Bi-LSTM, we develop a new fast sentiment analysis algorithm, called FAST-BiLSTM, which consistently achieves a balance between performance and speed. In particular, experimental results on real datasets demonstrate that our algorithm can effectively judge the sentiment of users' comments and is superior to traditional algorithms in time efficiency, accuracy, recall and F1 score.
In recent years, sentiment analysis has received far more attention from both academia and industry than ever before. Sentiment analysis is a common application of NLP that uses sentiment vocabulary to determine the sentiment category of a text. A multitude of sentiment-vocabulary analysis methods have been proposed in the past decades. For instance, based on the emotional attributes of words, Turney [
However, such methods largely neglect the features of sentiment words. Furthermore, Yang et al. [
A multitude of researchers have used supervised learning to exploit contextual-knowledge regularization for sentiment analysis. Wu et al. [
Inspired by recent advances in deep neural networks, Vaswani et al. [
Most current research tends to exploit the sentiment relationships between vocabulary items and does not make full use of the semantic relationships in context. Moreover, converting text into word vectors requires a time-consuming word-vector algorithm, which leads to poor time efficiency. Towards this end, we introduce a new fast sentiment classification method called FAST-BiLSTM. First of all, FastText [
FastText is a supervised word-vector algorithm whose model structure is consistent with the Continuous Bag-of-Words (CBOW) model. The CBOW model structure is shown in
There are two main differences between the FastText and CBOW models. First, the input data and prediction target are different: CBOW takes the sum or average of the context word vectors with the target word removed, while FastText turns the bag of words into a bag of features in order to use more order information. That is, the input
The FastText model structure is shown in
The objective function of FastText is as shown below:

$$-\frac{1}{N}\sum_{n=1}^{N} y_n \log\big(f(BAx_n)\big)$$

where $N$ is the number of documents, $x_n$ is the normalized bag of features of the $n$-th document, $y_n$ is its label, $A$ is the word look-up (embedding) matrix, $B$ is the weight matrix of the linear classifier, and $f$ is the softmax function.
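To make this architecture concrete, the following is a minimal sketch of a FastText-style classifier in Keras (the framework the experiments report using); the vocabulary size and sequence length are illustrative assumptions, not values from the paper.

```python
# A minimal FastText-style classifier sketch in Keras:
# embedding lookup -> average pooling over the sequence -> linear softmax.
from tensorflow.keras import layers, models

VOCAB_SIZE = 50000   # assumed vocabulary size (word + n-gram features)
MAX_LEN = 100        # assumed maximum sequence length
EMBED_DIM = 64       # matches the word-vector dimension reported later

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),           # integer word/feature ids
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),  # look-up matrix A
    layers.GlobalAveragePooling1D(),          # average the feature vectors
    layers.Dense(2, activation="softmax"),    # linear classifier B + softmax f
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```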
Traditional word-vector tools such as word2vec treat each word as an atom, ignoring the morphological features inside the word. Take "apple" and "apples": the pair share most of their characters, that is, they have similar internal morphology. In traditional word2vec models, this internal morphological information is lost because the two words are mapped to different ids. FastText solves this problem by using character-level n-grams to represent a word. For the word "apple", assuming the value of n is 3, the character trigrams are "<ap", "app", "ppl", "ple" and "le>", where "<" and ">" mark the word boundaries.
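A short sketch of this character-level n-gram extraction (a plain illustration of the idea, not FastText's internal implementation):

```python
def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Extract character-level n-grams with boundary markers, FastText-style."""
    padded = f"<{word}>"  # '<' and '>' mark the start and end of the word
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("apple"))  # ['<ap', 'app', 'ppl', 'ple', 'le>']
```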
In our FAST-BiLSTM model, a Bi-LSTM network is used as one of the fusion models. A unidirectional LSTM infers the later part of a sequence from the preceding information, but focusing only on the preceding words cannot make full use of the global information. The bidirectional LSTM network is designed for this case. LSTM is a kind of time-series Recurrent Neural Network (RNN) specially designed to solve the long-term dependence problem of general RNNs. The key to solving the gradient-vanishing problem is that LSTM introduces a cell state along the time series and "gate" structures between cells. The state is the carrier of information flow in the neural network, allowing information in the data to be conveyed to the next cell. LSTM has three types of gate structures: the forget gate, the input gate and the output gate [
In order to demonstrate the operation mechanism of LSTM vividly and concisely, the definition of symbols concerned with
The forget gate is in charge of deciding which information to retain and which to discard. This process takes advantage of the property that the output of the sigmoid function lies between 0 and 1. Specifically, the current cell passes the input of the current state and the hidden state of the previous cell to the sigmoid function; the closer the output value of the sigmoid function is to 0, the more likely the information should be discarded. The information transmission in the forget gate can be expressed by
The input gate is in charge of updating the cell state. This process also uses the sigmoid function, but the input is the hidden state of the previous cell together with the current input information. The output value determines which information needs to be updated: analogously, the closer the output value is to 1, the more crucial the information is and the more it needs to be updated. The information transmission in the input gate can be expressed by
The output gate is in charge of determining the next hidden state, which incorporates the previous input information. The sigmoid function at this moment receives the previous hidden state and the current input information. The new cell state is passed to the tanh function, and its output is multiplied by the sigmoid output to form the new hidden state.
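For reference, the standard LSTM gate equations (the widely used formulation; the paper's exact notation is not reproduced here) are:

$$
\begin{aligned}
f_t &= \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{(input gate)}\\
\tilde{C}_t &= \tanh\!\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{(candidate cell state)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)}\\
o_t &= \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{aligned}
$$

where $x_t$ is the current input, $h_{t-1}$ is the previous hidden state, $C_t$ is the cell state, $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.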
Although LSTM solves the long-term dependency problem, it cannot use the information that comes later in the text. Bi-LSTM considers the global information in context. Its implementation principle is as follows: the word vectors generated by the embedding layer are fed into two LSTM networks that run in opposite temporal directions. The forward LSTM captures the forward information of the input sequence and the backward LSTM captures the backward information. The two hidden representations are then concatenated to obtain the final hidden-layer representation. In this way, the hidden representation of each word in the text sequence contains complete context information.
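In Keras, this forward/backward combination is available through the `Bidirectional` wrapper; a minimal sketch, with illustrative input shapes:

```python
# Minimal Bi-LSTM sketch in Keras: two LSTMs run in opposite directions
# and their hidden states are concatenated. Sizes are illustrative.
from tensorflow.keras import layers, models

MAX_LEN, EMBED_DIM, UNITS = 100, 64, 100

model = models.Sequential([
    layers.Input(shape=(MAX_LEN, EMBED_DIM)),  # pre-trained word vectors
    layers.Bidirectional(layers.LSTM(UNITS), merge_mode="concat"),
    layers.Dense(2, activation="softmax"),
])
```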
The fast text sentiment analysis algorithm based on double-model fusion is shown in. It consists of five steps:
1. Data cleaning: invalid or empty data are removed in this step.
2. Word segmentation: a word segmentation tool is used to segment the text; here, we use the Harbin Institute of Technology's LTP word segmentation tool (a sketch follows this list).
3. FastText training: the data are input into the FastText model to train the binary classification model and generate word vectors at the same time.
4. Bi-LSTM training: the word vectors from Step 3 are loaded and the Bi-LSTM model is used to train a binary classification model.
5. Model fusion: the FastText model from Step 3 and the Bi-LSTM model from Step 4 are combined by stacking model fusion.
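As an illustration of Steps 1 and 2, the following is a minimal sketch using the pyltp binding of LTP; the model path is a placeholder and the exact API may differ across LTP versions.

```python
# Sketch of data cleaning + word segmentation with pyltp
# (assumption: the classic pyltp API; 'cws.model' path is a placeholder).
from pyltp import Segmentor

segmentor = Segmentor()
segmentor.load("ltp_data/cws.model")  # placeholder path to the LTP model

def preprocess(texts):
    cleaned = [t.strip() for t in texts if t and t.strip()]  # drop empty rows
    return [" ".join(segmentor.segment(t)) for t in cleaned]

print(preprocess(["酒店位置很好，服务周到。", ""]))
segmentor.release()
```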
For machine learning and deep learning, a single model often performs worse than a fusion of models. In the proposed algorithm, stacking model fusion is used to combine the Bi-LSTM and FastText models. The training and prediction processes are shown in
The whole process of model fusion is as follows (a sketch follows this list):
1. The vector representations of the text training data are divided into five parts.
2. The Bi-LSTM model and the FastText model are trained on four of the five parts, and then used to predict both the remaining (held-out) part of the training set and the test set.
3. The parts selected for training and the part held out for validation are rotated, and Step 2 is repeated until predictions for the complete training set are obtained.
4. Performing Step 2 for each of the five combinations yields five models and the corresponding cross-validation prediction results, namely P1, P2, P3, P4 and P5.
5. The five models are used to predict the corresponding test set, giving the test-set prediction results T1, T2, T3, T4 and T5.
6. P1~P5 and T1~T5 are taken as the training and testing sets of the next layer.
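A minimal sketch of this 5-fold stacking procedure; generic scikit-learn-style estimators and numpy feature arrays stand in for the FastText and Bi-LSTM models, which is an assumption made for illustration:

```python
# 5-fold stacking sketch: out-of-fold predictions (P) become the training
# features of the second-level model; averaged test predictions (T) become
# its test features. X_train/X_test are numpy arrays; base models are
# illustrative stand-ins for FastText and Bi-LSTM.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

def stack(base_models, X_train, y_train, X_test, n_splits=5):
    P = np.zeros((len(X_train), len(base_models)))  # out-of-fold predictions
    T = np.zeros((len(X_test), len(base_models)))   # averaged test predictions
    for j, make_model in enumerate(base_models):
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
        for tr_idx, val_idx in kf.split(X_train):
            model = make_model()
            model.fit(X_train[tr_idx], y_train[tr_idx])
            P[val_idx, j] = model.predict_proba(X_train[val_idx])[:, 1]
            T[:, j] += model.predict_proba(X_test)[:, 1] / n_splits
    meta = LogisticRegression().fit(P, y_train)  # second-level model
    return meta.predict(T)
```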
The two datasets used in the experiments were both crawled from real Internet e-commerce platforms: one consists of online hotel reservation reviews and the other of takeaway platform reviews. The description of the datasets is shown in
Dataset | Source | Number of texts | Positive comments | Negative comments |
---|---|---|---|---|
1 | Hotel Reviews | 10000 | 7000 | 3000 |
2 | Takeaway Reviews | 11988 | 4000 | 7988 |
When evaluating the classification results, we use three common evaluation indicators: accuracy, recall and F1 score. First, the following definitions are given: True Positive (TP) is a positive sample predicted by the model as positive; False Positive (FP) is a negative sample predicted by the model as positive; False Negative (FN) is a positive sample predicted by the model as negative; True Negative (TN) is a negative sample predicted by the model as negative. Then, accuracy, recall and F1 score can be calculated as follows:
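Using these definitions, the standard formulas (precision is included because the F1 score is defined in terms of it) are:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$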
In this experiment, five commonly used classification algorithms are selected for comparison with FAST-BiLSTM (a sketch of the non-neural baselines follows this list). These five algorithms are described in detail as follows:
- Naive Bayes: an algorithm that uses probability and statistics for classification. It applies Bayes' theorem to estimate the probability that a sample belongs to each category and selects the most likely category as the final result.
- Random Forest (RF): builds a forest classifier from multiple randomly constructed decision trees. When a new sample is input, every decision tree classifies it, and the sample is assigned to the category that receives the most votes.
- K-Nearest Neighbors (KNN): compares each feature of the new, unlabeled data with the corresponding features of the data in the sample set, and extracts the classification labels of the closest data. It selects the K most similar samples in the dataset and chooses the category that appears most often among them as the category of the new data [
- LSTM: a type of temporal neural network. The model not only attends to the information at the current time step but also to the information carried over from previous steps, and classifies according to the semantic information of the text.
- CNN-LSTM: uses the LSTM network to extract the key semantic information of a sentence and then uses a CNN to extract features from that semantic representation for classification. This experiment combines them: it first extracts the key semantics of the text and then extracts the key features of those semantics to classify the text.
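The paper does not state which implementations were used for these baselines; the following sketch assumes scikit-learn with a bag-of-words (TF-IDF) representation:

```python
# Baseline classifiers sketch (assumption: scikit-learn, TF-IDF features;
# texts are assumed to be pre-segmented, space-separated words).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

baselines = {
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

def fit_baseline(name, train_texts, train_labels):
    clf = make_pipeline(TfidfVectorizer(), baselines[name])
    return clf.fit(train_texts, train_labels)
```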
To verify the effectiveness of our algorithm, experiments were carried out on the Windows platform with the following environment: Intel(R) Core(TM) i7-8750U 2.20 GHz CPU, GTX 1660 GPU and 24 GB memory. The Keras framework was used to build the FastText and Bi-LSTM models. The main hyperparameters of the models are shown in
Parameter | Setting |
---|---|
MinCount in FastText | 5 |
Dimension of word vector in FastText | 64 |
Number of iterations in FastText | 70 |
Number of Neurons in Bi-LSTM | 100 |
Learning rate | 0.05 |
Optimizer | Adam |
Window size | 5 |
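Although the paper builds both models in Keras, the tabulated FastText hyperparameters map directly onto the official `fasttext` package; a sketch for illustration, with placeholder file paths:

```python
# How the tabulated FastText hyperparameters map onto the official
# fasttext package (the paper itself used Keras; paths are placeholders).
import fasttext

model = fasttext.train_supervised(
    input="train.txt",  # placeholder: one "__label__X text" line per sample
    minCount=5,         # MinCount in FastText
    dim=64,             # dimension of word vector
    epoch=70,           # number of iterations
    lr=0.05,            # learning rate
    ws=5,               # window size
)
model.save_model("fasttext_sentiment.bin")
```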
In the tuning process of the FAST-BiLSTM algorithm, the parameters of the model were adjusted. Among them, four main parameters are vital to the model: minCount, the word-vector dimension and the number of iterations in FastText, and the number of neurons in Bi-LSTM. After careful tuning, the influences of these hyperparameters are shown in
The influence of minCount in FastText is shown in
The influence of the word-vector dimension in FastText is shown in
The influence of the number of iterations in the FastText model is shown in
In order to make a better comparison between FAST-BiLSTM and the other algorithms, the 5-fold cross-validation method is used for testing and evaluation. For data partitioning, the data are randomly shuffled before selection. The results on the two datasets are reported in the following two tables.
Model | Naive Bayes | Random Forest | KNN | LSTM | CNN-LSTM | FAST-BiLSTM |
---|---|---|---|---|---|---|
Accuracy | 0.876 | 0.883 | 0.862 | 0.902 | 0.914 | 0.937 |
Recall | 0.915 | 0.918 | 0.902 | 0.934 | 0.935 | 0.952 |
F1-Score | 0.845 | 0.847 | 0.833 | 0.851 | 0.855 | 0.874 |
Model | Naive Bayes | Random Forest | KNN | LSTM | CNN-LSTM | FAST-BiLSTM |
---|---|---|---|---|---|---|
Accuracy | 0.927 | 0.936 | 0.921 | 0.949 | 0.956 | 0.978 |
Recall | 0.884 | 0.905 | 0.876 | 0.914 | 0.925 | 0.948 |
F1-Score | 0.876 | 0.879 | 0.864 | 0.884 | 0.891 | 0.917 |
In this paper, we proposed a fast sentiment analysis algorithm, called FAST-BiLSTM, to address the problems of sentiment analysis. Our algorithm fuses the FastText and Bi-LSTM models. First, FastText performs fast linear fitting and generates pre-trained word vectors as a by-product. Second, Bi-LSTM is trained on the generated word vectors and then fused with FastText to perform comprehensive sentiment analysis. Compared with traditional word2vec methods, this approach has an obvious advantage in time efficiency. To demonstrate the performance of FAST-BiLSTM, we compared our algorithm with five commonly used algorithms on two datasets from different fields. The results show that the time efficiency of the algorithm is improved by more than 30%, and that FAST-BiLSTM can sufficiently extract contextual semantic information from texts; it is superior to the other algorithms in accuracy, recall and F1 score. Moreover, our experimental results indicate that FAST-BiLSTM achieves considerable performance in text sentiment analysis tasks at a low computational cost, which is invaluable for practical applications.
Although the overall performance of FAST-BiLSTM is good and its time efficiency in particular is improved, the classification performance can still be improved further. Subsequent research will focus on how to improve the classification performance. It will also focus on applying the algorithm to a wider range of fields, such as student sentiment analysis and online public opinion analysis.