With the vigorous development of the Internet industry and the iterative updating of web service technologies, an increasing number of web services with the same or similar functions has appeared across the many platforms on the Internet. The issue of selecting the most reliable web service for users has received considerable critical attention. To solve this task, we propose a service reliability inference method based on a deep neural network (SRI-XDFM) in this article. First, according to the pattern of the raw data in our scenario, we improve the embedding performance by extracting self-correlated information with the help of character encoding and a CNN. Second, the original sum pooling method in xDeepFM is replaced with an adaptive pooling method that reduces the information loss of the pooling operations when learning linear information. Finally, an inter-attention mechanism is applied in the DNN to learn the relationship between the user and the service data when learning nonlinear information. Experiments conducted on a public real-world web service data set confirm the effectiveness and superiority of the SRI-XDFM.
With the rapid development of the Internet, the number of web services has exploded. Many of these services provide similar or identical functions [
As a method that can recommend specific products to specific users, the recommendation system can be applied to the service reliability inference problem. The xDeepFM [
As a general model, xDeepFM has the following disadvantages in the service reliability inference scenario. First, the embedding layer of the xDeepFM model consists of a one-hot encoding part and a fully connected layer, which is called the random initialization method [
To address the above disadvantages of the xDeepFM model in our scenario, this paper proposes a service reliability inference method based on the xDeepFM model, which is summarized as follows.
According to the pattern of the raw data, the embedding layer is improved. Self-correlated information such as IPs and URLs is extracted from all the fields. Afterwards, a character encoding method is applied to replace the original one-hot encoding method. In addition, the encoded fields are put into a CNN to preserve the distance between elements.
The CIN model is improved with a more favorable pooling method. This paper replaces the original sum pooling method with the proposed adaptive pooling method, yielding the A-CIN network. Thus, the pooling operation retains more information within the same output shape, which improves the performance of explicit knowledge learning.
An inter-attention method is proposed to improve the performance of the DNN. This paper introduces an attention mechanism into the DNN model, yielding the I-DNN network. This method performs an interactive operation between the user and service features, which concentrates the model’s attention on the relevant fields and improves its ability to learn implicit information.
The rest of this article is organized as follows. Section 2 reviews some related works. Section 3 describes our service reliability inference method, which is named SRI-XDFM. The simulation results and corresponding discussions are presented in Section 4. Finally, Section 5 summarizes this paper.
In the recommendation system, feature engineering plays a vital role [
1) The labor cost is high;
2) A large number of sparse features easily brings about the dimensional disaster problem;
3) Manually extracted cross features cannot be generalized to patterns that have never appeared in training samples.
Therefore, it is extremely meaningful to automatically learn the interactions between features. At present, most of the related research work is based on the factorization machine framework [
By combining a CIN model with a linear regression unit and a fully connected neural network, we get the final model and name it the Explicit and Implicit Deep Factorization Machine (xDeepFM).
There have also been some methods to predict or select web services. In 2019, Poordavoodi et al. modified the interval data envelopment analysis models for QoS-aware web service selection, considering the uncertainty of QoS attributes in the presence of desirable and undesirable factors. Their approach improves the fitness of the resultant compositions by filtering out unsatisfactory candidate services for each abstract service in the preprocessing phase [
In order to realize the automatic learning of explicit high-order feature interactions and, at the same time, make the interactions occur at the vector level, xDeepFM proposes a new neural network called the Compressed Interaction Network (CIN). In the CIN, the hidden layers are called unit objects. The original input features, together with the output of the unit objects, are organized into matrices, which are denoted as
The
(1) According to the state
(2) Based on this intermediate result,
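The compression step of the CIN can be sketched in NumPy as follows. This is a minimal sketch following the standard xDeepFM formulation, in which each hidden feature vector is a weighted sum of Hadamard products between the previous layer's rows and the original embedding rows; the layer sizes and random weights here are purely illustrative.

```python
import numpy as np

def cin_layer(x_prev, x0, weights):
    """One CIN hidden layer (standard xDeepFM formulation).

    x_prev:  (H_prev, D) previous hidden state
    x0:      (m, D) original embedding matrix
    weights: (H_next, H_prev, m) compression parameters
    """
    # Vector-wise interaction: z[h, j, d] = x_prev[h, d] * x0[j, d]
    z = np.einsum('hd,md->hmd', x_prev, x0)
    # Compress the H_prev x m interaction maps into H_next feature vectors
    return np.einsum('khm,hmd->kd', weights, z)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8))     # m = 4 fields, embedding size D = 8
w1 = rng.normal(size=(5, 4, 4))  # first hidden layer with H1 = 5 units
h1 = cin_layer(x0, x0, w1)       # shape (5, 8)
pooled = h1.sum(axis=1)          # sum pooling over D, as in the original CIN
```

With all-ones weights, a single output unit reduces to the elementwise square of the field sum, which illustrates how the compression simulates factorization-machine-style interactions.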
The encoder-decoder model is a solution to the seq2seq problem [
In order to solve this problem, the author of [
Compared with the previous encoder-decoder model, the biggest difference of the attention model is that it does not require the encoder to encode all input information into a fixed-length vector. In addition, during decoding, each step will respectively select a subset of the vector sequence for further processing. In this way, when each output is generated, the information carried by the input sequence can be fully utilized. Furthermore, this method has achieved very good results in translation tasks.
In essence, attention selectively filters out a small amount of important information from a large amount of information and focuses on this important information [
We propose a new service reliability inference method named the Service Reliability Inference Method based on Deep Neural Network (SRI-XDFM), with the following considerations.
(1) Self-correlated features are embedded according to the distance between features rather than randomly;
(2) The pooling of the feature map is adaptive, which means that the output can contain more information, and
(3) The relationship between the user and service fields is concentrated in the process of non-linear knowledge learning.
The basic workflow of the SRI-XDFM is shown in
There are three main processes in the SRI-XDFM: embedding, the A-CIN and the I-DNN.
In the next sections, we will introduce these three important units.
The original data are divided into two parts according to whether they belong to a user or a service. Then, these two parts are further classified by field. A field here refers to a one-dimensional input in the original data; for example, the country to which the user belongs is a field. Afterward, all the field data are divided into two parts according to whether they are self-correlated. A self-correlated field is a field with the following characteristic: the distance between two instances picked arbitrarily from the field contains valid information. For example, the user’s IP address is a self-correlated field, since two users with similar IP addresses are likely related somehow. On the contrary, the country to which the user belongs is a non-self-correlated field, since two countries with similar spellings share little useful information.
The fields that are classified and named
One-hot encoding is a method that encodes the input only based on the classification of raw data [
Firstly, the relationship between two inputs is discarded in the random encoding process as we mentioned before. Thus, it can never be learned in the next steps in the model. That means that we lose some information from the original data that is of vital importance.
Secondly, the content of some fields in the original data is too sparse. For example, users will seldom share the same IP address. Therefore, this field will be encoded into a rather long vector, which is unacceptable.
To solve the above problems, we introduce a character encoding method into the embedding layer to replace the one-hot method for self-correlated data. The raw data are encoded character by character according to the specific business scenario. For example, if the IP address of a user is 12.108.127.138, the corresponding binary IP address is 00001100, 01101100, 01111111, and 10001010. This 32-dimensional variable is the character encoding vector of this user’s IP field. Another scenario is the encoding of URLs. Suppose that a service has the URL with the path
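The IP example above can be reproduced in a few lines of plain Python. This sketch converts each octet into its 8-bit binary form and concatenates the bits into the 32-dimensional character encoding vector:

```python
def encode_ip(ip):
    """Character-encode an IPv4 address as a 32-dimensional binary vector."""
    bits = []
    for octet in ip.split('.'):
        # Each octet becomes an 8-bit binary string, e.g. 12 -> '00001100'
        bits.extend(int(b) for b in format(int(octet), '08b'))
    return bits

vec = encode_ip('12.108.127.138')
# vec is [0,0,0,0,1,1,0,0, 0,1,1,0,1,1,0,0, ...], 32 elements in total
```

Unlike one-hot encoding, two nearby IP addresses produce nearby vectors, so the distance information discussed above survives the encoding.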
Character encoding ensures that important self-correlated information is not discarded. However, we still need a method that transforms raw data of varying and rather large sizes into embedding features of a fixed shape that do not take up too much space. Furthermore, the self-correlated information should be extracted in a reasonable way. Therefore, we adopt a Convolutional Neural Network (CNN), which meets these requirements well.
The CNN is a feed-forward neural network, which extracts hidden local correlation features via the layer-by-layer convolution and pooling of input data, and generates high-level features via layer-by-layer combination and abstraction [
The CNN model designed for embedding feature extraction in this paper consists of two convolution layers, two max pooling layers, and two fully connected layers.
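The convolution-and-pooling building blocks of this embedding CNN can be sketched in NumPy as follows. The kernel count and width below are illustrative assumptions, not the paper's actual hyperparameters; the point is only to show how local patterns in the 32-bit character encoding are extracted and downsampled:

```python
import numpy as np

def conv1d(x, kernels, stride=1):
    """Valid 1-D convolution: x (L,), kernels (K, w) -> (K, L_out)."""
    K, w = kernels.shape
    L_out = (len(x) - w) // stride + 1
    out = np.empty((K, L_out))
    for k in range(K):
        for i in range(L_out):
            out[k, i] = x[i * stride:i * stride + w] @ kernels[k]
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling along the last axis."""
    L = x.shape[-1] // size
    return x[..., :L * size].reshape(*x.shape[:-1], L, size).max(axis=-1)

# Character-encoded IP 12.108.127.138 as the input signal
enc = np.array([int(b) for b in '00001100011011000111111110001010'], float)
feat = max_pool(conv1d(enc, np.ones((3, 4))))  # 3 illustrative width-4 kernels
```

Stacking two such convolution/pooling stages and two fully connected layers, as the paper describes, yields fixed-size embedding vectors regardless of the raw field length.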
The Adaptive Compressed Interaction Network (A-CIN) is a neural network whose purpose is to extract vector-wise information and learn knowledge explicitly. It takes advantage of the idea of the factorization machine and solves the problem of how to combine features in the case of sparse features. The overview of the A-CIN is shown in
After embedding, all user and service data are converted into x and y vectors with a length of D. We reconstruct them into a matrix that is the input of the A-CIN. Then, k operations similar to feature map extraction, named compressions, are applied; their purpose is to combine the features at the vector-wise level and simulate the factorization machine. Each output of a compression is passed to an adaptive pooling step. All the outputs of the adaptive pooling are flattened into a vector, which is the output of the A-CIN.
We combine all user and service embedding vectors into a matrix
Here
Note that
The purpose of adaptive pooling is to reduce the dimension of the output of the compression and reserve more information at the same time. Every hidden layer
The output of adaptive pooling is a single number y per unit. To obtain it, we first combine all units of H and put them into a neural-like structure, which results in s. It is calculated as follows:
Note that it is not a neural network for all the parameters
Then, s is the input to an operation called resize, which restricts every s to a certain range. The output is denoted as a and calculated as follows:
Then, we can get a vector B denoted as
The above is one iteration in the algorithm. After several iterations, we can get the output. Here we conduct 3 iterations. The process is shown in
Afterwards, all outputs of adaptive pooling are flattened to a vector as the output of the A-CIN layer.
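The adaptive pooling step can be sketched as a learned weighted sum that replaces plain sum pooling. This is a minimal sketch of our reading of the description above: the "neural-like" scorer is modeled as a per-unit affine map (the paper notes it is not a shared network), and the resize step is modeled as a softmax that restricts the scores to (0, 1); both of these concrete choices are assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_pool(h, w, b):
    """Pool each hidden feature vector of H into a single number.

    h:    (H, D) hidden layer of the A-CIN
    w, b: (H, D) per-unit scoring parameters (assumption: affine scorer)
    """
    s = w * h + b               # neural-like scores
    a = softmax(s)              # 'resize': restrict scores to (0, 1)
    return (a * h).sum(axis=1)  # learned weighted sum instead of plain sum

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 8))
y = adaptive_pool(h, np.zeros((5, 8)), np.zeros((5, 8)))
```

With zero parameters the weights are uniform and the operation degenerates to mean pooling; training the parameters lets each unit emphasize its informative dimensions, which is how more information survives the reduction.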
The inter-attention Deep Neural Network (I-DNN) is a model that runs parallel to the A-CIN. A kind of attention mechanism is applied here to concentrate on the relationship between user and service information. Firstly, we construct the user and service embedding vectors into two matrices. Secondly, they are put into an inter-attention layer. The output is a vector that contains the user and service attention information. Finally, it is put into a DNN to extract bit-wise knowledge. The overview of the I-DNN is shown in
The output of all embedding layers is divided into two categories, namely, user and service features. We construct these vectors as two matrices named
Firstly, we construct a matrix
Here
Secondly, we reconstruct two matrices, named
Thirdly, we flatten A and B into P, which is the output of the inter-attention layer. Thus, the input of the DNN is P, which contains relationship information, instead of a plain vector obtained by directly flattening A and B.
Finally, we put P into the DNN. Here we use three hidden layers with the ReLU activation function, after which we get a vector named V. That is the output of the I-DNN layer.
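The inter-attention steps above can be sketched as follows. This is our reading of the description: the correlation matrix scores every user/service field pair via a dot product, and each side is re-weighted by the other's relevance; the softmax normalization is an assumption, as the paper does not spell out the exact resize of the attention scores.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_attention(U, S):
    """Cross-attend user and service embedding matrices.

    U: (m_u, D) user field embeddings; S: (m_s, D) service field embeddings.
    Returns the flattened vector P that feeds the DNN.
    """
    C = U @ S.T                    # (m_u, m_s) field-pair correlation scores
    A = softmax(C, axis=1) @ S     # user fields attended over service fields
    B = softmax(C.T, axis=1) @ U   # service fields attended over user fields
    return np.concatenate([A.ravel(), B.ravel()])

rng = np.random.default_rng(2)
U = rng.normal(size=(3, 8))   # illustrative: 3 user fields, D = 8
S = rng.normal(size=(4, 8))   # illustrative: 4 service fields
P = inter_attention(U, S)
```

Because every row of A mixes service information (and vice versa for B), P carries the user-service relationship into the DNN rather than two independent flattened embeddings.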
A real-world web service performance dataset (WSDREAM) [
Dataset | D1 | D2
---|---|---
QoS metric | Response-Time | Throughput
Value range | 0–20 s | 0–1000 kbps
Users | 339 | 339
Services | 5,825 | 5,825
Records | 1,974,675 | 1,974,675
Here we define the reliability of a service between user
The experimental environment in this paper is a server with 8 GB of memory and a 1.6 GHz CPU (Xeon E5-2603 v3). All the algorithms are written in the Python 3.7 environment using the Keras library and scikit-learn tools.
We adopt the mean absolute error (MAE) and the root mean square error (RMSE) as the performance measurements for our model [
The MAE is defined as follows:
Here
The RMSE is defined as follows:
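In code, the two metrics are computed as follows (a straightforward sketch; `y_true` holds the observed reliability values and `y_pred` the model's inferences):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error over observed user-service reliability values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error; penalizes large deviations more than the MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

y_true = [2.0, 1.5, 3.0]
y_pred = [2.5, 1.0, 3.0]
```

Since the RMSE squares each deviation before averaging, a model with the same MAE but a few large errors will show a visibly worse RMSE, which is why both metrics are reported.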
We compare the performance of the SRI-XDFM with four models that are able to perform service reliability inference, including the NNMF [
No. | Train : Test
---|---
T1 | 20% : 80%
T2 | 25% : 75%
T3 | 30% : 70%
T4 | 35% : 65%
T5 | 40% : 60%
We evaluate the performance of our proposed method from four aspects: The general model effect, the effectiveness of the self-correlated feature embedding, the effectiveness of the adaptive pooling and the effectiveness of the inter-attention mechanism.
General model means the proposed model that combines self-correlated feature embedding, adaptive pooling and the inter-attention mechanism.
We evaluate the overall result of our model here. The evaluation of the general model is shown as follows.
Testing case | NNMF | TSVD | WSPM | xDeepFM | SRI-XDFM
---|---|---|---|---|---
T1 | 3.40 | 2.95 | 2.51 | 2.10 | 1.97
T2 | 2.92 | 2.90 | 2.49 | 1.97 | 1.87
T3 | 2.85 | 2.79 | 2.43 | 1.75 | 1.70
T4 | 2.75 | 2.75 | 2.37 | 1.76 | 1.61
T5 | 2.55 | 2.50 | 2.35 | 1.74 | 1.61
Testing case | NNMF | TSVD | WSPM | xDeepFM | SRI-XDFM
---|---|---|---|---|---
T1 | 5.41 | 4.95 | 4.33 | 3.93 | 3.70
T2 | 5.05 | 4.90 | 4.27 | 3.94 | 3.65
T3 | 4.98 | 4.83 | 4.22 | 3.74 | 3.64
T4 | 4.83 | 4.81 | 4.19 | 3.69 | 3.64
T5 | 4.85 | 4.81 | 4.18 | 3.66 | 3.63
Overall, compared with the other four models, the SRI-XDFM is 20.27%, 13.02%, 7.16% and 4.55% better in the MAE and 5.6%, 3.21%, 1.32% and 0.14% better in the RMSE, respectively. Therefore, the general model shows clear superiority, and the task of service reliability inference is better solved, which is of great significance for service selection in the real world.
In this section, we evaluate the effectiveness of self-correlated feature embedding, adaptive pooling and the inter-attention mechanism by comparing the original xDeepFM with an xDeepFM plus a self-correlated feature embedding part, an xDeepFM utilizing adaptive pooling instead of sum pooling, and an xDeepFM plus an inter-attention layer. Hereafter, we call them the S-XDFM, A-XDFM and I-XDFM, respectively.
Testing case | xDeepFM | S-XDFM | A-XDFM | I-XDFM | SRI-XDFM
---|---|---|---|---|---
T1 | 2.10 | 2.06 | 2.01 | 2.07 | 1.97
T2 | 1.97 | 1.93 | 1.89 | 1.93 | 1.87
T3 | 1.75 | 1.73 | 1.71 | 1.72 | 1.70
T4 | 1.76 | 1.70 | 1.64 | 1.65 | 1.61
T5 | 1.74 | 1.69 | 1.65 | 1.63 | 1.61
Testing case | xDeepFM | S-XDFM | A-XDFM | I-XDFM | SRI-XDFM
---|---|---|---|---|---
T1 | 3.93 | 3.77 | 3.93 | 3.90 | 3.70
T2 | 3.94 | 3.72 | 3.92 | 3.87 | 3.65
T3 | 3.74 | 3.67 | 3.74 | 3.69 | 3.64
T4 | 3.69 | 3.65 | 3.68 | 3.64 | 3.64
T5 | 3.66 | 3.63 | 3.64 | 3.64 | 3.63
C represents the contribution value. XD, SRI and T represent the MAE or RMSE of the xDeepFM, the SRI-XDFM and the target model, respectively.
We can see that self-correlated embedding does not have much effect on the overall MAE result; its contribution ranges only from 30.7% to 40%. Adaptive pooling contributes 69% to 80% to the decrease of the MAE in all testing cases, which means that this method performs very well regarding the MAE. The inter-attention method contributes little when the training set is small, at only 23.1%, but its contribution increases sharply as the training set grows and finally reaches 84.6%.
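These percentages can be reproduced from the ablation tables under the assumption that the contribution is computed as C = (XD − T) / (XD − SRI), i.e., each variant's share of the total improvement that the full SRI-XDFM achieves over xDeepFM; this reading matches the quoted figures:

```python
def contribution(xd, t, sri):
    """Share of the total improvement attributable to one variant,
    assuming C = (XD - T) / (XD - SRI) (our reading of the definition)."""
    return (xd - t) / (xd - sri)

# T1 MAE values from the ablation table above
xd, sri = 2.10, 1.97
c_self = contribution(xd, 2.06, sri)  # self-correlated embedding
c_adpt = contribution(xd, 2.01, sri)  # adaptive pooling
c_attn = contribution(xd, 2.07, sri)  # inter-attention
```

Under this formula, T1 yields roughly 30.8%, 69.2% and 23.1% for the three variants, consistent with the values discussed in the text.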
The experimental results show that the combination of these three methods is useful for service reliability inference. When considering both the MAE and RMSE, the inter-attention method achieves similar performance on both. However, it is obvious that the self-correlated embedding method does a better job regarding the RMSE, while adaptive pooling is better regarding the MAE.
In summary, all the above experimental results confirm that the SRI-XDFM does a good job in the service reliability inference task and performs better than the other methods.
With the development of web service technology, the number of services on the Internet has exploded. It is of vital importance to select the best service from a vast number of similar services in a reasonable time. This paper proposes a service reliability inference model called the SRI-XDFM, based on a general but effective recommendation model called xDeepFM. Considering the pattern of our scenario, three methods are proposed to enhance the effectiveness of our model: an embedding method addressing the self-correlation pattern in raw data, an adaptive pooling method to retain more information, and an inter-attention method to concentrate on the relationship between the user and the service. The experimental results show that the proposed method performs well on this task and outperforms the other models. However, it is far from perfect. This method does not consider time, which is an important factor, and further research can be carried out on this aspect. We hope that our work can provide some references for future service reliability inference research.
The authors would like to thank Beijing University of Posts and Telecommunications for the support in this research.