Recommendation systems are set to become an integral part of any e-business in the near future. As in other e-businesses, they play a key role in the travel business, where a user has to be recommended the restaurant that suits them best. In general, recommendations are made based on the similarity between the intended user and other users. This similarity can be calculated either from the user profiles or from the ratings made by the users. The first phase of this work experimentally analyzes both models to gain insight into them. With the lessons learned, the second phase develops a deep learning model. Once trained, the model does not depend on other users' profiles or ratings. The model is tested with a small restaurant dataset and predicts whether a user likes a restaurant or not. It is trained with the ratings of different users; to predict whether a new user will like a restaurant that he or she has not visited, the trained model needs only the ratings made by that same user for other restaurants. The model is deployed in a cloud environment so that it can be extended into a more realistic product in the future. In evaluation, the model achieves 74.6% correct predictions, whereas existing techniques achieve only 64%.
E-commerce has extended its horizon to provide different types of services and products to end customers. Recommendation systems are an integral part of modern e-commerce sites and applications. Whatever the products may be, sellers use recommendation systems to bring their products to their customers. The products range from music [
The objective of this work is to analyze recommendation systems based on user similarity, both profile based and rating based. In addition, a deep neural network model based on the restricted Boltzmann machine is developed and its performance is measured. The experimentation is done with a small dataset, which could be helpful for restaurants with a small number of customer ratings.
The models implemented in this work are as follows: (i) recommendation of restaurants to the user based on profile-based similarity between users; (ii) recommendation of restaurants to the user based on rating-based similarity between users; and (iii) recommendation of restaurants to the user based on the ratings made by the same user, using a deep neural network.
To approximate a real-time deployment, the details of the restaurants and the corresponding ratings are exposed as web services in the Google Cloud environment. When a set of restaurants has to be recommended to a particular user, the ratings and the restaurant details are obtained from the web services and processed further. For ease of implementation, the user profiles are assumed to be stored in a separate web service. The model is depicted in the following
The paper is organized as follows. The second section explains the state-of-the-art models that are in practice, the third section briefly discusses the models implemented, the fourth section gives the results, and the final section provides the conclusion and future work.
The various approaches for recommendation system are depicted in the following
Collaborative filtering is the most commonly followed approach in recommendation systems, and state-of-the-art models employ it for recommending items to users. Collaborative filtering can be classified into two types as shown in the below
The study made in this section covers the various collaborative filtering approaches and their variants, which differ in how the similarity measures are calculated and in other such criteria.
Achrafand Lotfi et al. [
The concept of multi criteria rating model is combined with deep neural networks in [
Valdiviezo-Diaz et al. [
A new similarity measurement technique is proposed in [
Event recommendations are made in [
The number of web services pertaining to a particular application is increasing day by day; Botangen et al. [
While most recommendation systems concentrate on recommending products to individuals, Pujahari and Sisodia et al. [
Marchand et al. [
According to Afoudi et al. [
The complexity of the problem increases in this kind of model, and finding the neighborhood is also a tedious task in such scenarios. Cai et al. [
In [
Recommendation of lists has become popular with applications such as playlist generation. While the traditional collaborative filtering model considers only individual items when predicting preferences, a mechanism is needed for considering a list of items instead of a single item. Such a model is proposed in [
The dataset employed in this work is obtained from the UCI repository and contains 138 users and 130 restaurants. It also contains the details of the individual users and the individual restaurants. The user details are latitude, longitude, drink level, dress preference, ambience, transport, marital status, birth year, interest, personality, religion, activity, color, weight, budget and height. The restaurant details are latitude, longitude, name, address, city, state, country, fax, zip, alcohol, smoking area, dress code, accessibility, price, URL, ambience, franchise, area and other services.
Similarly, the dataset also contains the rating made by the users for different restaurants. The ratings include overall rating, rating made for food and rating made for service.
Three models are designed. The first one uses the features of the users to find the similarity and then recommends items in the collaborative filtering manner. The second model uses the users' ratings to find the similarity and make recommendations. The third model is based on the restricted Boltzmann machine. Since the third model is also based on ratings, it can be applied in various applications such as e-commerce and live streaming product recommendations.
The pre-processing steps included in the first model are as follows: handling missing/null values, handling categorical data, and splitting the dataset into a training set and a test set.
The dataset is split into a training set and a test set: 90% of the data is allocated to the training set and 10% to the test set. The models built are explained in the following subsections.
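The three pre-processing steps can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the helper names, the use of `None` or `"?"` as the missing-value marker, the choice of filling with the most frequent value, and the sample records are all assumptions made for illustration.

```python
import random
from collections import Counter

MISSING = (None, "?")  # assumption: markers used for missing values

def fill_missing(rows, column):
    """Replace missing entries in one column with the most frequent observed value."""
    observed = [r[column] for r in rows if r[column] not in MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, column: mode if r[column] in MISSING else r[column]} for r in rows]

def one_hot(rows, column):
    """Expand a categorical column into 0/1 indicator columns."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded

def train_test_split(rows, test_fraction=0.1, seed=42):
    """Shuffle the rows and hold out test_fraction of them as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical user records with one categorical attribute
budgets = ["low", "medium", "?", "low", "high", "low", "medium", None, "low", "medium"]
users = [{"userID": f"U{i}", "budget": b} for i, b in enumerate(budgets)]

filled = fill_missing(users, "budget")
encoded = one_hot(filled, "budget")
train, test = train_test_split(encoded, test_fraction=0.1)
print(len(train), len(test))
```

Filling with the mode is only one option for categorical attributes; dropping incomplete records would be an equally plausible reading of the step.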
The
For instance, for a particular user with user ID
User ID | Percentage of similarity |
---|---|
U1021 | 99.39867136999888 |
U1053 | 98.9496556930428 |
U1063 | 98.88673572911277 |
U1116 | 98.9496556930428 |
U1131 | 100.0 |
The restaurants rated by these top users are consolidated, their ratings are averaged, and the restaurants with the highest average ratings are recommended to the user U1029. It can be observed that the list is not in the right order, as shown in
S.No | PlaceID | Name | Latitude | Longitude | Recommendation percentage |
---|---|---|---|---|---|
1 | 132866 | Chaires | 22.14122 | −100.931 | 100 |
2 | 132870 | Tortas y hamburguesas el gordo | 22.14308 | −100.935 | 100 |
3 | 132869 | Dominos Pizza | 22.14124 | −100.924 | 100 |
4 | 132851 | KFC | 22.13687 | −100.935 | 100 |
5 | 135054 | Restaurante y Pescaderia Tampico | 22.14063 | −100.916 | 100 |
6 | 135082 | la Estrella de Dimas | 22.15145 | −100.915 | 100 |
7 | 132668 | TACOS EL GUERO | 23.73821 | −99.152 | 99.39867137 |
8 | 132715 | tacos de la estacion | 23.73242 | −99.1587 | 99.39867137 |
9 | 132740 | Carreton de Flautas y Migadas | 23.7522 | −99.1666 | 99.39867137 |
10 | 135032 | Cafeteria y Restaurant El Pacifico | 22.15248 | −100.973 | 98.94965569 |
12 | 135081 | El Club | 22.16484 | −100.96 | 98.94965569 |
13 | 135062 | Restaurante El Cielo Potosino | 22.1537 | −100.979 | 98.94965569 |
14 | 135027 | Restaurant Orizatlan | 22.14715 | −100.974 | 98.94965569 |
18 | 135052 | La Cantina Restaurante | 22.15098 | −100.977 | 98.94965569 |
20 | 135038 | Restaurant la Chalita | 22.15565 | −100.978 | 98.94965569 |
21 | 135063 | Restaurante Alhondiga | 22.15672 | −100.976 | 98.94965569 |
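The consolidation step described above, averaging the ratings of the most similar users per restaurant and ranking by that average, can be sketched as follows. The function name and the neighbour ratings are hypothetical, and the ranking rule (descending average, ties broken by place ID) is an assumption.

```python
from collections import defaultdict

def recommend_from_neighbours(neighbour_ratings, top_n=5):
    """Average the neighbours' ratings per restaurant and return the top_n.

    neighbour_ratings: {user_id: {place_id: rating}} for the most similar users.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ratings in neighbour_ratings.values():
        for place, rating in ratings.items():
            totals[place] += rating
            counts[place] += 1
    averages = {place: totals[place] / counts[place] for place in totals}
    # Highest average first; ties broken by place ID for a stable order
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Hypothetical ratings (0, 1 or 2) made by three similar users
neighbours = {
    "A": {"r1": 2, "r2": 1},
    "B": {"r1": 1, "r3": 2},
    "C": {"r2": 0, "r3": 2},
}
ranked = recommend_from_neighbours(neighbours, top_n=3)
print(ranked)
```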
The restaurants that are rated by U1029 are listed in the
Restaurant | U1029 overall rating (OR) | U1029 food rating (FR) | U1029 service rating (SR) | U1053 OR | U1053 FR | U1053 SR | U1116 OR | U1116 FR | U1116 SR |
---|---|---|---|---|---|---|---|---|---|
135047 | 1 | 1 | 1 | 2 | 2 | 2 | |||
135059 | 2 | 1 | 1 | 0 | 2 | 2 | |||
132937 | 1 | 1 | 1 | ||||||
135085 | 1 | 1 | 1 | 2 | 2 | 2 | |||
132834 | 0 | 1 | 0 | 2 | 2 | 2 | |||
132754 | 0 | 0 | 0 | 1 | 2 | 1 | |||
132825 | 1 | 1 | 0 | 1 | 2 | 0 | 2 | 2 | 2 |
132921 | 1 | 1 | 1 | ||||||
132862 | 1 | 1 | 1 | ||||||
132922 | 1 | 1 | 1 |
The average rating of the other users and the average rating of the intended user is compared in the following
Although comparing the ratings made by two different users is known to be unreliable, the comparison is made here to get an idea of how effective the recommendations are. It has also been observed that ordering the results according to the average rating of the similar users does not give better results. This is not limited to a single user: the model has been tested with 18 users, the average ratings of the intended user and of the similar users are compared, and the root mean square of the differences is calculated, which gives a value of 0.7912287009094507.
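The root mean square of the differences between two sets of average ratings can be computed as in the sketch below. The function name and the sample values are hypothetical; they do not reproduce the figure reported above.

```python
import math

def root_mean_square_difference(own_averages, neighbour_averages):
    """Root mean square of the pairwise differences between two rating lists."""
    squared = [(a - b) ** 2 for a, b in zip(own_averages, neighbour_averages)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical average ratings for three users and for their similar users
own = [1.0, 2.0, 0.0]
similar = [2.0, 2.0, 1.0]
value = root_mean_square_difference(own, similar)
print(round(value, 4))
```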
It is inferred that this user-similarity-based recommendation model can be used in cases where the users have not made any previous ratings. It can therefore help in addressing the cold start problem that prevails in recommendation systems. However, since the similar users are identified from the user profiles rather than from the ratings made by the users, there is scope for enhancement, which leads to the second model.
In the second proposed model, the similarity between users is calculated on the basis of the ratings they made for different restaurants. Given a user, the objective of the model is to recommend a set of restaurants; the steps are shown in
1. Consider a set of users U = (u1, u2, …, un)
5. Let ur be the user to whom the recommendation should be made, where ur ∈ U
The model is based on the similarity between the ratings made by different users. In general, various normalization methods are used to normalize the ratings, in order to account for differences in how tolerant or conservative individual users are when rating. In the considered dataset, however, ratings take only three values: 0, 1 and 2. Normalization is therefore considered unnecessary, as there cannot be much variation. As the first step, the cosine similarity is calculated between the given user and all other users; the cosine similarity is given in
where A and B are vectors and in the considered scenario, they are the ratings made by the user. With the help of the cosine similarity, the top 30 users who are similar to the intended user are found. The rating made by the similar users for a particular restaurant is extracted. With these values, a score is calculated based on which the restaurants are recommended to the user. The score is calculated as per
where ri is the rating made by a similar user and smi is the similarity measure between that user and the intended user.
Finding the score gives a better idea of the results than the earlier model.
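The rating-based procedure can be sketched as follows. The cosine similarity is the standard definition, the dot product of the two rating vectors divided by the product of their norms. Two points are assumptions, since the original equations are not available here: the dot product is taken over restaurants rated by both users, and the score is taken to be the similarity-weighted average of the neighbours' ratings. The function names and the ratings are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two {place_id: rating} dicts.

    Assumption: restaurants a user has not rated contribute nothing.
    """
    common = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_similar_users(target, ratings, k=30):
    """Return the k users most similar to target as (user, similarity) pairs."""
    sims = [(u, cosine_similarity(ratings[target], r))
            for u, r in ratings.items() if u != target]
    sims.sort(key=lambda pair: (-pair[1], pair[0]))
    return sims[:k]

def weighted_score(place, neighbours, ratings):
    """Assumed score: sum(sm_i * r_i) / sum(sm_i) over neighbours who rated place."""
    rated = [(sm, ratings[u][place]) for u, sm in neighbours if place in ratings[u]]
    total_sm = sum(sm for sm, _ in rated)
    if total_sm == 0:
        return None  # no similar user rated this restaurant
    return sum(sm * r for sm, r in rated) / total_sm

# Hypothetical overall ratings on the 0-2 scale
ratings = {
    "U1": {"p1": 2, "p2": 1},
    "U2": {"p1": 2, "p2": 1, "p3": 2},
    "U3": {"p1": 1, "p3": 1},
}
neighbours = top_similar_users("U1", ratings, k=2)
score_p3 = weighted_score("p3", neighbours, ratings)
print(neighbours, score_p3)
```

Because the weights are similarities, the score stays within the rating range and leans toward the opinion of the closest neighbours.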
An important point to note is that this model considers only the overall rating, not the food and service ratings. The following
Restaurant | U1029 overall rating | Score |
---|---|---|
132825 | 1 | 1.2812499999999998 |
132937 | 1 | 1.4999999999999996 |
132862 | 1 | 1.3888888888888888 |
135059 | 2 | 1.6666666666666665 |
135047 | 1 | 1.0999999999999999 |
132834 | 0 | 0.9666878227378957 |
132922 | 1 | 1.838888183281475 |
132921 | 1 | 1.2607810762326521 |
132754 | 0 | 1.461538461538461 |
135085 | 1 | 1.3111250295293724 |
The comparison of the trend of change in the score with respect to the rating can be inferred from the below
The deep learning model employed in this work is the restricted Boltzmann machine. The steps involved are described below.
The training set and the test set are stored in separate files and are loaded separately. Both contain data in the same form as the ratings file. The training data has 636 observations and the test data has 508 observations. This follows the classical train and test split.
The training set and the test set are each converted into an array in which the rows represent the users and the columns represent the restaurants. The users are common to the training and test sets, but the restaurants rated and the corresponding ratings differ between them. The training and test sets are converted into PyTorch tensors and given as inputs to the Boltzmann machine. Since the training and test sets contain ratings in the range 0 to 2, these have to be converted into the binary classes 1 and 0.
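A minimal sketch of this conversion is given below. The text does not state how the 0 to 2 ratings are mapped to binary values, so the threshold used here (a rating of 1 or 2 counts as liked, 0 as not liked) is an assumption, as is the use of -1 to mark restaurants the user has not rated. The three U1029 ratings are taken from the table above; the second user is hypothetical.

```python
def to_user_item_matrix(records, users, places):
    """Rows are users, columns are restaurants; -1 marks an unrated restaurant."""
    user_index = {u: i for i, u in enumerate(users)}
    place_index = {p: j for j, p in enumerate(places)}
    matrix = [[-1] * len(places) for _ in users]
    for user, place, rating in records:
        matrix[user_index[user]][place_index[place]] = rating
    return matrix

def binarize(matrix, like_threshold=1):
    """Map ratings to 1 (liked) or 0 (not liked); keep -1 for unrated entries.

    Assumption: ratings >= like_threshold count as liked.
    """
    return [[-1 if r < 0 else (1 if r >= like_threshold else 0) for r in row]
            for row in matrix]

records = [
    ("U1029", "135059", 2),
    ("U1029", "135047", 1),
    ("U1029", "132834", 0),
    ("U_other", "135047", 2),  # hypothetical second user
]
users = ["U1029", "U_other"]
places = ["135059", "135047", "132834"]

matrix = to_user_item_matrix(records, users, places)
binary = binarize(matrix)
print(binary)
```

The resulting nested lists can then be turned into a tensor, for example with `torch.tensor(binary, dtype=torch.float32)`.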
Restricted Boltzmann machines, unlike other models, contain only two layers, the input layer and the hidden layer, as shown in the following
The input layer holds values that are either 0 or 1, where 1 means that the user likes the restaurant and 0 means that the user dislikes it. The hidden layer represents factors such as food and service. These are called latent factors and are used to describe the restaurant choices. When the training data of a user is given as input, the system finds the hidden latent factors based on the preferences of that user. A Bernoulli distribution is used in the restricted Boltzmann machine to decide which neurons in the hidden layer are activated. In the hidden layer, the value from the input layer is multiplied by a weight, which is updated with contrastive divergence. The procedure of the model is given in
The restricted Boltzmann machine works in this fashion, and the rating of a restaurant that has not yet been rated by the target user is predicted. Since probabilistic values are used in the model, they provide a way to measure its performance. The values obtained with this model are given in the results section.
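The mechanics described above (two layers, Bernoulli sampling of the hidden units, weights updated with contrastive divergence) can be illustrated with a small dependency-free sketch. It is not the PyTorch model used in this work: the class and method names, the single Gibbs step (CD-1), the learning rate, the use of -1 for unrated restaurants, and the toy data are all assumptions.

```python
import math
import random

def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    """Bernoulli restricted Boltzmann machine trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_visible)]
                  for _ in range(n_hidden)]
        self.b_hidden = [0.0] * n_hidden
        self.b_visible = [0.0] * n_visible

    def hidden_probs(self, v):
        # Unrated entries (-1) are skipped so they do not influence the hidden units
        return [sigmoid(self.b_hidden[j]
                        + sum(w * x for w, x in zip(self.W[j], v) if x >= 0))
                for j in range(len(self.W))]

    def visible_probs(self, h):
        return [sigmoid(self.b_visible[i]
                        + sum(self.W[j][i] * h[j] for j in range(len(h))))
                for i in range(len(self.b_visible))]

    def sample(self, probs):
        # Bernoulli sampling: each unit is switched on with its own probability
        return [1 if self.rng.random() < p else 0 for p in probs]

    def train_step(self, v0, lr=0.1):
        ph0 = self.hidden_probs(v0)
        h0 = self.sample(ph0)
        sampled = self.sample(self.visible_probs(h0))
        # Unrated entries stay at -1 in the reconstruction
        v1 = [x if x < 0 else s for x, s in zip(v0, sampled)]
        ph1 = self.hidden_probs(v1)
        for j in range(len(self.W)):
            for i in range(len(v0)):
                if v0[i] >= 0:
                    self.W[j][i] += lr * (ph0[j] * v0[i] - ph1[j] * v1[i])
            self.b_hidden[j] += lr * (ph0[j] - ph1[j])
        for i in range(len(v0)):
            if v0[i] >= 0:
                self.b_visible[i] += lr * (v0[i] - v1[i])

    def predict(self, v):
        """Probability that the user likes each restaurant, given the known ratings."""
        return self.visible_probs(self.hidden_probs(v))

# Toy data: restaurant 0 is liked, restaurant 1 is disliked, -1 means unrated
data = [[1, 0, 1], [1, 0, -1], [1, 0, 1], [-1, 0, 1]]
rbm = RBM(n_visible=3, n_hidden=2)
for epoch in range(50):
    for v in data:
        rbm.train_step(v)

probs = rbm.predict([1, 0, -1])  # third entry is the unrated restaurant
print([round(p, 3) for p in probs])
```

The predicted probability for an unrated restaurant can be thresholded, for instance at 0.5, to obtain a like or dislike prediction.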
The web services are implemented in the cloud environment, and the interface between the recommendation system and the recommendation engine is implemented in Python. The results obtained with the designed restricted Boltzmann machine are given below. The performance parameter used here is the average distance, which is the difference between the predicted values and the true values. The
From the average distance, the percentage of correct predictions can be identified. The percentage of correct predictions is 64% during training and 56% during testing. These results are low. In general, however, the performance of a deep learning model increases with the size of the dataset, and the dataset employed here is small. When the same model is tested with the MovieLens dataset, the percentage of correct predictions is 74.6%.
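The relation between the average distance and the percentage of correct predictions can be made concrete with a short sketch. For binary values, the mean absolute difference between true and predicted values equals the fraction of wrong predictions, so the percentage correct is 100 × (1 − average distance). The function names, the convention that -1 marks unrated entries, and the sample vectors are assumptions.

```python
def average_distance(true_values, predicted):
    """Mean absolute difference over entries that actually have a rating (>= 0)."""
    pairs = [(t, p) for t, p in zip(true_values, predicted) if t >= 0]
    return sum(abs(t - p) for t, p in pairs) / len(pairs)

def percent_correct(avg_distance):
    """For 0/1 values the distance is the error rate, so accuracy is its complement."""
    return 100.0 * (1.0 - avg_distance)

# Hypothetical binary test ratings and predictions; -1 marks an unrated restaurant
true_values = [1, 0, 1, -1, 1]
predicted = [1, 1, 1, 0, 0]

distance = average_distance(true_values, predicted)
accuracy = percent_correct(distance)
print(distance, accuracy)
```

Under this reading, the 64% reported for training corresponds to an average distance of about 0.36, and the 56% for testing to about 0.44.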
Two recommendation models are analyzed experimentally: one based on the similarity of users' profiles in the context of choosing a restaurant, and the other based on the similarity of the ratings made by the users. The advantages and disadvantages of both models are analyzed. With the inferences drawn from these two models, a deep learning model based on the restricted Boltzmann machine is designed. It predicts the rating of restaurants that the user has not yet rated, using only the ratings made by that user for other restaurants. The implementation exposes the details of the restaurants, the users and the corresponding ratings as web services, which paves the way for a real-time implementation of the model in the future. Future work will also include improving the percentage of correct predictions.