With the rapid development of the Internet, the amount of data recorded on the Internet has increased dramatically. It is becoming more and more urgent to effectively obtain the specific information we need from the vast ocean of data. In this study, we propose a novel collaborative filtering algorithm for generating recommendations in e-commerce. This study has two main innovations. First, we propose a mechanism that embeds temporal behavior information to find a neighbor set in which each neighbor has a very significant impact on the current user or item. Second, we propose a novel collaborative filtering algorithm by injecting the neighbor set into probability matrix factorization. We compared the proposed method with several state-of-the-art alternatives on real datasets. The experimental results show that our proposed method outperforms the prevailing approaches.
With the rapid development of the Internet, the amount of data recorded on the network has increased dramatically. It is becoming increasingly urgent to be able to effectively obtain the specific information we need from a vast ocean of data. Although traditional search engines can meet users’ retrieval requirements, they tend to return results with the same ranking to all users. Therefore, it is challenging to actively provide personalized services matching the interests of different users. Therefore, a recommendation system [
Among all the recommendation methods, collaborative filtering [
Although the collaborative filtering recommendation algorithm has many successful applications, several important issues need to be addressed. One significant issue is that traditional collaborative filtering often ignores the structural relationship between users or items. In the following, we only use users to represent “users (items).” Effectively making use of the relationships between users can enrich the information from a single user such that user interests can be identified more accurately. Based on this concept, many collaborative filtering algorithms have been developed that fully consider the user relationships for data mining. They have facilitated many valuable results. The process to obtain the relationships between users and then quantify the relationships has an important impact on the effectiveness of the collaborative filtering algorithms. The existing solutions mainly consist of two types: one makes use of the relationships existing in the explicit social network, and the other calculates the similarity between users through implicit label information to get the relationship between users. However, in practical applications, it is difficult to obtain sufficient information regarding social relationships or labels. Moreover, the above solutions are based on the vague assumption that the interaction between users is undirected, which is not reasonable in reality. For example, if user A highly appreciates user B, then the behavior of user B has a great impact on user A. However, the impact user A has on user B is not significant or even negligible.
In this study, to address the above issues, we propose a nearest neighbor modeling method based on temporal consumption behavior. Constructing a temporal-based consumption network helps identify the interactions between users. The contributions can be summarized as follows:
The proposed modeling strategy only requires the user’s consumption time rather than complicated information such as user labels and social relationships. The calculated influence is directed. The strategy can more accurately discover the interactions between users and identify the biggest impact on the current user from the neighbor set. Based on the modeling strategy, a recommendation algorithm termed TemporalMF is developed, which uses the neighbor set for collaborative filtering recommendation based on probability matrix factorization. We do extensive experiments on the Douban recommendation dataset. The experimental results show that by using social network information and label information, TemporalMF can predict the users’ actual scores more effectively and hence improve the accuracy of recommendations compared to traditional recommendation algorithms.
The remaining sections are arranged as follows. Section 2 briefly reviews related works about collaborative filtering. Section 3 describes the TemporalMF algorithm and its corresponding recommendation framework. In Section 4, some benchmarking algorithms are introduced for comparison. The last section concludes the study.
Collaborative filtering makes use of the behaviors (scores, clicks, etc.) from similar users to predict the interest of the target user for a particular item. It then makes corresponding recommendations based on the predicted interest. Currently, popular collaboration filtering recommendation algorithms can mainly be divided into the neighborhood-based type and the model-based type.
Neighborhood-based collaborative filtering first calculates the similarity between users based on their historical information and then uses the evaluation of other items by neighbors with high similarity to the target user to predict the user’s preference for specific items. The preference degree of users is considered as the recommendation threshold. At present, neighbor-based collaborative filtering mainly includes the user-based type and the item-based type. The core of user-based collaborative filtering algorithms is to find similar users. Item-based algorithms mainly find similar items.
Different from nearest neighbor-based collaborative filtering, model-based collaborative filtering mainly trains a model through users’ rating data and then uses this model to predict the unknown variables [
Although traditional collaborative filtering algorithms can achieve satisfying recommendation performance, they often only take the scoring information into consideration and ignore temporal information and relationship information that is closely related to recommendations. Effective use of this information can further improve the accuracy of the recommendation. Therefore, many recommendation algorithms based on temporal information and relationship information have been proposed.
A temporal-based recommendation takes temporal information into consideration. The derived model can learn the dynamic changes of data to optimize the recommendation. Koren [
Different from these related works, in this study we use temporal information to construct the structural relationship between users or items. Based on the structural relationship, the similarity is calculated and integrated into the probability matrix decomposition such that a new recommendation framework is generated.
Traditional collaborative filtering algorithms generally assume that users or items are independent. So, they ignore the structural connection between users or items. Recommendation algorithms based on user or item relationship mining incorporate user or item relationships into existing collaborative filtering algorithms to enrich the information of a single user and hence improve the accuracy.
One of the core steps of the collaborative filtering algorithm based on relationship mining is to obtain the relationships between users. At present, the methods for obtaining user relationships are mainly divided into explicit and implicit types. In [
Considering that the above two types of relationship modeling methods are often affected by the difficulty of collecting large amounts of related data, in this study we mainly use the time sequence information of user consumption to mine the implicit relationships and integrate user or item relationships into the matrix factorization model. Finally, a collaborative filtering recommendation algorithm (termed as TemporalMF) based on time-sequential behavior is designed for recommendations.
Suppose we have a recommendation system that contains
Let
where
To avoid overfitting, the eigenvectors of the user and items are both assumed to follow the Gaussian prior distribution with
According to Bayesian inference theory, the posterior probability of
In this section, we give a detailed introduction of TemporalMF from three aspects. Firstly, we explain how to use the time sequence information of user consumption to mine the interactions between users or items so as to determine the neighbor set that has the greatest impact on users or items. Second, we explain how to successfully integrate this neighbor set into collaborative filtering based on probability matrix factorization. Lastly, detailed steps and computational complexity of TemporalMF are listed and analyzed.
In relationship-based matrix factorization, one of the core steps is to obtain a user or item relationship. Traditional collaborative filtering ignores the consumption time information. However, the time sequence information can provide some hidden patterns that can be used for mining relationships between uses and items. In order to discover these potential relationships, we first introduce a time sequence-based user consumption network, as shown in
We use
where
Similarly, a time sequence-based item consumption network can be easily constructed. A toy example is shown in
After mining the influence relationships between the corresponding users or items and finding the nearest neighbor set, we apply them to the matrix factorization model. As a result, the eigenvector of the user or item is affected by its nearest neighbors. In other words, similar users or items should have similar eigenvectors.
where
Similar to
The probability map model of this model is shown in
To facilitate the solution, the logarithmic of the posterior probability in
Maximizing this posterior probability is equivalent to minimizing the following objective function:
In
where
The recommendation algorithm is divided into four steps:
In our proposed algorithm, the steps that consume many CPU seconds are from two aspects. One is the consumption network construction and relationship mining. The other is the recommendation based on the obtained influence relationships. Let us take a user-based network as an example. Suppose that on average each item is consumed by
For the second aspect, according to the analysis in [
In this section, we first briefly introduce the datasets used in our experiments and then describe the evaluation criteria and comparison algorithms. Finally, we give the comparative experimental results of our proposed model and comparison algorithms.
In order to compare the impact of different information and relationships on the recommendation results, the experimental datasets should contain rating information, label information, and user social relationships. To this end, we collected such data from the Douban website as our experimental datasets. Douban is a very popular website for evaluating, discussing, and recommending movies, books, and music. The site can provide users with ratings, discussions, and recommendations. It has the largest data of books, music, and movies in China and is one of the largest online communities in China. On the website, each user can rate books, music, or movies within the range [
The dataset we used in this study is listed in
Information | Item type | |
---|---|---|
Books | Movies | |
#Users | 23944 | 9601 |
#Items | 219725 | 44779 |
#Tags | 74095 | 50530 |
Rating records | 1642111 | 1960682 |
Social relationships | 588269 | 91945 |
We adopt RMSE [
where
We selected four methods as comparison algorithms. They are PMF, BPTF, SocialMF, and TagMF. In our experiments, we referred to the original references to set their corresponding parameters.
In this section, we report our experiments from three aspects, namely the performance with different dimensions of the eigenvector, the computational complexity, and the robustness analysis.
First, we set the dimension of the eigenvector
With the increase of
Compared with PMF, the performance of BPTF, TagMF, SocialMF, and the proposed algorithm is significantly improved, which indicates that the relationship between temporal information and users (items) plays a greater role in improving the accuracy of traditional collaborative filtering algorithms.
Compared with SocialMF, the performance of BPTF is improved, which is mainly caused by the sparse social relationships displayed in SocialMF. Compared with TagMF and TemporalMF, the results of BPTF are slightly worse.
Compared with TagMF and our method, SocialMF performs a little worse, which is mainly because SocialMF does not consider the relationships between items. In addition, SocialMF does not consider the direction of the relationship between friends.
Compared with TagMF, the proposed algorithm outperforms substantially, which indicates that the impact relationship proposed in this article can effectively improve the accuracy of the algorithm. In addition, because the proposed algorithm only requires simple time information, from the perspective of information acquisition and applicable scope, the proposed algorithm also has an advantage.
Algorithms | ||||||
---|---|---|---|---|---|---|
Books | Movies | Books | Movies | Books | Movies | |
PMF | 0.7511 | 0.7358 | 0.7465 | 0.7327 | 0.7435 | 0.7311 |
BPTF | 0.7317 | 0.7279 | 0.7267 | 0.7231 | 0.7242 | 0.7228 |
SocialMF | 0.7339 | 0.7309 | 0.7307 | 0.7269 | 0.7289 | 0.7245 |
TagMF | 0.7298 | 0.7267 | 0.7240 | 0.7235 | 0.7219 | 0.7218 |
TemporalMF | 0.7294 | 0.7251 | 0.7238 | 0.7229 | 0.7217 | 0.7208 |
The experimental results show that the proposed algorithm performs the best compared to traditional PMFs, which fully illustrates the rationality and effectiveness of the impact relationship proposed in this paper. Compared to SocialMF, TagMF, and BPTF, although the improvement is not very significant, in actual recommendation systems, the social network relationships or tag information required by SocialMF and TagMF are often difficult to obtain or extremely sparse, whereas the proposed algorithm only requires time information that is very easy to obtain. Therefore, the proposed algorithm can be widely used in many practical applications.
Algorithms | Books | Movies |
---|---|---|
PMF | 1.7 | 1.9 |
BPTF | 15.2 | 18.6 |
SocialMF | 2.9 | 2.7 |
TagMF | 4.2 | 3.9 |
TemporalMF | 4.4 | 4.0 |
We see that the consumed CPU seconds of BPTF are much higher than other algorithms. This is mainly because BPTF consumes too many CPU seconds for Markov Monte Carlo training, so the computational complexity of BPTF is high. In addition,
In TemporalMF, the parameter
In order to compare the effects of the algorithms under different sparsity conditions, we use the number of user ratings as a basis for dividing the users in the training set into four groups, namely [0:10], [10:100], [100, 500], and [500, inf].
After dividing the data accordingly, we use the training set to learn the corresponding model and then calculate RMSE for the four groups of users on the test set. We observe that:
In the case of extremely sparse data (the number of scores is less than 10), the improvement effect of TemporalMF is not obvious, and it is worse than SocialMF and TagMF that introduce additional information. This is mainly because the sparse data will cause the network graph created by TemporalMF to be extremely sparse, which affects its accuracy. BPTF introduces parameter training methods, so it performs better than TemporalMF. However, TemporalMF is still better than traditional PMF. When user rating data increases, TemporalMF is better than TagMF, SocialMF, and BPTF, which further proves that the impact relationships introduced in this article can effectively improve the accuracy of the recommendation system.
It can also be seen from
[0, 10) | [10, 100) | [100, 500) | [500, inf) | |
---|---|---|---|---|
PMF | 0.8411 | 0.7504 | 0.7321 | 0.7777 |
BPTF | 0.8324 | 0.7402 | 0.7158 | 0.7320 |
SocialMF | 0.8301 | 0.7423 | 0.7256 | 0.7541 |
TagMF | 0.8227 | 0.7354 | 0.7148 | 0.7318 |
TemporalMF | 0.8400 | 0.7258 | 0.7125 | 0.7320 |
[0, 10) | [10, 100) | [100, 500) | [500, inf) | |
---|---|---|---|---|
PMF | 0.8588 | 0.7501 | 0.7378 | 0.7458 |
BPTF | 0.8574 | 0.7408 | 0.7305 | 0.7402 |
SocialMF | 0.8361 | 0.7410 | 0.7304 | 0.7403 |
TagMF | 0.8114 | 0.7408 | 0.7304 | 0.7406 |
TemporalMF | 0.8362 | 0.7407 | 0.7302 | 0.7402 |
This study uses temporal information to establish a user (item) consumption network. Through this network, the user (item) potential interaction relationship and the nearest neighbor set are found. Then, they are incorporated into a matrix filtering-based collaborative filtering recommendation algorithm. As a result, the accuracy of score prediction is improved. Since the consumption time information is easier to obtain compared to social network information and tag information, the application of collaborative filtering recommendation algorithms based on temporal behavior is more widely possible. Experiments on Douban recommendation datasets show that this method has strengths compared with traditional recommendation algorithms. In future work, we will further study the challenges in this method arising from solving the problem of a sparse consumption network and the cold start of users and items. In addition, the relationship acquisition method we propose is not limited to the probability matrix decomposition algorithm. We can also consider applying this technique to other matrix factorization methods. In future work, we will also study the use of this method to further improve the recommendation predictions.