Design of Hybrid Recommendation Algorithm in Online Shopping System

To improve user satisfaction and loyalty, e-commerce websites use recommendation algorithms to suggest products that may interest users, so the accuracy of the recommendation algorithm is a primary concern. There are currently three mainstream families of recommendation algorithms: content-based recommendation algorithms, collaborative filtering algorithms, and hybrid recommendation algorithms. Content-based and collaborative filtering algorithms each have shortcomings. The content-based recommendation algorithm suffers from limited diversity of recommended items, while the collaborative filtering algorithm suffers from data sparsity and poor scalability. The hybrid recommendation algorithm builds on both and combines their strengths to provide better service. This article uses a content-based recommendation algorithm to mine the user's existing interests, uses a collaborative filtering algorithm to build a potential interest model, mixes the existing and potential interests, and computes the similarity between the mixed model and the candidate search content set to obtain the recommendation list.


Introduction
With the continuous development of Internet technology and rising living standards, people want to save time in a fast-paced society, and personalized recommendation has become one of the core functions of the Internet, used across many industries. E-commerce is commercial activity conducted through a combination of microcomputer technology and network communication technology. E-commerce sites use recommendation algorithms to show users the products they are most likely to purchase [1]; social networking sites recommend potential friends to follow; video sites recommend the videos users are most likely to click [2]; and news websites recommend the news that users are most likely to find interesting [3]. Personalized recommendation technology is the common solution behind these services and is one of the manifestations of Internet intelligence [4].

Related Works
There are three main families of recommendation algorithms at this stage: content-based recommendation algorithms, collaborative filtering algorithms, and hybrid recommendation algorithms. The content-based recommendation algorithm [5] makes recommendations from specific text information [6]. Collaborative filtering algorithms are divided into memory-based methods and model-based methods. The earliest methods relied mainly on user evaluations and found similar items and users by traversing all or part of the evaluation matrix [7]. The development of data mining and machine learning has since provided new ideas for collaborative filtering: a learning model is built and trained on the evaluation matrix, and the model is then used to make recommendation predictions. However, both families have serious problems that affect the final recommendation quality. To avoid the shortcomings of the two algorithms while keeping their advantages, researchers proposed a third family, the hybrid recommendation algorithm, which combines the content-based recommendation algorithm with the collaborative filtering algorithm [8]. The mainstream hybrid approach is this combination of content and collaborative filtering, and it has been widely used in practice. At present, most deployed recommendation systems are hybrid systems that fuse multiple recommendation algorithms [9].
In recent years, researchers have made further progress on hybrid recommendation algorithms. Jeffery D. Weir proposed a recommendation system based on the tag model in 2016 [10]. To address data sparsity and cold start in social network systems, researchers have proposed a hybrid recommendation algorithm that combines the user trust network with probabilistic matrix factorization [11]; it uses user associations and the user rating matrix to evaluate the relationship between potential users and the current user. Collaborative filtering that relies only on user ratings ignores user comments and item tags, which reduces recommendation accuracy; to address this, researchers have proposed an improved hybrid recommendation algorithm based on a stacked denoising autoencoder [12]. Based on the behavior and attributes of purchased goods, researchers have proposed a hybrid recommendation algorithm based on probabilistic matrix factorization with a Gaussian model [13]. In online course recommendation, Liang Ying uses transfer learning algorithms to take a first step toward solving the data sparsity problem [14].

Fundamentals of the Collaborative Filtering Algorithm
The basic idea of the collaborative filtering algorithm is to make recommendations for the current user based on the historical purchase behavior of users with the same interests. In short, the algorithm mines massive data for users whose interests match yours, treats them as your neighbors, generates a recommendation list from what they like, and pushes it to you. The algorithm therefore has two core problems: a) identify users with the same interests as the current user; b) generate a recommendation list and push it to the current user. Implementing it takes three major steps: a) collect users' preferences; b) find similar users and items; c) compute scores and generate the recommendation list.
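The three steps above can be sketched on toy data. This is a minimal illustration, not the implementation used in this article: it assumes purchase histories are stored as sets of item names and uses Jaccard overlap as a stand-in similarity measure, and the names `jaccard`, `recommend`, and `purchases` are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets of purchased items (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases, user, n_neighbors=2, top_n=3):
    """User-based collaborative filtering on purchase sets.

    purchases: dict mapping user id -> set of purchased item ids (step a).
    Jaccard similarity is an assumption made for this sketch.
    """
    own = purchases[user]
    # Step b: rank the other users by similarity to the current user.
    neighbors = sorted(
        (u for u in purchases if u != user),
        key=lambda u: jaccard(own, purchases[u]),
        reverse=True,
    )[:n_neighbors]
    # Step c: score items the neighbors bought that the user has not.
    scores = {}
    for u in neighbors:
        weight = jaccard(own, purchases[u])
        for item in purchases[u] - own:
            scores[item] = scores.get(item, 0.0) + weight
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


purchases = {
    "alice": {"phone", "case"},
    "bob": {"phone", "case", "charger"},
    "carol": {"phone", "headphones"},
    "dave": {"novel"},
}
print(recommend(purchases, "alice"))
```

In this toy data, bob and carol become alice's neighbors, so the items they bought and she did not are the ones recommended.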

Problems with Collaborative Filtering Algorithms
Data sparsity is the biggest problem that collaborative filtering algorithms inevitably face. In a real commercial recommendation system, the numbers of users and purchased items are very large, yet each user evaluates only a few of the items purchased, so most of the evaluation matrix is empty. Scalability is the second problem: as users and purchased items increase, limited computing resources and computing speed cause the efficiency of the algorithm to drop sharply beyond a certain scale, until actual demand can no longer be met.

Fundamentals of the Content-Based Recommendation Algorithm
The content-based recommendation algorithm calculates the similarity between items based on the items that users liked in the past to generate a recommendation list. The core of the algorithm is the accuracy of the pairwise similarity of items. There are three steps to implement the algorithm: a) Extract feature values of past items; b) Use feature data to build user preference models; c) Generate a recommendation list by comparing the candidate items with the preference model.
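The three steps can be illustrated with a small sketch. It assumes each item is already described by a dictionary of feature values, that the preference model is the average of the liked items' feature vectors, and that candidates are scored by a dot product; none of these choices is prescribed by the text, and the names `build_profile` and `score` are hypothetical.

```python
def build_profile(liked_items):
    """Step b: average the feature vectors of the items the user liked."""
    profile = {}
    for features in liked_items:
        for name, value in features.items():
            profile[name] = profile.get(name, 0.0) + value / len(liked_items)
    return profile


def score(profile, features):
    """Dot product between the preference profile and a candidate item."""
    return sum(profile.get(name, 0.0) * value for name, value in features.items())


# Step a: hand-written feature values for items the user liked (toy data).
liked = [
    {"sports": 1.0, "shoes": 1.0},
    {"sports": 1.0, "outdoor": 1.0},
]
candidates = {
    "running shoes": {"sports": 1.0, "shoes": 1.0},
    "tent": {"outdoor": 1.0},
    "cookbook": {"kitchen": 1.0},
}

profile = build_profile(liked)
# Step c: rank candidates by how well they match the profile.
ranking = sorted(candidates, key=lambda name: score(profile, candidates[name]), reverse=True)
print(ranking)
```

Every highly ranked candidate shares features with items the user already liked, and the cookbook can never rise above zero because none of its features appear in the profile.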

The Problem of Content-Based Recommendation Algorithm
Limited diversity of recommended items is the primary problem of content-based recommendation algorithms. The preference model is learned from items the user liked in the past, so the recommendations inherit the limitations of the original data. At the same time, establishing an association model between users and items and extracting the main features consumes a lot of time and manpower. In addition, if the user rarely browses or purchases, the few records available have a disproportionate impact on the recommendation result. A comparison of the content-based recommendation algorithm and the collaborative filtering algorithm is shown in Table 1.

Fundamentals of the Hybrid Recommendation Algorithm
Collaborative filtering algorithms and content-based recommendation algorithms are complementary. The collaborative filtering algorithm does not suffer from limited diversity of recommended items, and the content-based recommendation algorithm does not suffer from the cold start problem. Naturally, the two can be merged into a new hybrid recommendation algorithm that gives better recommendation results.

Algorithm Types and Advantages
This method improves the traditional content-based method to obtain the user's existing interests, obtains the user's potential interests through collaborative filtering on feature words, and mixes the existing and potential interests into a mixed user interest model. The similarity between this model and the candidate items is then calculated to recommend products that may interest each user. Compared with the previous methods, this approach takes into account users' needs for both diversity and personalization, explores users' potential interests more fully, and improves the click-through rate on recommended products.

Hybrid Recommendation Algorithm in Online Shopping System
The hybrid recommendation system model consists of three parts: the user's existing interest model, the user's potential interest model, and the hybrid recommendation algorithm model. Some definitions are needed first.
Mixed interest model (HM): the weight vector obtained by combining the user's existing interest model and potential interest model according to certain rules, denoted as W3 = {w31, w32, w33, ..., w3n}, where w3i is the weight of the i-th feature word in the feature word sequence (the model construction process is shown in Fig. 1).

Figure 1: Construction process of mixed interest model
In the mixed interest model, the user's existing interest model follows the principle of the content-based recommendation algorithm, while the user's potential interest model follows the collaborative filtering algorithm. Together they form the basic framework of the mixed interest recommendation model.

User's Existing Interest Model Design
The searched content should be structured before the user's interest model is established. The typical solution is the TF-IDF (term frequency-inverse document frequency) representation, which uses weights to measure the importance of words. TF (term frequency) is the proportion of a specific word among all words in the text. Common words with no real meaning, such as "the", "these", and "those", usually have a high frequency in any text. Therefore, to improve the accuracy of the extracted feature words, the concept of IDF (inverse document frequency) is introduced. If a word is rare overall but appears frequently in a certain document, it is likely to reflect the characteristics of that document and is most likely one of its feature words. IDF is a weight parameter set for this situation, and it decreases as the word becomes more common. The TF-IDF weight is calculated as TF × IDF:
$$\mathrm{TF} \times \mathrm{IDF} = \frac{\mathrm{freq}(i,j)}{\mathrm{sum}(k,j)} \times \log\frac{N}{n(i)} \qquad (1)$$
Here TF = freq(i,j)/sum(k,j), where freq(i,j) is the number of occurrences of word i in the search content set dj and sum(k,j) is the total number of words in dj. IDF = log[N/n(i)], where N is the total number of search content sets and n(i) is the number of search content sets in which word i appears.
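The weight calculation can be sketched in a few lines. This is a minimal illustration that assumes each search content set has already been split into a list of words and that the logarithm is natural (the text does not state the base); the function name `tf_idf` and the toy data are hypothetical.

```python
import math


def tf_idf(documents):
    """Return one {word: weight} dict per document, weight = TF * IDF.

    documents: list of token lists, one list per search content set.
    The natural logarithm is an assumption; the base is not given in the text.
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each word.
    doc_freq = {}
    for tokens in documents:
        for word in set(tokens):
            doc_freq[word] = doc_freq.get(word, 0) + 1

    weights = []
    for tokens in documents:
        row = {}
        for word in set(tokens):
            tf = tokens.count(word) / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            row[word] = tf * idf
        weights.append(row)
    return weights


docs = [
    ["the", "red", "shoes"],
    ["the", "blue", "shoes"],
    ["the", "red", "dress"],
]
for row in tf_idf(docs):
    print({word: round(w, 3) for word, w in sorted(row.items())})
```

The word "the" occurs in every document, so its IDF is log(3/3) = 0 and its weight vanishes, while "blue" and "dress", which occur in a single document each, receive the largest weights.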
Given the search content set D = {d1, d2, ..., dn} and the feature word set S = {s1, s2, ..., sk}, each search content set can be expressed as a vector in the vector space model defined by S: di = {wi1, wi2, ..., wij, ..., wik}, where wij is the weight of the feature word sj in the search content set di. If wij is 0, the feature word sj does not appear in di. The search content set is therefore equivalent to a weight matrix:
$$D = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1k} \\ w_{21} & w_{22} & \cdots & w_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nk} \end{pmatrix} \qquad (2)$$

User Potential Interest Model Design
The user's potential interest model differs from the existing interest model in that potential interests cannot be found directly from previous searches. This paper proposes to use a collaborative filtering algorithm to solve the problem. The traditional collaborative filtering algorithm finds similar users through a scoring matrix. However, different users often browse the same items but purchase different ones, so a scoring matrix rarely classifies users who browse the same items as similar users. To solve this problem, we instead calculate the similarity sim(u,v) of the search content of different users.

Similarity Calculation
The core of the collaborative filtering algorithm is finding users with the same interests, and the efficiency and quality of this step largely determine those of the whole algorithm. The similarity of the search content of user i and user j is measured as follows: first obtain the weights of all feature words in the search content sets of users i and j, then calculate the similarity of the two users with a similarity measure, denoted as sim(i, j).
There are three common ways to calculate sim(i,j): cosine similarity, the Pearson correlation coefficient, and Euclidean distance. This article uses cosine similarity.
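Cosine similarity is the dot product of the two weight vectors divided by the product of their lengths. The sketch below assumes both users are represented by weight vectors over the same ordered list of feature words; returning 0.0 for an all-zero vector is a convention chosen here, not something stated in the text.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a user with no weighted words matches nobody
    return dot / (norm_u * norm_v)


user_i = [0.5, 0.0, 0.2]
user_j = [1.0, 0.0, 0.4]   # same direction as user_i, twice the magnitude
user_k = [0.0, 0.3, 0.0]   # no feature words in common with user_i

print(round(cosine_similarity(user_i, user_j), 6))
print(round(cosine_similarity(user_i, user_k), 6))
```

Because the measure depends only on direction, a user who searches more often but with the same mix of feature words is still treated as fully similar.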

Recommend Interest Words of Similar User Groups and Build Models
With the above method, the similarity between the search content of the current user and that of every other user can be calculated, and the n users most similar to the current user are taken as the neighbor group. The collaborative filtering algorithm recommends the existing interest models of the users in the neighbor group to the current user, and the result is the user's potential interest model.
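A minimal sketch of this step follows. How the neighbors' existing interest models are merged is not specified in the text, so the sketch assumes a plain average of their weight vectors; the names `cosine`, `potential_interest`, and `weights` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def potential_interest(weights, user, n_neighbors):
    """Average the existing-interest vectors of the n most similar users.

    weights: dict mapping user id -> list of feature-word weights.
    Averaging is an assumption; the text only says the neighbors' models
    are recommended to the current user.
    """
    neighbors = sorted(
        (u for u in weights if u != user),
        key=lambda u: cosine(weights[user], weights[u]),
        reverse=True,
    )[:n_neighbors]
    size = len(weights[user])
    if not neighbors:
        return [], [0.0] * size
    model = [sum(weights[u][k] for u in neighbors) / len(neighbors) for k in range(size)]
    return neighbors, model


weights = {
    "u1": [1.0, 0.0, 0.0, 0.0],
    "u2": [0.8, 0.6, 0.0, 0.0],
    "u3": [0.6, 0.0, 0.8, 0.0],
    "u4": [0.0, 0.0, 0.0, 1.0],
}
neighbors, model = potential_interest(weights, "u1", n_neighbors=2)
print(neighbors, model)
```

In this toy data, u1 has no weight on the second and third feature words, but the potential interest model assigns them positive weight because the neighbors u2 and u3 searched for them.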

Design of Hybrid Recommendation Algorithm Model
After the user's existing and potential interest models are obtained, the two models are combined according to the mixing rules, the similarity between the mixed model and each item in the candidate recommended item set is calculated, and the recommendation result is filtered against a given similarity threshold a.
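The mixing rule itself is not spelled out in the text. The sketch below assumes a simple linear combination, W3 = alpha * W1 + (1 - alpha) * W2, where W1 is the existing interest model and W2 the potential interest model; the parameter `alpha`, the function names, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


def mix(existing, potential, alpha=0.5):
    """Hypothetical mixing rule: W3 = alpha * W1 + (1 - alpha) * W2."""
    return [alpha * e + (1 - alpha) * p for e, p in zip(existing, potential)]


def recommend(mixed, candidates, threshold):
    """Keep candidates whose similarity to the mixed model reaches the threshold a."""
    scored = [(name, cosine(mixed, vec)) for name, vec in candidates.items()]
    kept = [(name, s) for name, s in scored if s >= threshold]
    return [name for name, _ in sorted(kept, key=lambda pair: pair[1], reverse=True)]


existing = [1.0, 0.0, 0.0]
potential = [0.0, 1.0, 0.0]
candidates = {
    "item_a": [1.0, 0.0, 0.0],
    "item_b": [1.0, 1.0, 0.0],
    "item_c": [0.0, 0.0, 1.0],
}
mixed = mix(existing, potential, alpha=0.5)
print(recommend(mixed, candidates, threshold=0.6))
```

With alpha = 0.5, item_b ranks first because it matches both the existing and the potential interest, and item_c is dropped because its similarity falls below the threshold. Setting alpha = 1.0 reduces the sketch to a purely content-based ranking.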

Data Representation
In the hybrid recommendation system introduced in this article, recommendation results are generated from the weights of each feature word in the different search content sets. The weights can be represented by an m × n matrix: the m rows correspond to the search content sets of m users, the n columns correspond to n feature words, and the element wij in the i-th row and j-th column is the weight of the j-th feature word in the search content set of the i-th user. The weight matrix is shown in Table 2.
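As an illustration of this layout, the sketch below builds the m × n matrix from per-user word weights. It assumes the weights are stored as one dictionary per user and that missing feature words get weight 0; the names `weight_matrix`, `user_weights`, and `feature_words` and the numeric values are hypothetical.

```python
def weight_matrix(user_weights, feature_words):
    """Build the m x n weight matrix as a list of rows.

    Rows follow the sorted user ids; columns follow feature_words.
    A feature word missing from a user's search content gets weight 0.0.
    """
    users = sorted(user_weights)
    matrix = [
        [user_weights[u].get(word, 0.0) for word in feature_words]
        for u in users
    ]
    return users, matrix


feature_words = ["shoes", "dress", "phone"]
user_weights = {
    "u2": {"phone": 0.7},
    "u1": {"shoes": 0.4, "dress": 0.1},
}
users, matrix = weight_matrix(user_weights, feature_words)
for user, row in zip(users, matrix):
    print(user, row)
```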

Experimental Evaluation Index
After the recommendation results are presented to users, the possible outcomes are those shown in Table 3. Based on these outcomes, the accuracy rate (precision) and the coverage rate (recall) are usually used as evaluation indices. The accuracy rate, Eq. (4), is the ratio of the number of items that were both recommended and visited to the total number of recommended items. The coverage rate, Eq. (5), is the ratio of the number of recommended hits to the total number of items visited by users in the test set.
In fact, precision and recall conflict with each other. Increasing the number of recommended items raises the coverage rate but lowers the accuracy rate. Therefore, the two are usually given comparable weight and combined into a comprehensive measure F to evaluate recommendation quality. The larger the F value, the higher the recommendation quality. F is calculated by Eq. (6).
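The three indices can be computed for a single user as sketched below. The sketch assumes that F is the equal-weight harmonic mean of precision and recall (the usual F1 measure), which matches the description of "comparable weight" but is not confirmed by the text; the function name `evaluate` and the toy lists are hypothetical.

```python
def evaluate(recommended, visited):
    """Precision, recall and F for one user.

    recommended: items in the recommendation list.
    visited: items the user actually visited in the test set.
    """
    recommended, visited = set(recommended), set(visited)
    hits = len(recommended & visited)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(visited) if visited else 0.0
    total = precision + recall
    # Assumption: F is the equal-weight harmonic mean (F1).
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


recommended = ["a", "b", "c", "d"]
visited = ["b", "d", "e"]
print(evaluate(recommended, visited))
```

Here two of the four recommended items were visited, giving a precision of 2/4, and two of the three visited items were recommended, giving a recall of 2/3.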

Experimental Program
The data in this article is taken from Datacastle's user browsing data set. We randomly selected 1,000 users and varied the number of recommended items from 10 to 60. Discovering users' potential interests requires fixing the size of the neighbor group, that is, the number of users with similar interests. To simplify performance evaluation, we fixed the neighbor group at 35 users and used cosine similarity. The recommendation algorithm finally generates the N items the user is most likely to be interested in, for the user to choose from. We examine the accuracy rate, coverage rate, and F value of three algorithms (the content-based recommendation algorithm, the collaborative filtering algorithm, and the algorithm introduced in this article) under different values of N.
The accuracy rate, coverage rate, and F value of each algorithm for different numbers of recommended items (N) lead to the following observations:
1. Accuracy and coverage trade off against each other. As the number of recommended items increases, accuracy decreases and coverage increases.
2. As a comprehensive measure, F first rises to a peak as the number of recommended items increases and then slowly decreases.
3. According to Fig. 4, the algorithm introduced in this paper outperforms both the collaborative filtering algorithm and the content-based recommendation algorithm, and it has no cold start problem.

Summary and Outlook
The recommendation system has gone through a long period of research and development and has achieved remarkable results. The function of personalized recommendation system is mainly manifested in three aspects: 1) Turn e-commerce viewers into buyers; 2) Improve the ability of e-commerce websites to cross-sell; 3) Improve user experience and increase user loyalty.
However, further efforts are needed. Many difficulties in recommendation systems remain unsolved, such as extracting accurate user preferences and object characteristics, multi-dimensional recommendation, and the security of recommendation systems. We firmly believe that, with the development of society and the continuous progress of technology, research on recommender systems will go deeper and better serve people's material and cultural life.