Improved Short-video User Impact Assessment Method Based on PageRank Algorithm

Short-video platforms are social networks in which user-generated content accelerates information dissemination. Hence, it is necessary to identify important users in order to obtain information effectively. Four algorithms (Followers Rank, Average Forwarding, K Coverage, and Expert Survey and Evaluation) have been proposed to calculate users' influence and determine their importance. These methods simply take the number of a user's fans or posts as the standard of influence evaluation and ignore factors such as paid posters, which makes the evaluations inaccurate. To solve these problems, we propose the short-video user influence rank (SVUIR) algorithm, which combines direct and indirect influence to comprehensively measure the influence of short-video users, using reference factors such as the number of fans, the number of likes, the number of a user's works, the quality of those works, and follow, comment, and forwarding behavior. An experiment on Douyin (whose overseas version is TikTok), a typical short-video platform, verifies the algorithm and confirms that SVUIR is more comprehensive and objective than the above four algorithms.


Introduction
The American government has prohibited TikTok, the overseas version of Douyin, since early 2020, which highlights the importance of short-video sites. The development of Web 2.0 technology has helped this form of social platform to proliferate, and short videos have quickly become popular. Short videos stand out and have gained many users with original content. 5G technology has made the mobile internet more popular, and video attracts the attention of netizens more than static text and pictures do, changing the era from "all people have microphones" to "all people have cameras." According to the 45th Statistical Report on China's Internet Development [1], the number of short-video users in China had reached 773 million by March 2020, accounting for 85.6% of all internet users. The popularity of short video increases the speed of information dissemination, lowers the threshold of knowledge acquisition, and promotes a virtuous circle of knowledge. However, the content is highly entertaining, the user [...] algorithm's defects, evolved its model, and derived many extended algorithms. Xing et al. [17] added a chain relationship to form a weighted PageRank algorithm. Wang et al. [18] quantified the spreading ability of users and integrated it into PageRank to obtain the Sf-uir algorithm. Ku et al. [19] combined PageRank with the HITS algorithm to obtain influence rankings more quickly. Zhou et al. [20] combined user dynamic behavior with PageRank to identify users with high influence more accurately. Fialal et al. [21] combined PageRank with author contributions to explore the influence ranking of scholars in an academic network.
The influence measurement method based on network topology is clear and direct, and can grasp the overall structure of a social network. PageRank, in particular, has played an important role in theory and practice. This paper introduces the characteristics of short-video social networks, comprehensively considers the number and quality of users, and proposes a short-video user influence measurement algorithm based on an improved PageRank algorithm.

Principle of Proposed SVUIR Algorithm
The topology of a short-video social network resembles the webpage topology of the World Wide Web (WWW). Users on a short-video platform correspond to webpages, and follow relationships between users correspond to links between webpages. Thus, PageRank can be used to measure the influence of short-video users.

Mathematical Model of PageRank Algorithm
PageRank is a classic webpage ranking algorithm in which a webpage's rank in the search results is based on how many other pages link to it. The mathematical model of the PageRank algorithm is [16]

$$PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}, \qquad (1)$$

where $p_i$ is a webpage, $M(p_i)$ is the set of pages that link to $p_i$, $L(p_j)$ is the number of outgoing links of page $p_j$, $N$ is the total number of pages, and $d$ is a damping coefficient, which takes the value 0.85 according to experience.
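As an illustration of how this model is typically evaluated, the following minimal sketch iterates the standard PageRank recurrence on a small link graph. The function name `pagerank`, the dictionary input format, the stopping tolerance, and the treatment of pages without outgoing links are assumptions made for the example and are not taken from the paper.

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for the standard PageRank recurrence.

    out_links maps each page to the list of pages it links to
    (hypothetical input format chosen for this sketch).
    """
    pages = sorted(set(out_links) | {q for ts in out_links.values() for q in ts})
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}  # every page starts with the same PR value
    for _ in range(max_iter):
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = out_links.get(p, [])
            if targets:
                share = d * pr[p] / len(targets)  # equal split over out-links
                for q in targets:
                    new[q] += share
            else:
                # Assumption: a page with no out-links spreads its rank uniformly.
                for q in pages:
                    new[q] += d * pr[p] / n
        if sum(abs(new[p] - pr[p]) for p in pages) < tol:
            return new
        pr = new
    return pr


ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

In this toy graph, page C is linked from both A and B and therefore ends up with the highest rank, while the equal split over out-links and the uniform initial value are exactly the two properties that the following subsections criticize.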

Shortcomings of PageRank Algorithm
When PageRank is used to calculate the influence of short-video users, users are treated as webpages and only the "follow" relationship between them is considered. This leads to the following deficiencies.

Initial PR Value is Inaccurate
PageRank assigns every node the same initial PR value by averaging, which is reasonable when ranking webpages. However, it is unreasonable to assign users the same average initial PR value when ranking users. Users differ in factors such as the number of fans and their level of activity, and the influence of a user's own attributes on the spread of short videos cannot be ignored.

Unreasonable Proportion of PR Value Transfer
PageRank divides the PR value of a webpage equally among the pages it links to. This is inaccurate for the transmission of user influence because most users do not pay equal attention to all the users they follow.

Topology Does Not Conform to Actual Situation
PageRank is applied to user influence based on the "follow" relationship, so the result is largely determined by the number of a user's fans. Because a user's fans may include artificial followers, the number of fans cannot truly reflect the influence of users [22]. Furthermore, dynamic behavior between users, such as comments and forwarding, greatly affects the scope and dissemination speed of a work.

Improvement of Influencing Factors of SVUIR Algorithm
A user exerts short-video influence on the crowd through posted videos. It is generally believed that more fans imply greater influence [23]. However, it is neither objective nor comprehensive to judge the influence of users only by the number of fans. Factors affecting influence are complex, including not only the number of fans, works, and likes, among other objective indicators, but also subjective indicators such as fan activity. To measure the influence of users, one must consider various factors so as to objectively reflect the dynamic [18] of users.
To improve upon the PageRank algorithm, we propose SVUIR, which considers two groups of factors: factors of direct influence and factors of indirect influence.

Factors of Direct Influence of Short-video Users
Because users differ, it is not reasonable for PageRank to give them all the same initial PR value. Four factors can be used to evaluate the direct influence of users (number of fans, number of likes, number of works, and quality of works) so as to assign users different initial PR values. We can preliminarily screen out silent fans and zombie fans, whose accounts show no activity, and exclude them when constructing the topology.

Number of Fans
The number of fans is a basic indicator of user influence. Generally speaking, the more fans a user has, the more people can see the user's works. In the topology, the more links a node has, the more important it is.

Total Likes
The total number of likes reflects a user's degree of recognition. The more total likes a user's works have, the more popular the user is, and the greater the influence.

Number of User's Works
Short video is the fast food of entertainment, and the popularity of a video drops quickly. Publishing works is the most important way for users to maintain their influence. The short-video platform will provide a user pool for each video, and each video will have a guaranteed number of users to watch. The more works users have, the more influential they are.

User's Work Quality
If a user's works are well received in the initial user pool test, the platform will push these high-quality works to more users. The quality of a work determines how many people can see it, and this is reflected in the number of views, comments, and forwards. Larger numbers mean higher quality.

Factors of Indirect Influence
Users cannot equally like all the short-video users they come into contact with. They are often interested in one or several users and are willing to invest in the promotion of their works. This preference is manifested in the interaction between users, which can increase their indirect influence, as measured by the following factors.

Follow Behavior
A user can add favorite accounts to a follow list, and the system will automatically push new works from those accounts to the user. In the network topology, each follow forms a pair of in-link and out-link relations, through which the influence of users can spread across the network.

Like, Comment, and Forward Behavior
In the PageRank algorithm, a webpage distributes its influence equally among the pages it links to. Applied to short-video users, this divides a user's influence equally among the accounts that the user follows. In real life, however, users have preferred accounts. The three interactive behaviors of likes, comments, and forwards show how much a fan likes a particular user. Forwarding is most important, followed by comments, and then likes. The more a fan interacts with a user through these three behaviors, the closer their relationship, and the greater the proportion of the fan's contribution allocated to that user. Quantifying the closeness between users solves the problem of PR value being transferred in unreasonable proportions.
In conclusion, SVUIR calculates the initial PR value of users from their own attributes, which solves the problem of inaccurate initial PR values. From the interaction between users, it infers a user's interests, determines the user's indirect influence, optimizes the allocation of PR value during transmission, and constructs a user influence propagation network, which effectively avoids the interference of zombie fans and silent fans in the calculation of user influence.

Derivation
The total influence of a user consists of direct and indirect influence. Direct influence is based on the user's own behavior, while indirect influence is based on interaction with fans. The total influence of user i is

$$SVUIR(U_i) = SVUIR_{direct}(U_i) + SVUIR_{indirect}(U_i), \qquad (2)$$

where $SVUIR_{direct}(U_i)$ and $SVUIR_{indirect}(U_i)$ are, respectively, the direct and indirect influence of user i.
The direct influence of users can be divided into two parts. One is radiated to the next level through the "follow" relationship, which is related to the numbers of fans, likes, and works. The other is pushed by the platform through the user's videos, which reach more users. This influence is reflected by the number of likes, comments, and forwards of a video. The direct influence is calculated as

$$SVUIR_{direct}(U_i) = S(U_i) + M(U_i), \qquad (3)$$

where $S(U_i)$ is the direct influence of the user, and $M(U_i)$ is the direct influence of the user's videos.
The direct influence of users is reflected in the total number of works, likes, and fans,

$$S(U_i) = a_1 W_{U_i} + b_1 Z_{U_i} + c_1 B_{U_i}, \qquad (4)$$

where $W_{U_i}$, $Z_{U_i}$, and $B_{U_i}$ are the total number of works, likes, and fans, respectively, of user i, and $a_1$, $b_1$, and $c_1$ are their respective weights.
The direct influence of video lies in the number of people watching and the influence of secondary transmission, which is mainly reflected in the number of likes, comments, and forwards. It is calculated as

$$M(U_i) = \sum_{m_j \in T_{U_i}} \left( a_2 L_{m_j} + b_2 C_{m_j} + c_2 R_{m_j} \right), \qquad (5)$$

where $T_{U_i}$ is the collection of works published by user i within the statistical period, which is set to 30 days based on experience; $m_j$ is any video work published by user i within the statistical period; $L_{m_j}$ is the number of likes obtained by work $m_j$; $C_{m_j}$ is the number of comments on work $m_j$; $R_{m_j}$ is the number of reposts of work $m_j$; and $a_2$, $b_2$, and $c_2$ are the respective weights.
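The two parts of direct influence described above can be sketched as weighted sums over a user's profile counts and over the videos published within the statistical period. The class names `Video` and `User`, the field names, and the default equal weights are hypothetical choices for this sketch; in the paper the weights come from the entropy weight method, and in practice the raw counts would likely need to be standardized first so that no single count dominates.

```python
from dataclasses import dataclass, field


@dataclass
class Video:
    likes: int
    comments: int
    reposts: int


@dataclass
class User:
    works: int   # total number of works
    likes: int   # total number of likes
    fans: int    # total number of fans
    recent_videos: list = field(default_factory=list)  # works in the 30-day window


def direct_influence(user, w_user=(1 / 3, 1 / 3, 1 / 3), w_video=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted user-level score plus weighted score of recent videos."""
    a1, b1, c1 = w_user
    a2, b2, c2 = w_video
    # User part: works, likes, and fans, each with its own weight.
    s = a1 * user.works + b1 * user.likes + c1 * user.fans
    # Video part: likes, comments, and reposts summed over recent works.
    m = sum(a2 * v.likes + b2 * v.comments + c2 * v.reposts for v in user.recent_videos)
    return s + m


example = User(works=10, likes=100, fans=50, recent_videos=[Video(10, 4, 2)])
score = direct_influence(example, w_user=(0.2, 0.3, 0.5), w_video=(0.5, 0.25, 0.25))
```

With these made-up numbers the user part is 0.2·10 + 0.3·100 + 0.5·50 = 57 and the video part is 0.5·10 + 0.25·4 + 0.25·2 = 6.5, giving 63.5 in total.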
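The indirect influence of users is distributed by all fans to their favorite users through interactive behaviors,

$$SVUIR_{indirect}(U_i) = (1 - d) + d \sum_{j \in F_{U_i}} \mathrm{affect}(U_i, U_j)\, SVUIR(U_j), \qquad (6)$$

where $d$ is a damping coefficient, $F_{U_i}$ is the set of fans of user i, j is any fan of user i, and $\mathrm{affect}(U_i, U_j)$ is the influence ratio assigned to user i by fan j based on their interaction,

$$\mathrm{affect}(U_i, U_j) = \frac{\mathrm{interact}(U_i, U_j)}{\sum_{p \in \mathrm{force}_{U_j}} \mathrm{interact}(U_p, U_j)}, \qquad (7)$$

where $\mathrm{interact}(U_i, U_j)$ is the propagation ability of user i to fan j, $\mathrm{force}_{U_j}$ is the set of all admirers of fan j (the users that fan j follows), and p is any admirer of fan j.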
The propagation ability of a user to a fan depends on the fan's degree of preference for the user, which is mainly reflected in the number of likes, comments, and forwards of videos. This is calculated as

$$\mathrm{interact}(U_i, U_j) = a_3 \frac{T_{like}(U_i, U_j)}{T_{like}(U_j)} + b_3 \frac{T_{comment}(U_i, U_j)}{T_{comment}(U_j)} + c_3 \frac{T_{forward}(U_i, U_j)}{T_{forward}(U_j)}, \qquad (8)$$

where $T_{like}(U_i, U_j)$ is the number of likes that fan j gives to user i, $T_{like}(U_j)$ is the total number of likes that fan j gives to admirers, $T_{comment}(U_i, U_j)$ is the number of comments that fan j gives to user i, $T_{comment}(U_j)$ is the total number of comments that fan j gives to admirers, $T_{forward}(U_i, U_j)$ is the number of forwards that fan j gives to user i, $T_{forward}(U_j)$ is the total number of forwards that fan j gives to admirers, and $a_3$, $b_3$, and $c_3$ are the respective weights.
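The following sketch shows one way to compute the closeness between a fan and a followed user from the fan's like, comment, and forward counts, and then to normalize it over everyone the fan follows. The function names, the nested-dictionary input format, and the example weights (forwarding weighted highest, then comments, then likes) are assumptions for illustration; the paper determines its weights with the entropy weight method.

```python
def interact(fan_actions, target, weights=(0.2, 0.3, 0.5)):
    """Closeness of one fan to `target`.

    fan_actions: {"like": {user: count}, "comment": {...}, "forward": {...}}
    holding the actions of a single fan (hypothetical format).
    weights: illustrative weights for like, comment, forward.
    """
    score = 0.0
    for w, kind in zip(weights, ("like", "comment", "forward")):
        counts = fan_actions.get(kind, {})
        total = sum(counts.values())
        if total > 0:  # assumption: a behavior the fan never used contributes 0
            score += w * counts.get(target, 0) / total
    return score


def affect(fan_actions, followed, target):
    """Share of the fan's influence assigned to `target` among `followed`."""
    denom = sum(interact(fan_actions, p) for p in followed)
    if denom == 0:
        return 0.0  # assumption: a fan with no interactions passes nothing on
    return interact(fan_actions, target) / denom


actions = {
    "like": {"u1": 3, "u2": 1},
    "comment": {"u1": 1, "u2": 1},
    "forward": {"u2": 2},
}
shares = {u: affect(actions, ["u1", "u2"], u) for u in ("u1", "u2")}
```

In this made-up case the fan likes u1 more often but forwards only u2, so u2 receives the larger share (about 0.7 versus 0.3), which reflects the ordering of the three behaviors by importance.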
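By substituting Eq. (6) into Eq. (2), we can obtain

$$SVUIR(U_i) = SVUIR_{direct}(U_i) + (1 - d) + d \sum_{j \in F_{U_i}} \mathrm{affect}(U_i, U_j)\, SVUIR(U_j), \qquad (9)$$

where $SVUIR_{direct}(U_i) + (1 - d)$ is a constant. The influence of short-video users is obtained by iterating this equation until the Markov process converges.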
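A minimal sketch of this iteration is given below: each user's score is the constant term (direct influence plus 1 - d) plus d times the sum, over the user's fans, of the fan's current score weighted by the share that fan assigns to the user. The function name `svuir`, the dictionary-based inputs, the choice of the direct influence as the starting value, and the stopping rule are assumptions for the example.

```python
def svuir(direct, fans, affect, d=0.85, tol=1e-9, max_iter=1000):
    """Iterate the SVUIR update until scores stop changing.

    direct: {user: direct influence}
    fans:   {user: iterable of that user's fans}; every fan must be a key of direct
    affect: {(user, fan): share of the fan's influence given to user}
    """
    score = dict(direct)  # assumption: start from the direct influence
    for _ in range(max_iter):
        new = {}
        for i in direct:
            spread = sum(affect.get((i, j), 0.0) * score[j] for j in fans.get(i, ()))
            new[i] = direct[i] + (1.0 - d) + d * spread
        if max(abs(new[u] - score[u]) for u in direct) < tol:
            return new
        score = new
    return score


scores = svuir(
    direct={"a": 1.0, "b": 0.5, "c": 0.0},
    fans={"a": ["c"], "b": ["c"]},
    affect={("a", "c"): 0.7, ("b", "c"): 0.3},
)
```

Because the shares that each fan hands out sum to at most 1 and d is below 1, every pass shrinks the remaining error, which is why a simple fixed-point loop is sufficient here.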

Parameter Determination
Weight parameters appear in Eqs. (4), (5), and (8) in the mathematical model of the SVUIR algorithm. Existing data and experiments cannot provide reference values for weights. Hence, we adopt the entropy weight method to determine the parameters.
The entropy weight method is commonly used to determine weights in comprehensive evaluation [24]. Entropy represents the degree of disorder of information. An index with low entropy provides more information, has a greater role in comprehensive evaluation, and has a higher weight.
For any k variables $\{X_1, X_2, X_3, \ldots, X_k\}$, the data in $X_i$ are $\{X_{i1}, X_{i2}, \ldots, X_{in}\}$. The entropy weight method determines the weights of the k variables in three steps.

Data Standardization
The first step is to calculate the standardized data $Y_{ij}$ for $X_{ij}$:

$$Y_{ij} = \frac{X_{ij} - \min(X_i)}{\max(X_i) - \min(X_i)}.$$

Entropy of Information Calculation
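The second step is to calculate the information entropy $E_i$ of variable i,

$$E_i = -\ln(n)^{-1} \sum_{j=1}^{n} p_{ij} \ln(p_{ij}),$$

where $p_{ij} = Y_{ij} / \sum_{j=1}^{n} Y_{ij}$, and $p_{ij} \ln(p_{ij})$ is taken to be 0 when $p_{ij} = 0$.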

Weight Value Determination
The third step is to calculate the weight of variable i,

$$w_i = \frac{1 - E_i}{k - \sum_{l=1}^{k} E_l}.$$
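The three steps can be sketched in a few lines. The function name `entropy_weights`, the list-of-lists input, and the handling of degenerate cases (a constant variable, or all variables carrying no information) are assumptions for this example rather than details given in the paper.

```python
import math


def entropy_weights(variables):
    """Entropy weight method for k variables with n observations each (n > 1)."""
    k = len(variables)
    n = len(variables[0])
    entropies = []
    for xs in variables:
        lo, hi = min(xs), max(xs)
        # Step 1: min-max standardization.
        ys = [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
        total = sum(ys)
        if total == 0:
            # Assumption: a constant variable carries no information.
            entropies.append(1.0)
            continue
        # Step 2: information entropy, with p * ln(p) taken as 0 when p == 0.
        e = 0.0
        for y in ys:
            p = y / total
            if p > 0:
                e -= p * math.log(p)
        entropies.append(e / math.log(n))
    # Step 3: lower entropy gives a higher weight.
    denom = k - sum(entropies)
    if denom == 0:
        return [1.0 / k] * k  # assumption: fall back to equal weights
    return [(1.0 - e) / denom for e in entropies]


weights = entropy_weights([[1, 2, 3, 4], [0, 0, 0, 10]])
```

In this toy input the second variable is concentrated in a single observation, so its entropy is 0 and it receives the larger weight (about 0.79 versus 0.21), which matches the statement that an index with low entropy has a higher weight.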

Data Acquisition
The short-video platform Douyin was selected for data acquisition and analysis to evaluate the effect of the SVUIR algorithm. A mobile phone and computer were placed on the same LAN, and the mobile phone's proxy IP was changed to the computer's IP. Scripts were written to simulate mobile phone users receiving videos recommended by the Douyin platform. The mobile data traffic was captured and duplicate video authors were removed to obtain 573 Douyin users. A crawler was written to collect basic information and behavior information of users from time 0:00 on November 15, 2019, to 0:00 on December 15, 2019. The crawled data were divided into databases [25] of user information, video information, and interactive behavior. The collected basic fields of user, video, and interactive behavior information are shown in Tabs. 1-3, respectively. There were 573 pieces of user information, 16,456 pieces of video information, and 1,054,695 pieces of interactive behavior information.

Data Processing
To ensure the objectivity and authenticity of the experimental results, the data were processed, and all text was converted to UTF-8 encoding to facilitate the import and export of the database [26]. Fields unrelated to the algorithm were deleted. Users were filtered with simple rules to remove obvious silent and zombie users. The processed results were imported into the database.
The 100 users with the most followers were selected, and the network topology between users was drawn according to the follow relationships, as shown in Fig. 1. The influence value of short-video users was calculated and ranked by the SVUIR algorithm, and the influence ranking results were also calculated for the commonly used Followers Rank, Average Forwarding, K Coverage, and Expert Survey and Evaluation algorithms. To ensure that the algorithms were compared under the same conditions, all algorithms used the same data source and ran in the same computer environment.

Comparative Analysis of Results
The influence rankings of short-video platform users obtained by the SVUIR, Followers Rank, Average Forwarding [27], K Coverage, and Expert Survey and Evaluation [28] algorithms were compared, and the pros and cons of each algorithm were judged. MATLAB was used to plot the ranking of the top 10 users according to each algorithm, as shown in Fig. 2 [29].
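A comparison of this kind can be sketched by converting each algorithm's scores into rank positions and listing how far each user moves between two rankings. The function names and the example scores below are made up for illustration and are not the experimental data.

```python
def rank_positions(scores):
    """Map each user to a 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {user: pos for pos, user in enumerate(ordered, start=1)}


def rank_gaps(scores_a, scores_b):
    """Rank under algorithm B minus rank under algorithm A, per shared user."""
    ra, rb = rank_positions(scores_a), rank_positions(scores_b)
    return {u: rb[u] - ra[u] for u in ra if u in rb}


# Hypothetical scores from two algorithms for three users.
gaps = rank_gaps({"x": 3.0, "y": 2.0, "z": 1.0}, {"x": 1.0, "y": 3.0, "z": 2.0})
```

A large positive gap marks a user that the second algorithm ranks much lower than the first, which is the kind of discrepancy discussed in the comparisons that follow.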

Comparison of SVUIR and Followers Rank Algorithms
Tab. 4 shows the ranking comparison of the top 10 users calculated by the SVUIR and Followers Rank algorithms. From Tab. 4 and Fig. 2, we can see that although most rankings obtained by the two algorithms differ only slightly, some users are ranked very differently. Live in rizhao was ranked sixth by SVUIR and 34th by Followers Rank. This user has fewer followers, but each video receives many comments and forwards. Users like Zhu Xiaohan and Gui Ge were ranked highly by Followers Rank. Although they have a large number of fans, their works are few, and the response to them is low. The analysis shows that Followers Rank focuses on the number of fans without considering their quality. This ignores the particularity of fans, which makes Followers Rank less convincing than SVUIR.

Table 3: Interactive behavior information form
Field name    Description
type          Category of action (like or comment)
movie id      ID of the video on which the action was performed
start id      ID of the user who performed the action
end id        ID of the user the action was directed at
time          Time at which the action was performed

Figure 1: Short-video platform user network topology

Comparison of SVUIR and Average Forwarding Algorithms
Tab. 5 shows the ranking comparison of the top 10 users calculated by the SVUIR and Average Forwarding algorithms. It can be seen from Tab. 5 and Fig. 2 that the results obtained by the two algorithms are quite different. Among the top 10 users from the SVUIR algorithm, only CCTV news remains in the top 10 of the Average Forwarding algorithm, and the rest of that top 10 are individual users. Although these individual users have high forwarding numbers, many of their fans are zombie or silent fans, so their forwarding numbers do not demonstrate real influence. People's Daily online was ranked 98th by the Average Forwarding algorithm, but it is still an influential official government account. This shows that the influence of users is not necessarily positively correlated with the number of video forwards.

Comparison of SVUIR and K Coverage Algorithms
Tab. 6 compares the top 10 user rankings calculated by the SVUIR and K Coverage algorithms. From Tab. 6 and Fig. 2, we can find that although some users' influence rankings are relatively close, there are some differences. The K Coverage algorithm mainly considers the hierarchical relationship among users but not the influence of users' own factors, even though official government accounts such as The Xinhua News Agency are themselves highly influential. Therefore, K Coverage is less persuasive than SVUIR.

Comparison of SVUIR and Expert Survey and Evaluation Algorithms
A comparison of the rankings from the SVUIR and Expert Survey and Evaluation algorithms (see Fig. 2) shows that their results are somewhat similar, but the Expert Survey and Evaluation algorithm has two defects. First, the values of the index weights depend on expert experience. Although the accuracy can be improved through discussion with experts, a certain error remains in practice. Second, the algorithm does not consider the hierarchical structure of users. As a result, People's Daily online, which has a large number of active fans, does not rank highly. Therefore, SVUIR is more reliable.
In summary, personal accounts that rank highly under the traditional algorithms rank lower under the SVUIR algorithm than official government accounts. This is because many artificial followers exist among the fans of personal accounts, while official government accounts tend to have high-quality, active fans. The SVUIR algorithm considers not only the user's own factors but also the contribution of the user's fans, and can therefore show the influence of users of short-video platforms more comprehensively and objectively.

Conclusions
Based on the PageRank algorithm and the characteristics of short-video platform users, the SVUIR short-video user influence evaluation algorithm was proposed. The method decomposes user influence into direct and indirect influence. Direct influence reflects the role of users themselves, as expressed by the total number of users' followers, likes, and forwards. Indirect influence is obtained through interaction between users and is calculated based on the number of likes, comments, and forwards. The SVUIR algorithm was compared with the Followers Rank, Average Forwarding, K Coverage, and Expert Survey and Evaluation algorithms. It was found that the SVUIR algorithm reflects the influence of short-video platform users more truly and objectively.