Intelligent Automation & Soft Computing
Personalized Information Retrieval from Friendship Strength of Social Media Comments
1Department of Software Engineering, University of Gujrat, Gujrat, 50700, Pakistan
2Department of Computer Science, University of Gujrat, Gujrat, 50700, Pakistan
3Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, 38541, Korea
4Computer Science and Artificial Intelligence Department, College of Computer Science and Engineering, University of Jeddah, Saudi Arabia
5Faculty of CS & IT, Jazan University, Jazan, 45142, Saudi Arabia
6Department of Computer Science, COMSATS University Islamabad, Pakistan
7Department of Computer Science & IT, The Islamia University of Bahawalpur, Bahawalpur, Pakistan
*Corresponding Author: Muhammad Shafiq. Email: firstname.lastname@example.org
Received: 02 December 2020; Accepted: 26 April 2021
Abstract: Social networks have become an important venue where users express their feelings on a large scale. People readily use social networks to share feelings, discuss ideas, and ask others for suggestions. Every social media user has a circle of friends, and the suggestions of these friends are considered important contributions: users pay more attention to suggestions from friends, and from close friends in particular. However, as the content on the Internet grows day by day, user satisfaction declines because search results are often unsatisfactory. In response, various recommender systems have been developed that recommend friends to add, topics, and many other things according to the seeker’s interests. Existing systems provide solutions for personalized retrieval, but their accuracy remains a problem. In this work, we propose a personalized query recommendation system that uses Friendship Strength (FS) to recommend queries. To calculate FS, we use a Facebook dataset comprising more than 22k records taken from four different accounts. We develop a ranking algorithm that ranks suggestions based on FS. Compared with existing systems, the proposed system provides encouraging results. Research groups and organizations can use this system for personalized information retrieval.
Keywords: Friendship strength; information retrieval; query recommendation
1 Introduction
Most people use social media to express their views and opinions and to share their feelings. Online social networks have become an important source of public opinion. A web-based social network is a place where a large amount of data is shared by ordinary people of different ages, groups, countries and walks of life. It enables them to connect with each other and to discuss and share ideas, information, pictures, sounds and videos. They also express their emotions and make friends. People place strong trust in the news, assessments and information about all aspects of life that are shared through social networks. Social networks help them keep in touch with their peers and with other people related to their studies, business, entertainment and other activities.
The level of friendship defines the level of trust in social media communications. We therefore evaluate friendship strength (FS) on a Facebook dataset. Facebook interactions (such as the number of photo tags and wall posts) are used to calculate FS. These two attributes remain very effective for prediction. Traditional techniques use user profile data to calculate the strength of the relationship between users. A user’s profile data provides detailed information about hobbies, religious views, companions, work experience, etc. In contrast, interactive activities such as commenting, sending messages, and tagging reflect the closeness of friends.
In recent years, several studies have carried out various kinds of recommendation work based on the level of friendship. Various researchers have studied friend suggestion similar to the Facebook mechanism, which recommends friends mainly based on mutual friends. User profiles are built from historical records of performed activities, such as items explored and queries issued. Documents or queries are then suggested according to the profile.
Traditional information retrieval (IR) systems mainly return results based on keyword matching. If different users submit the same query, the system returns the same results to all of them. A Personalized Information Retrieval (PIR) system differs from a traditional system in that it provides results related not only to the query but also to the user who submitted it. To provide better results, a PIR system keeps the user’s previous search history and retrieves results accordingly.
In this article, we propose a technique to perform PIR from the Facebook comments of close friends. First, the comment data is ranked by FS, which is calculated from the number of likes, comments and tags. FS is also used to rank the comments retrieved for a user query. These ranked comments are displayed in a pop-up menu as suggestions/expansions of the target query. When the user types any keyword, suggestions appear on the basis of FS and keyword matching. To evaluate the proposed method, we collected comments from friends’ Facebook accounts. The data was then preprocessed and FS was calculated. To conduct experiments, a search engine was developed in which users can enter queries. The experiments were conducted on a query set and the results were compared with those of comparable systems. The main contributions of this paper can be summarized as follows:
• We propose query suggestion based on the FS metric, which is used for ranking.
• We have developed a query suggestion algorithm based on social media comments.
• We have also developed a recommender system that provides FS-based recommendations.
The rest of this article is organized as follows. In Section 2, a summary of relevant literature is provided. The system model is introduced in Section 3, and the experimental evaluation is carried out in Section 4. Finally, conclusions are drawn in Section 5.
2 Related Work
The literature review is divided into the following subsections.
2.1 Friendship Strength Calculation
Calculating FS from social media datasets remains an effective approach for different types of analysis. Different attributes have previously been used for FS calculation. A model has been established to calculate relationship strength based on user similarity and interaction. The model is built from nodes and links: nodes represent users, and links represent relationships between users. Similarly, another study suggests that transaction information can be used to measure relationship strength, using a supervised learning method.
A lot of work has been done on personal similarity. Such attributes are useful, but not the most effective for strength calculations. User profile information and communication tools (such as emails and messages) are used to calculate relationship strength. In Xiang, latent variables are used to calculate relationship strength, estimated from the user’s personal data and message history. Some researchers have studied the intensity of FS, ranking the results from closest friends to ordinary friends. Another study proposed a model that uses social media data to show link strength. The link strength is divided into two types, strong and weak, which means that the model does not quantify the strength of a relationship but only labels it as strong or weak. Similarly, a method for calculating relationship strength based on the proximity of nodes in social networks has been proposed.
FS may also vary from friend to friend, and depends on the situation or category. A person may have one group of friends for work and different groups for playing games or dining. FS increases with more interaction and decreases with less. In Singla, it was concluded that an association exists between users who interact through instant messaging and that it grows over time. In Pappalardo, a multidimensional definition of tie strength is proposed that exploits the existence of multiple online social links between two individuals. The authors test this definition on a multidimensional network built over users of Facebook and Twitter, analyzing the structural role of strong and weak ties and their relation to widely used similarity measures.
To model relationship strength, a graphical model with unsupervised learning has been used, drawing on user closeness, tagging, and communication. In addition, four measures of relationship strength are proposed in Granovetter: reciprocal services, intimacy, emotional intensity, and the amount of time spent together. FS has also been applied to specific areas of interest by integrating the information exchanged between users; the users’ personal data and published information are then used, with the help of graphical models, to evaluate relationship strength. Twitter’s follower-following relationship was used to build a network. To evaluate relationship strength, the authors in De Choudhury used email communication: more exchanged messages imply a closer relationship. In Liu, however, K-means clustering and support vector machines are used to analyze the sentiments in messages. To evaluate the emotions in blogs and texts, a new framework has been proposed that takes text documents and sentiment words as input and generates sentiment classes as output.
A recommender system recommends items related to the user’s search. Suggestions are made not only on the basis of matched keywords but also from information collected from the user’s search history. A lot of work has been done on different kinds of recommendation. Some researchers focus on topic suggestions, and a few studies recommend friends to add based on mutual friends, the same geographic area, or the same study/work organization. In Liu, a new heuristic similarity model is proposed in which the user’s own ratings and behavior are used to calculate similarity.
In previous studies, profile building remains a common approach. Researchers use activity or like/dislike history to create a profile of a specific user, and then provide recommendations based on the profile. One line of work builds user profiles from tags and then uses these profiles for query expansion. Similarly, user-generated tags are used to calculate the common interests of a group of users on data from the Delicious website. In addition, a recommender system for Flickr tags has been proposed, which uses the user’s tag history and geographic information to provide tag recommendations.
To provide users with suggestions, clusters of related users are generated, and a similarity measure called “usefulness” is used to provide the suggestions. Experiments were conducted using Flickr, MovieLens and Last.fm. Content-based filtering and collaborative filtering are combined for recommendation using user-generated content and relationships. The link strength of users is calculated from social circles and interaction information; this work also improves social services by proposing a link strength model. Inspiring factors such as interests, social networking, and reputation are used to provide suggestions, with the number of pictures shared between directly connected users used to calculate inspiration.
Users’ interests are calculated from the interactions between them. The LAICOS system provides web search based on related tags and content tags, which are used to construct profiles. FS has been used to re-rank search results. To score users, user relationships based on location and mutual relationships in social networks are used, and user activities are used to calculate user interests. These activities are based on users’ social associations rather than documents. In addition, the shortest path in social networks has been proposed as the basis of a centrality measure.
Recommendations made by experts are called impact-based recommendations. This type of recommendation is mainly useful in the field of education. In one such system, a cooperative team (i.e., a group of knowledgeable experts) uses its knowledge to make recommendations. The ArnetMiner system is built by collecting data about researchers from the Internet; using this system, related papers are recommended to users. The PREMISE system uses expert information to provide recommendations, where experts are those who influence the press. In Konstas, friendship information, tags, and play counts are used to provide music recommendations through a random walk with restarts method.
2.3 Query Expansion
Few researchers have worked on query suggestion. Different techniques have been used for query suggestion and query ranking. Attributes such as gender, age, and location are used to build personalized ranking models; this data is extracted from the profiles of real Microsoft accounts. Query suggestion differs from query expression: query suggestion proposes a better query for the search process, whereas query expression develops a new query. Query expansion is a widely used technique for query suggestion, and its basic purpose is to improve the suggested queries. Query suggestions can also be produced by reordering queries, and query suggestions together with term weight responses are used to re-rank suggestions. Query suggestion methods can enhance the performance of search engines. They fall into two categories: those based on search results and those based on log files. Both categories have their own advantages and disadvantages, which make them suitable for different queries. Commonly used similarity measures for search queries are cosine similarity and Jaccard similarity, and the two have been compared against each other.
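The two similarity measures mentioned above can be illustrated with a minimal sketch. It assumes whitespace tokenization and raw term frequencies, which the cited works may not use; the function names and the example queries are hypothetical.

```python
import math
from collections import Counter


def tokenize(query: str) -> list[str]:
    """Lowercase a query and split it on whitespace (an assumed tokenizer)."""
    return query.lower().split()


def jaccard_similarity(a: str, b: str) -> float:
    """Number of shared terms divided by the number of distinct terms overall."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    freq_a, freq_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical queries sharing two of their four terms each.
q1 = "cheap flights to lahore"
q2 = "cheap hotels in lahore"
print(jaccard_similarity(q1, q2))
print(cosine_similarity(q1, q2))
```

Jaccard similarity looks only at which terms occur, while cosine similarity also takes term frequency into account, so the two can rank the same pair of queries differently.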
Clustering has also been used in previous methods to group related queries. The whole cluster is then suggested according to keyword matching. Query logs are also used to collect searched queries; a query log provides not only the searched queries but also the links clicked for specific queries. In Zahera, query recommendations based on a query clustering process are proposed, with queries collected from the log files of search engines. Related queries are not only clustered but also ranked based on similarity measures.
Social media data is also used to construct query suggestions by building a circle of related people based on the suggested query. The social media attributes used for the similarity measures are gender, city, and a shared topic of discussion. Based on these attributes, a weight is assigned to each user related to the search. The Jaccard similarity algorithm is then used to rank the queries [39,40].
Query recommendations are also very important for children in the search process. To keep children from reaching irrelevant search results, it is important to suggest only reasonable and relevant queries to them. To this end, one study proposed a query recommendation mechanism for children that uses social media tags. This method can be used to improve search suggestions. The authors also showed that social media can play a very important role in query suggestion and can replace traditional log-based suggestion methods.
The queries used for search and the results selected from the search are also very effective for generating search suggestions. A query recommendation method based on the user’s previous search experience has been proposed, whose model defines three utilities. “Level utility” captures the attractiveness of a specific query to the user, “perceived utility” captures the user’s actions on the search results, and “posterior utility” captures the user’s satisfaction with the selected results [42,43].
Query recommendations similar to the user’s query are provided from the query logs of search engines. In addition, to personalize query suggestions, queries of users whose profiles are similar to the current user’s can be suggested from the query log; a similarity matrix is used to filter the personalized results. Bookmark data obtained from social networks is also used to generate query recommendations: the results retrieved for the user’s query are ranked using the user’s familiarity and similarity relationships. On tag data, the top-k queries are ranked for a tag/keyword input query. The algorithm uses the relationship strength and relevance of tags, and incrementally returns the top-k results containing the most relevant queries. In addition, query expansion is performed based on tag similarity and social similarity: terms related to the input query are sorted by these factors and appended to the query. Bookmark datasets are used for experimentation and comparison.
3 System Model
Fig. 1 shows the architecture of the proposed technique, called “Personalized Retrieval from Social Media (PRISM)”. The flow of the architecture is as follows. Python scripts extract the dataset and the comments from Facebook. The two files are then merged to form a database. The comment file is preprocessed to remove irrelevant attributes, and in the next step FS is calculated. The final database is used by the FS-based search engine. When the user types any word into the search box, a suggestion list is displayed as a drop-down menu. These suggestions change continuously as the user types words or sentences. For the user’s query, a suggestion list containing the comments that the user’s friends have posted on his wall is retrieved.
A Python script was developed to extract the dataset from Facebook. As output, a dataset containing more than 22k records was generated. Two types of attributes are important in the structure of the dataset. The first is personal similarity, for example, joining the same group, liking the same page, or sharing the same likes/dislikes. The second is interaction similarity, which uses transaction information to calculate similarity. Much prior work on FS is based on personal similarity; in this work, we use interaction similarity to calculate FS. The basic attributes for FS calculation in this work are:
• Likes count
• Comments count
• Tags count
These attributes are very effective for FS calculation. The likes count is the total number of likes by a specific friend on the user’s wall; a like can be given to pictures, achievements, emotions or any type of post. The comments count is the number of comments made by a specific friend on the user’s wall; comments can likewise be written on any post or status. The third attribute is the tags count, which is the number of times the user has been tagged by a specific friend in any post, status, picture, location, or other feature. All these attributes are used to calculate FS for each individual friend. The specifications of the dataset are given in Tab. 1.
We crawl Facebook to extract the data. The data extraction process is shown in Fig. 2. When the script runs, the user is asked to enter a unique Facebook ID/key, which the script then verifies. If the key is invalid, an error message is displayed; otherwise the data extraction process starts. The extraction obtains the “friend’s ID”, “friend’s name”, “like count”, “comment count” and “tag count” attributes and writes them to a comma-separated value (CSV) file as output.
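The output step of the extraction can be sketched as follows. This is only an illustration: the Facebook retrieval and key verification are omitted because the source does not specify the calls used, and the column names, the function `write_friend_stats`, and the sample record are assumptions.

```python
import csv
import io

# Column names follow the attributes listed in the text; the exact header
# spelling used by the original script is an assumption.
FIELDS = ["friend_id", "friend_name", "likes_count", "comments_count", "tags_count"]


def write_friend_stats(records, out_file):
    """Write a header row, then one CSV row per friend with the interaction counts."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Hypothetical record standing in for data fetched from Facebook.
records = [
    {"friend_id": "f001", "friend_name": "Ali",
     "likes_count": 32, "comments_count": 42, "tags_count": 22},
]

# An in-memory buffer is used here; a real script would open a .csv file.
buffer = io.StringIO()
write_friend_stats(records, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```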
3.3 Friendship Strength Calculation
The term “friendship strength” has two parts: friendship and strength. Friendship refers to the relationship between two people, and strength refers to the level of that relationship. FS varies from friend to friend. As in real life, our relationship with all our friends cannot be at the same level: a few are close friends, and many are just formal friends. Accordingly, we calculate FS for each friend of the user. The basic attributes used in the FS calculation are the number of likes, comments, and tags (photo tags, location tags, or any feature tags). The sum of these attributes gives the FS between the user and any of his friends, so the friend with the most interaction has the highest FS. FS can be calculated as follows:
$$FS = L_c + C_c + T_c$$
where $FS$ refers to friendship strength, $L_c$ is the likes count, $C_c$ is the comments count, and $T_c$ is the tags count.
For example, the friend “Ali” has a total of 32 likes on the user’s wall, which means Ali liked 32 of the user’s posts, including pictures, videos, achievements or any other posts. Similarly, “Ali” posted a total of 42 comments on posts, pictures or achievements on the user’s wall. In addition, the tags count between “Ali” and the user, including location tags, is 22. From these three attribute values, the FS of the user with “Ali” is 32 + 42 + 22 = 96.
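The calculation for “Ali” can be reproduced with a short sketch that treats FS as the plain sum of the three counts, as the text describes. The class and function names are illustrative and not taken from the authors’ script.

```python
from dataclasses import dataclass


@dataclass
class FriendStats:
    """Interaction counts of one friend on the user's wall."""
    name: str
    likes: int
    comments: int
    tags: int


def friendship_strength(stats: FriendStats) -> int:
    """FS as the sum of the likes, comments, and tags counts."""
    return stats.likes + stats.comments + stats.tags


# Values taken from the example in the text.
ali = FriendStats(name="Ali", likes=32, comments=42, tags=22)
print(friendship_strength(ali))  # 96
```

Because the three counts are added without weights, a like, a comment, and a tag contribute equally to FS in this formulation.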
3.4 Comments Extraction
The process of comment extraction is shown in Fig. 3. When the script runs, the user is asked to enter a unique Facebook ID/key. If the key is invalid, an error message is displayed; otherwise the data acquisition process begins. The extracted attributes are the ID of the post, the ID of the comment, the comment, the ID of the friend, the name of the friend, and the creation time of the comment. Less important attributes are removed during the preprocessing stage. The important attributes among those acquired are the ID of the comment, the comment, the ID of the friend, and the name of the friend. These attributes are used in combination with FS to provide recommendations.
3.5 Comments Preprocessing
Preprocessing removes irrelevant attributes from the dataset and retains only the necessary ones. It is performed on both dataset files to create the database used in the recommender system. The data file contains the friend’s ID, friend’s name, like count, comment count, and tag count, while the comment file contains the post ID, comment ID, comment, friend’s ID, friend’s name, and creation time. The FS attribute is added to the data file. The two files (i.e., the data file and the comment file) are then merged to form the final database.
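The preprocessing and merge steps can be sketched with plain Python dictionaries. The sketch assumes that the two files are joined on the friend’s ID and that only the attributes named as important are kept; the field names, function names, and sample rows are hypothetical.

```python
def add_fs(friend_rows):
    """Return friend rows keyed by friend ID, each extended with an FS value."""
    by_id = {}
    for row in friend_rows:
        fs = row["likes_count"] + row["comments_count"] + row["tags_count"]
        by_id[row["friend_id"]] = {**row, "fs": fs}
    return by_id


def build_database(friend_rows, comment_rows):
    """Keep only the needed comment attributes and attach the author's FS."""
    friends = add_fs(friend_rows)
    database = []
    for c in comment_rows:
        friend = friends.get(c["friend_id"])
        if friend is None:
            continue  # no FS record for this commenter; skipped in this sketch
        database.append({
            "comment_id": c["comment_id"],
            "comment": c["comment"],
            "friend_id": c["friend_id"],
            "friend_name": c["friend_name"],
            "fs": friend["fs"],
        })
    return database


# Hypothetical rows standing in for the data file and the comment file.
friend_rows = [
    {"friend_id": "f001", "friend_name": "Ali",
     "likes_count": 32, "comments_count": 42, "tags_count": 22},
]
comment_rows = [
    {"post_id": "p1", "comment_id": "c1", "comment": "Well done",
     "friend_id": "f001", "friend_name": "Ali", "created_time": "2020-01-01"},
]

database = build_database(friend_rows, comment_rows)
print(database)
```

The post ID and creation time are dropped here because the text lists them among the less important attributes.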
3.6 Recommender System
The search engine has been developed on top of the database that is finalized by combining comments and FS attributes. The process of the search engine is given in Algorithm 1.
In Algorithm 1, the user enters a keyword query in the search engine (line 1). The input query is then split into words (line 2), and every word in the query is matched against every word in the database (lines 3-4). The matching suggestions are printed in order of FS: “DESC” sorts the suggestions in descending order of FS, and the limit (given as 0.9) restricts the output to a list of the top 10 suggestions (line 5 and beyond). The information retrieval process is also shown in Fig. 4.
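The steps described for Algorithm 1 can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than the authors’ implementation: a comment is taken to match when it contains at least one query word, matching is case-insensitive on whole words, and the database rows and FS values are hypothetical.

```python
def suggest(query, database, limit=10):
    """Return up to `limit` matching comments, highest FS first."""
    words = query.lower().split()          # split the query into words
    matches = []
    for row in database:
        comment_words = set(row["comment"].lower().split())
        # Assumed match rule: at least one query word occurs in the comment.
        if any(w in comment_words for w in words):
            matches.append(row)
    # Equivalent of ordering by FS in descending order, then applying a limit.
    matches.sort(key=lambda r: r["fs"], reverse=True)
    return [r["comment"] for r in matches[:limit]]


# Hypothetical database rows (comment text plus the commenter's FS).
database = [
    {"comment": "Congratulations on the new job", "friend_name": "Saqib", "fs": 40},
    {"comment": "Great job done", "friend_name": "Mudassar", "fs": 120},
    {"comment": "Happy birthday", "friend_name": "Talha", "fs": 90},
]

print(suggest("new job", database))
```

In this example the comment from the friend with the higher FS is listed first even though the other comment matches more query words, which is the behavior the FS-based ordering is meant to produce.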
The output for an input query is a set of suggestions retrieved by the search engine, arranged according to FS. The suggestions at the top of the list therefore belong to the closest friends. Fig. 5 shows the suggestions retrieved for the query “Allah”.
4 Experimental Evaluation
We have conducted experiments to evaluate the performance of the proposed technique, PRISM. For experimentation, a search engine has been developed. To search for relevant data, the user types a query in the search box, and suggestions are retrieved based on the input query. To illustrate the basic operation of the search engine, Fig. 6 shows the suggestions from five friends for an input query, considering only context-based retrieval without FS. The most relevant results were retrieved from the comments of “Saqib” and “Talha”, with a relevance of 100%. The relevance of the comments of “Qamar” is 75%. The suggestions from the comments of “Mudassar” are 60% relevant, while the comments of “Ali Naqvi” are 0% relevant.
When FS is taken into account, a different retrieval order is obtained, as shown in Fig. 7. For queries similar to those used in context-based retrieval (Fig. 6), including FS produces different results. The comments of “Mudassar” take first place because “Mudassar” is a close friend, and the comment of “Talha” is in second position. “Ali Naqvi” ranks third on the basis of FS, but since his keyword similarity is 0%, no suggestions are drawn from his comments. The suggestion from “Qamar” is in fourth place, and the suggestion from “Saqib” is in fifth place.
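The change in order between context-based and FS-based retrieval can be illustrated with a small sketch. The relevance percentages are those reported in the text; the FS values are hypothetical, chosen only so that the friends’ FS order matches the one described (Mudassar, Talha, Ali Naqvi, Qamar, Saqib).

```python
# (friend, keyword relevance in % from the text, hypothetical FS value)
candidates = [
    ("Saqib", 100, 20),
    ("Talha", 100, 80),
    ("Qamar", 75, 35),
    ("Mudassar", 60, 95),
    ("Ali Naqvi", 0, 50),
]


def rank_by_relevance(cands):
    """Context-based order: highest keyword relevance first."""
    ordered = sorted(cands, key=lambda c: c[1], reverse=True)
    return [name for name, _, _ in ordered]


def rank_by_fs(cands):
    """FS-based order: drop friends with no keyword match, then sort by FS."""
    matched = [c for c in cands if c[1] > 0]
    ordered = sorted(matched, key=lambda c: c[2], reverse=True)
    return [name for name, _, _ in ordered]


print(rank_by_relevance(candidates))
print(rank_by_fs(candidates))
```

“Ali Naqvi” holds the third-highest FS but contributes no suggestion because none of his comments match the query, so only four friends appear in the FS-based list.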
The query results are compared in Fig. 8. Here, the query “Noman Yousaf” is used to compare the results obtained with FS against those obtained without FS. It can be observed that when FS is considered, a comment changes its position in the suggestion list. One suggestion that ranks first without FS falls within the first six suggestions when FS is considered. The suggestions in the top nine positions without FS remain within the top nine positions with FS.
Comparable systems related to our proposed PRISM include query log, context merging, bookmark-based and personalized social query expansion (PSQE) approaches. We compare the proposed PRISM with these systems. Fig. 9 shows the retrieval of suggestions from matching comments without considering social similarity (the FS measure). The average result over ten queries with the same number of terms is reported. Using the weighted Borda Fuse (WBF) algorithm, PSQE achieves the highest accuracy (without the FS measure), while PRISM achieves the second-best accuracy. However, when the FS measure is considered, our system outperforms existing solutions (see Figs. 10 and 11). In contrast to context-based retrieval, social similarity-based retrieval provides personalized results. As shown in Fig. 10, PRISM shows better results than the other systems, whereas without a similarity measure it provided 61% relevant results.
In Fig. 11, we demonstrate the effect of the number of terms in the input query, varying it from 0 to 10. Most systems provide their best results when the number of terms is smallest, and accuracy decreases as the number of terms increases. Among existing systems, PSQE produces good results. PRISM, in contrast, obtains the highest accuracy with FS.
In Fig. 13, the results were produced without social measures. Compared with Fig. 12, the relevance of the results is reduced: PSQE provides 70% accuracy and PRISM achieves 67% accuracy. It can be inferred that social measures increase the relevance of the search.
5 Conclusion
This paper proposes a new query recommendation technique that uses FS to rank queries. A query suggestion algorithm based on social media comments has been developed, and a recommender system built on this algorithm provides suggestions based on FS. A Facebook dataset was constructed and used for the experiments, including a comparative analysis with comparable systems. The proposed system, PRISM, provides about 85% accuracy. PSQE follows with about 80%, the highest among the compared systems. PRISM therefore improves on the accuracy of existing systems. In the future, the FS-based recommendations can be improved by adopting the actual search queries of the research team. In addition, query logs can be used to collect queries, and user surveys can be conducted to measure the level of satisfaction with the recommendations.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.