Preserving user privacy is a critical requirement under international regulations such as the GDPR, the California Consumer Privacy Act, and many other bills. At the same time, Online Social Networks (OSNs) enjoy widespread recognition among users as a means of virtual communication. An OSN may also act as an identity provider for both internal and external applications. While this simplifies identification and authentication across multiple applications, it also exposes users to a new spectrum of privacy threats. Privacy breaches are costly both to the users and to the OSN. Despite paying millions of dollars in fines every year, OSNs have made no significant changes: data is their fuel, and what they lose in fines is far less than what they earn from the shared data. In this work, we discuss a wide range of privacy threats and solutions prevailing in the OSN-Third Party Application (TPA) data-sharing scenario. Our solution models the behavior of both the user and the TPA, and pinpoints the avenues of oversharing to the users, thereby limiting their privacy loss.
Third Party Applications (TPAs) are developed and maintained by vendors external to Online Social Networks (OSNs). Facebook is a popular social medium with a worldwide base of 2.85 billion monthly active users. It has integrated a large number of third-party applications to provide a multitude of services to its users, and it also serves as an identity provider for many third-party services. In both cases, when accessing a service for the first time, users are prompted to share their attributes (optional and required). Users who share the required attributes are allowed to use the service; those who do not are denied access. Users initially share their attributes with the OSN based on the trust they place in it. The same trust cannot be transferred to a TPA deployed in an unknown setup. Nevertheless, users are enticed to share their attributes to use the TPA's services, and they end up opening channels that lead to privacy infringement. A TPA hosted in an external environment may even be malicious; it may share the user's attributes with fourth-party vendors such as insurance agents, data aggregators, advertising agencies, and background verifiers. The list goes on, increasing the threat spectrum (
Threats / attributes shared & existing solutions | Public profile | Friends list | About me | Actions | Birthdate | Education | Events | Games activity | Hometown | Likes | Location | Photos, posts, videos | Relationship | Religion, politics | Tagged places | Work history | User status | Read permission | Publish permission | Manage permission | Existing references
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
User impersonation | * | * | * | | | | | | | | | | | | | | | | | | 4–8
Inferencing sensitive attributes | * | * | * | * | * | | | | | | | | | | | | | | | | 10–11
De-anonymization | * | * | * | * | | | | | | | | | | | | | | | | | 10–11
Identity disclosure | * | * | * | * | | | | | | | | | | | | | | | | | 4–7, 9–11
Malicious applications | * | * | * | * | * | * | * | * | * | * | | | | | | | | | | | 12–14, 17
Reputation damage | * | * | * | * | * | * | | | | | | | | | | | | | | | 4–7, 17
Stalking | * | * | * | | | | | | | | | | | | | | | | | | 11
Hence, the remedial action is to make users aware of what they share and with whom they share it. In our work, we model what-they-share based on the attributes the user has previously shared with TPAs. A Gaussian Mixture Model (GMM) allows mixed membership of observations in clusters: each user observation can belong to every cluster with a different degree of membership, based on the probability of the observation being generated from that cluster's multivariate normal distribution, with a mean (
The aspect of whom-they-share-with has been modeled using a fuzzy inference system. The inputs to the system comprise the following: the rating given by users, the polarity of the comments received (positive, negative, or neutral), the count of users who have provided a rating, and the TPA's attribute-permission access pattern. The fuzzy reliability inference system outputs the TPA's reliability level.
When the user needs to install a new application, a fuzzy recommender prescribes whether the application is suitable for the user, based on the user and TPA models. An enhanced consent request page is displayed to the user with detailed information about the TPA, conveying the privacy implications, unlike the regular abstract consent page, which hides the information being transacted.
A user sharing their own attributes is not considered a privacy breach, whereas sharing without the user's consent is. In one of our earlier works [
Moreover, data shared in the context of a TPA may in turn be shared with fourth-party vendors, increasing the risk of a privacy breach [
The risk faced by the user in the user-TPA sharing scenario can be reduced either by limiting the sharing of sensitive attributes with the TPA, or by sharing a less sensitive form of the sensitive attribute. Solutions like the user-to-application policy model [
If data is simply withheld from the TPA, users will not be granted access to the applications; hence, techniques that allow sharing while alleviating the risk are required. Solutions based on data generalization [
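Attribute generalization can be illustrated with a small sketch. The function and the three-level hierarchy below are hypothetical examples, not taken from the cited solutions: they show how a less sensitive form of a birthdate could be shared in place of the exact value.

```python
# Sketch of attribute generalization: share a coarser, less sensitive
# form of an attribute instead of the exact value. The generalization
# levels below are a hypothetical example hierarchy.

def generalize_birthdate(birthdate: str, level: int) -> str:
    """Generalize an ISO date (YYYY-MM-DD); higher level = less detail."""
    year, month, _day = birthdate.split("-")
    if level == 0:
        return birthdate          # exact date
    if level == 1:
        return f"{year}-{month}"  # year and month only
    return year                   # year only

print(generalize_birthdate("1990-07-15", 2))  # -> 1990
```

A TPA that only needs an age bracket could then be given level 2 output, reducing the re-identification risk while still allowing the service to function.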
Hosting applications in a secluded environment [
With the current approach, the user has to share all the required data attributes with the TPA to avail the service. In most cases, users lack the competence to articulate their privacy requirements and decide whether to share at the time of installation. On the verge of accessing the TPA, the user agrees to share without awareness. Thus, large silos of personal information are shared by the same person with many TPAs at different instants. These pieces of information can be correlated to produce an inference the user never intended to reveal, leading to a privacy breach. Our system addresses this root cause by alerting the user whenever the user's actual level of privacy preference differs from that of the current sharing.
The user's previous sharing with TPAs is captured and used to model their behavior. Each user has a set of applications they accessed earlier and their attribute-sharing decisions for those applications. The variable TPA_i_att_j indicates the sharing decision for the jth attribute with the ith TPA: a '1' indicates that the attribute is shared, and a '0' indicates that it has not been shared.
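The sharing history can be arranged as a binary matrix, one row per user and one column per TPA_i_att_j decision. The attribute names and values below are hypothetical, shown only to make the data layout concrete:

```python
import numpy as np

# Hypothetical sharing history: rows are users, columns are TPA_i_att_j
# decisions (1 = attribute j shared with TPA i, 0 = not shared).
columns = ["TPA1_att1", "TPA1_att2", "TPA2_att1", "TPA2_att2"]
X = np.array([
    [1, 0, 1, 0],   # user A: shares selectively
    [1, 1, 1, 1],   # user B: shares everything
    [0, 0, 1, 0],   # user C: shares almost nothing
])
print(X.shape)  # (3, 4): 3 users, 4 sharing decisions
```

This matrix is the observation set that the clustering steps described next operate on.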
i) Randomly choose c observations as the initial cluster centroids.
ii) Compute the distance between each data point in X and every centroid, and assign the point to the cluster with minimum distance.
iii) Recompute each centroid as the average of the data points assigned to its cluster.
iv) Repeat steps ii and iii until the cluster assignments no longer change.
The GMM is the weighted sum of c component Gaussian densities as in
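The referenced equation is not reproduced in this extract; the standard form of the mixture density it describes is:

```latex
p(x) = \sum_{k=1}^{c} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \sum_{k=1}^{c} w_k = 1, \quad w_k \ge 0,
```

where $w_k$ is the weight of the $k$th component and $\mathcal{N}(x \mid \mu_k, \Sigma_k)$ is the multivariate normal density with mean $\mu_k$ and covariance $\Sigma_k$.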
Initialization: initialize the means, covariances, and weights of the Gaussian densities. Means and covariances are initialized from the K-means clustering output of step 1; weights are initialized equally, such that the weights assigned to the Gaussian components sum to 1.
(i) Expectation step: compute the posterior probability (responsibility) of each Gaussian component for every observation.
(ii) Maximization step: update the parameters of the Gaussian components: the means, covariances, and weights.
Repeat (i) and (ii) until convergence.
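The K-means-initialized EM procedure above is what scikit-learn's `GaussianMixture` implements, so a short sketch suffices; the sharing matrix here is hypothetical and the library is an assumption (the original work does not name its implementation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 0/1 sharing matrix: rows are users, columns are
# TPA-attribute sharing decisions.
X = np.array([
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 1],   # heavy sharers
    [0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0],   # light sharers
], dtype=float)

# init_params="kmeans" seeds the component means from K-means, then
# EM iterates the expectation and maximization steps to convergence.
gmm = GaussianMixture(n_components=2, init_params="kmeans",
                      covariance_type="diag", random_state=0).fit(X)

# Soft memberships: each user's degree of membership in each cluster.
memberships = gmm.predict_proba(X)
print(memberships.round(3))
```

`predict_proba` returns the posterior probabilities from the expectation step, which is exactly the mixed-membership degree used to characterize each user's sharing trait.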
The fuzzy model is used to calculate the application's reliability value. Since this study uses data obtained in real time, fuzzy inference is used to deal with any ambiguity in the data. The graded membership function used in fuzzy inference expresses the distribution of truth of a variable. The features used to infer the reliability of the application have been collected from multiple sources. The preprocessing of the input data mentioned above is described in our earlier work on computing the reliability of the TPA [
Let Y be the universe of data, with a generic element of Y denoted by y. Thus,
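The definition this sentence leads into is missing from the extract; in standard fuzzy-set notation it reads:

```latex
A = \{\, (y, \mu_A(y)) \mid y \in Y \,\}, \qquad \mu_A : Y \to [0, 1],
```

where $\mu_A(y)$ is the degree of membership of the element $y$ in the fuzzy set $A$.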
The membership functions for the fuzzy sets of input parameters were chosen based on the predicted reliability score and fine-tuned to minimise the Root Mean Squared Error (RMSE).
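A common choice for such membership functions is the triangular form, tuned by comparing the system's output against the predicted reliability score via RMSE. The shapes and parameters below are illustrative assumptions, not the tuned values from the Matlab simulation:

```python
import numpy as np

def triangular(y, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    y = np.asarray(y, dtype=float)
    rise = (y - a) / (b - a)
    fall = (c - y) / (c - b)
    return np.clip(np.minimum(rise, fall), 0.0, 1.0)

def rmse(predicted, target):
    """Root Mean Squared Error used as the tuning objective."""
    predicted, target = np.asarray(predicted), np.asarray(target)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))

# e.g. a "medium rating" fuzzy set over a 1-5 rating scale
print(triangular([1, 3, 5], a=1, b=3, c=5))  # [0. 1. 0.]
```

Fine-tuning then amounts to adjusting (a, b, c) for each fuzzy set so that the RMSE between the inferred and predicted reliability scores is minimized.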
As described in the truth value flow inference theory [
Sample rule 1: If (Rating is low) and (Raters(N) is low) and (Data-Access is high) and (Comments is negative) then (App-Reliability-Score is low) (0.75)
Sample rule 2: If (Rating is med) and (Raters(N) is med) and (Data-Access is med) and (Comments is neutral) then (App-Reliability-Score is high) (0.9)
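The firing of such a weighted rule can be sketched as follows, assuming min() as the AND operator (a common t-norm choice; the original does not state its operator). The membership degrees are hypothetical inputs:

```python
# Minimal sketch of weighted fuzzy rule firing: rule strength is the
# rule weight times the minimum membership over the antecedents.

def fire_rule(memberships, weight):
    """Return the weighted firing strength of one fuzzy rule."""
    return weight * min(memberships)

# Degrees of membership of a TPA's inputs in the antecedent fuzzy
# sets of sample rule 1 (hypothetical values):
strength = fire_rule([0.8,   # Rating is low
                      0.9,   # Raters(N) is low
                      0.7,   # Data-Access is high
                      0.6],  # Comments is negative
                     weight=0.75)
print(round(strength, 3))  # 0.45
```

Each of the rules in the rule base fires this way, and the weighted strengths are then combined in the implication and aggregation steps described next.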
The following steps are carried out to infer the reliability level of the TPA. In the implication step, the input values are mapped to the consequent fuzzy sets to obtain each rule's output level, scaled by the rule's weight. The weighted average of all the rules is then calculated to arrive at the final output of the system:

App-Reliability-Score = (Σ_i w_i z_i) / (Σ_i w_i)

where w_i is the firing strength of rule i (including its weight) and z_i is the output level of rule i.
The TPA recommender shown in
The Privacy Violation Alerter displays an enhanced intimation page to the user. The intimation page consists of TPA-related information such as comments, the TPA's rating, the number of raters, the TPA's reliability level, and how likely the TPA is to follow the data access pattern of applications in its category. This measure is a strong indicator when the TPA has an unusual data access pattern compared to applications in the same domain. The user's privacy level is also displayed. Only when the user agrees to share is the TPA installation carried out.
Finding the likelihood: Let
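The derivation is truncated in this extract, but the idea of "how likely the TPA follows its category's data access pattern" can be sketched under a simplifying assumption: treat each permission as an independent Bernoulli variable with the category's empirical request frequency. Both the model and the frequencies below are illustrative assumptions:

```python
import numpy as np

def category_likelihood(request, category_freq, eps=1e-3):
    """Probability of a 0/1 permission-request vector given the
    per-permission request frequencies of the TPA's category
    (independent-Bernoulli assumption)."""
    p = np.clip(np.asarray(category_freq, dtype=float), eps, 1 - eps)
    r = np.asarray(request)
    return float(np.prod(np.where(r == 1, p, 1 - p)))

# Hypothetical games-category frequencies: public profile is usually
# requested (0.9), location rarely is (0.1).
typical = category_likelihood([1, 0], [0.9, 0.1])   # 0.81
unusual = category_likelihood([0, 1], [0.9, 0.1])   # 0.01
print(typical > unusual)  # True
```

A low likelihood flags a TPA whose request pattern deviates from its category, which is exactly the signal the alerter surfaces to the user.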
When a user needs to install a new TPA, the Share Level Intimator compares the user's previous sharing behavior with the new TPA's requested attribute permissions. If they match, an enhanced consent page is presented to the user; otherwise, a violation alert page followed by the enhanced consent page is presented. The enhanced consent page contains information about the user's sharing pattern (privacy concerned, pragmatic, or unconcerned) and the TPA's information (rating, number of raters, user comments, requested attribute permissions, the TPA's reliability score, and its data access pattern).
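The comparison step can be sketched as follows, under the simplifying assumption (the function and vector encoding are hypothetical, not the paper's exact mechanism) that the user's modeled sharing profile and the TPA's request are both 0/1 vectors over the same attribute list:

```python
# Sketch of the Share Level Intimator's check: raise the violation
# alert when the TPA requests any attribute that the user's modeled
# sharing behavior suggests they do not share.

def needs_violation_alert(user_shared_profile, tpa_request):
    """True if any requested attribute falls outside the user's
    historical sharing profile."""
    return any(req == 1 and shared == 0
               for shared, req in zip(user_shared_profile, tpa_request))

user_profile = [1, 1, 0, 0]   # user historically shares attrs 1-2 only
print(needs_violation_alert(user_profile, [1, 1, 0, 0]))  # False
print(needs_violation_alert(user_profile, [1, 1, 1, 0]))  # True
```

On a match (False), only the enhanced consent page is shown; on a mismatch (True), the violation alert precedes it.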
The User Sharing Trait Modeler (UTM) uses a GMM to capture the user's sharing pattern,
The TPA reliability score inference and the TPA recommender have been designed using a fuzzy inference system. A Matlab simulation has been used to fine-tune the membership functions and to decide the rules of the fuzzy inference system. 81 rules have been used in total. The sample rule list can be found in
Rule No. | Rating | No. of raters | Data access behavior | Comments | Weight | Application reliability score |
---|---|---|---|---|---|---|
1 | High | High | Optimal | Positive | 1.0 | High |
2 | High | High | Moderate | Positive | 0.8 | High |
3 | Medium | Low | Optimal | Positive | 0.6 | High |
4 | Low | Low | High | Positive | 1.0 | Low |
5 | Low | Low | Moderate | Negative | 1.0 | Low |
6 | High | High | High | Negative | 1.0 | Low |
7 | Low | High | Optimal | Neutral | 1.0 | Moderate |
8 | Medium | Medium | Optimal | Neutral | 1.0 | Moderate |
Application name | Rating | NOR* | DAB* | User comments polarity | Reliability score |
---|---|---|---|---|---|
TED | 3 | 133657 | 2.4 | 0.4 | 0.8 |
Cut the rope 2 | 3 | 1962621 | 2.7 | 0.0 | 0.5 |
Spin the bottle | 1 | 262 | 3 | 0.6 | 0.1 |
POF free dating app | 3 | 769300 | 2.6 | 0.3 | 0.4 |
Cougar dating for older women | 1 | 99 | 2.6 | 0.8 | 0.2 |
Clash royale | 3 | 489166 | 2.8 | 0.2 | 0.5 |
The lion guard | 1 | 27 | 2.4 | −0.2 | 0.3 |
Diner dash | 2 | 275187 | 2.7 | −0.1 | 0.1 |
QuizUp | 3 | 595375 | 2.5 | 0.3 | 0.7 |
*NOR-Number of Raters | *DAB-Data Access Behavior |
The surface view graph for TPA recommendation based on TPA reliability score and Users Privacy Level is presented in
For privacy-concerned users, the weights of certain rules are adjusted so that applications with a low reliability level are not recommended to them. The blue region in
Applications | Reliability score | User privacy level = low | User privacy level = medium | User privacy level = high |
---|---|---|---|---|
Robinson | 0.5 | Highly recommended | Recommended | Recommended with risk |
Spin the bottle | 0.1 | Recommended with risk | Not recommended | Not recommended |
Farm heroes saga | 0.4 | Recommended | Recommended with Risk | Recommended with Risk |
Subway surfers | 0.8 | Highly recommended | Highly recommended | Highly recommended |
Diner dash | 0.3 | Recommended with risk | Not recommended | Not recommended |
It is evident from the literature discussed earlier that privacy infringement is more prone to happen in the user-TPA data-sharing context. The proposed solution attempts to address the root cause of the problem by explicitly alerting the user about what is being shared and with whom, and it allows the sharing only on explicit informed consent. Existing solutions do show a consent page containing information about the requested attributes, but users find it difficult to comprehend this information and compare it with their privacy preferences while deciding whether to share. We have designed the solution by building a model for each party in the transaction, the user and the TPA. When a new application needs to be installed, the user's sharing pattern is compared with the new TPA's posterior probability, and the user is intimated with an enhanced consent page before sharing. In the case of a violation, a two-level alert is displayed: first, the privacy violation is signaled, and if the user chooses to continue with the TPA, the enhanced consent page with additional information is rendered. The user's data is shared with the TPA only after these two stages of privacy preservation. In future work, we wish to explore advanced perturbation techniques that assure privacy, so as to enable users to utilize even those TPA services that seek intrusive information.
We thank the anonymous referees for their useful suggestions.