|Computer Systems Science & Engineering
Evaluating Security of Big Data Through Fuzzy Based Decision-Making Technique
1Department of Computer Science, College of Computers and Information Technology, Taif University, P. O. Box 11099, Taif 21944, Saudi Arabia
2Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
3Department of Computer Applications, Mangalmay Institute of Engineering & Technology, Greater Noida, 201310, Uttar Pradesh, India
4Department of Information Technology, Babasaheb Bhimrao Ambedkar University, Lucknow, 226025, Uttar Pradesh, India
*Corresponding Author: Raees Ahmad Khan. Email: email@example.com
Received: 05 December 2021; Accepted: 17 January 2022
Abstract: In recent years, it has been observed that the disclosure of information increases the risk of terrorism. Without restricting the accessibility of information, providing security is difficult. So, there is a demand for time to fill the gap between security and accessibility of information. In fact, security tools should be usable for improving the security as well as the accessibility of information. Though security and accessibility are not directly influenced, some of their factors are indirectly influenced by each other. Attributes play an important role in bridging the gap between security and accessibility. In this paper, we identify the key attributes of accessibility and security that impact directly and indirectly on each other, such as confidentiality, integrity, availability, and severity. The significance of every attribute on the basis of obtained weight is important for its effect on security during the big data security life cycle process. To calculate the proposed work, researchers utilised the Fuzzy Analytic Hierarchy Process (Fuzzy AHP). The findings show that the Fuzzy AHP is a very accurate mechanism for determining the best security solution in a real-time healthcare context. The study also looks at the rapidly evolving security technologies in healthcare that could help improve healthcare services and the future prospects in this area.
Keywords: Information security; big data; big data security life cycle; fuzzy AHP
Nowadays, information and data are generated and handled at high velocity, creating huge amounts of data. This huge amount of data is known as “Big Data”. Information and data are tools that users use to transfer information and data with them from the moment they start to live together. Information technology’s type and nature have been transformed radically in the previous few years. This is the technical era, where information and data are created, managed, and processed at high velocity, creating a huge amount of data and information. This information is generated from various sources such as social media, hospital databases, bank transactions, etc. Thus, it’s totally up to the users to decide whether the information is worthless or valuable [1,2].
The most common definition of information security is that information cannot be disclosed to anyone. It is a combination of three main components, i.e., security, privacy, and accessibility [3,4]. Sharing information nowadays can pose jeopardy to any organisation and any user’s privacy. Big data is a collection of a large amount of personal information that is easily accessible to be collected and analyzed. Organizations are rich sources of users’ sensitive and personal information and epitomise supplementary opportunities for financially motivated cyber attackers [5,6]. Hacking and IT incidents, unauthorised access, theft and improper disposal were the main causes of data breaches.
India and other countries are facing cyber-related issues. The biggest data breaches occurred in India in 2021. Dominos India’s 18 crore user data was compromised, Mobikwik’s 10 crore, Facebook’s 60 lakh, Air India’s 45 lakh, Upstox’s 25 lakh, and Bigbasket’s 20 million customer data was exposed on the dark web . From time to time, hackers appear in the cyber world and demand a ransom for data leaks. Percentage growth in 2020 data breaches compared to 2019 in different sectors is discussed and depicted in Fig. 1. Data breaches in the food and beverage industry increased by 1300% in 2020 compared to 2019 . This was the highest growth compared to other industries in one year. Data security from hacking and other unauthorised access is more important.
Data leaks enable data breaches, the cause of both events are intertwined hence the list below cloud pinpoint the sources of either data breaches or data leaks. When monitoring attack surfaces, information security programmes should choose a data leak detection perspective to increase the effectiveness of data breach prevention efforts. This will inevitably reveal and fix the security flaws that are at the root of both cyber threats. Managing vulnerabilities solely for the sake of preventing data breaches narrows the threat detection field, increasing the risk of crucial data leaks. In 2020, the following incidents will be some of the most common source of data leaks . Misconfigured software; software settings that are incorrectly configured could disclose valuable client information. If the leaky software becomes widely used, millions of people could be vulnerable to cyberattacks. Social Engineering; Attackers seldom instigate data dumps, but when they do, it is usually as a results of social engineering tactics.
The use of psychological manipulation in order to get sensitive credentials from victims is known as social engineering [10,11]. Phishing is the most prevalent form of social engineering attack, which can be carried out verbally or online. Recycled passwords; Consumers tend to use the same password for all of their logins; a single leaked password often leads to the compromise of many digital solutions. Because stolen client data is commonly traded on dark web forums, this poor security approach results in a serious data loss. Physical theft; when company devices get into the wrong hands, sensitive information on them can be used to said security breaches or identity theft, resulting in data breaches. Software vulnerabilities; Software flaws, such as zero-day exploits, provide easy access to sensitive information. This skips the first stage of cyberattack lifecycle, propelling attackers straight to the privilege escalation stage of the attack lifecycle, which is the only stage left before a data breach.
Use of default passwords; even attackers have access to many of the factory-standard login credentials that come with new gadgets. As a result, factory-standard credentials that haven’t changed are considered data leaks [12,13]. Information security, privacy, and accessibility provide data security across the organisation and are crucial in the complete operation of an organization. Together, they involve the users and techniques needed to reduce unauthorised access [14,15]. Therefore, to accomplish this, various scientists are attempting to enhance the security by calculating it via various approaches. There is a lot of work available in this area, but in the literature, evaluating the attributes of security and accessibility characteristics by their applicability in actual situations didn’t meet. Moreover, the user is the sole individual who is responsible for his or her information security and accessibility.
Hence, providing security with accessibility to information is a crucial task, but it’s the demand for time in the big data environment that’s the demand. These paper goals are to secure the information with accessibility through MCDM methods with the combination of fuzzy [1,6]. In general, the fuzzy approach enables users to analyse an IoT e-Health support system in a smart home and compute risk analysis to determine doubts in a real-world environment using soft computing. This study helps to develop and design a way that utilises fuzzy logic to analyse the hierarchy of security-accessibility attributes. For this evaluation, a hierarchy that describes the real attributes of security and accessibility was created.
Thus, in the next section, the authors addressed hierarchy attributes, and Fuzzy-AHP defined the demand for security in the big data world. The results obtained by this study may help users preserve their privacy during the big data security life cycle. The paper is organised as follows: The literature survey is part of the 2 section of this paper. In the next section, methodology will be discussed. Section 4 provides an overview of the obtained results and, finally, the conclusion in Section 5.
2 Literature Review
Almulihi et al., have discussed about big data breaches in healthcare perspective and evaluate the importance of various databases with the help of fuzzy logic . Hence, the main aim of organisations should be to prevent unauthorised access and provide security with accessibility through proposed security-saving data mining. Data mining, security saving techniques, and cryptography approaches might be found in the blend study [16,17]. Multiple approaches, like randomization for grouping, utilising privacy-preservation techniques, life span of security have been studied [18,19]. Decision making-based methodologies offer greater security at the expense of the high computation and compatibility required in these cases. A detailed explanation of the relationship between various fields . Further, this research work presented an extensive and similar analysis of mystery sharing mechanisms for Secure Multiparty Computation (SMC) based systems, as well as their effectiveness.
The authors proposed the preservation of privacy assessment in data mining [21,22]. The strategies of secure multiparty computation (SMC), homomorphic encryption (HE), and comparison have been useful. In the work of Sahu et al., current needs and importance of industry related to reliability have been discussed, which includes the various methods was recommended . They included AHP in that research work with three parameters . In a study, the approach was modified by Alzahrani . In this method, the ANP approach was used to identify and prioritise the modes of failure and apply the fuzzy TOPSIS technique to get the risk factors’ weight in the operations in a study.
Kumar et al., have underscored the details that in IS security, human factors’ importance has been identified by both researchers and organisations, and they conducted a study survey on information security for user awareness [26,27]. However, prior research on user security behaviour has only focused on computer theft . Kumar et al., have presented a model, which shows the judgments of respondents by defining the suitable functions of membership in a questionnaire with a 5-point Likert scale .
3 Proposed Methodology
3.1 Evaluation of Security Privacy Attributes
The authors simultaneously consider the risks and benefits of sharing personal information or reference information. Everything has two sides, so technology offers easiness via social media and mobile phones on one hand, but on the other hand, it has offered many issues related to privacy, security, and data availability. Between all the problems discussed, such as users’ privacy and security, are of paramount importance, as users are disclosing their personal information on social media just for their own enjoyment . Hence, when users expose, or we can say, provide the accessibility of their personal information on social media, then they simultaneously consider privacy risk and benefits. Security and accessibility attributes discussed here:
Information security entails more than just protecting data from unauthorised access. The practise of preventing unauthorised access use, disclosure, disruption, alteration, inspection, recording, or destruction of information is known as information security. There are two types of information: physical and electronic. Information can be anything from your personal information to your social media profile, cell phone, biometric and so on. As a result, information security encompasses a wide range of academic topics, including cryptography, mobile computing, cyber forensics, and online social media, among others.
Confidentiality: The efforts of an organization to keep their data private or secret are referred to as confidentiality. In practise, it means restricting data access to prevent illegal disclosure. Typically, this entails ensuring that only authorised individuals have access to specified assets, and that unauthorised individuals are actively prevented from gaining access.
Integrity: Integrity refers to the property of something being full or complete in everyday usage. Integrity refers to ensuring that data has not been tampered with and so can be trusted. It is accurate, authentic and trustworthy.
Availability: If system programmes, and data are not accessible when authorised users need them, they are of limited use to a business and its customers. Availability indicates that data, networks, system, and application are operational it ensures that authorised users get timely and consistent access to resources when they required.
Severity: The severity of a flaw is defined as the degree to which it can affect the data. The severity of a defect is a metric that indicates how serious it is and how much it affects the data or software’s functionality.
When we talk about data accessibility, we are talking about reducing restrictions to properly utilising the data in today’s databases. Great software solutions that offer expanded data access can enable anyone in any function to use their data as a single source of truth to draw important insights and make educated decision in their work. Some of accessibility techniques discussed here:
Privacy Concern: Privacy of data, also known as information privacy, is a subset of data security that focuses on the proper handling of data, including consent, notification and regulatory requirements. Practical data privacy problems frequently focus around: whether or not data is shared with third parties and how data is shared with third parties.
Self-disclosure: Self-disclosure is a type of communication in which individual shares personal information with other. Thoughts, sentiments, aspirations, objectives, failures, achievements, fears and hopes, as well as one’s likes, dislikes and favourites can all be included in the information.
Benefits: Accessibility benefits all members of society, including those with impairments, it is safe to state. Improving accessibility improves one’s quality of life by allowing them to be more independent and socially integrated. It also leads to greater health and can save money in a variety of ways.
Privacy-Risk: The potential loss of control over personal information is referred to as a privacy risk. Any information about an character maintained by means of a corporation, such as (1) and information that may be utilized to differentiate or trace an identity, such as name, number, BOB, mother’s name, biometric records and (2) every other information this is related to an person inclusive of clinical, educational, financial, and employment information, according to the privacy Act.
A collection of reference information can disclose the patterns of the user’s actual life. The context information that has been disclosed on the internet can become a rich source for a hacker. That means users have been provided access to others [31,32]. Therefore, the researchers believe that this context information is a personal matter, and when disclosing it, users have to take privacy risks and benefits into consideration. There is a demand for time to evaluate security-accessibility for safeguarding the security of big data for ease of usage and satisfaction. The evaluation of the results of security accessibility attributes should be analysed in-depth and can be used to improve the utility of security services. Priority analysis is performed by the Multi-Criteria Decision Making (MCDM) method [33,34].
With expert input, this work donates security prioritisation and privacy attributes via Fuzzy AHP. It detects the ranks and weight of security-privacy attributes. Fig. 2 demonstrates security components, i.e., confidentiality, integrity, accessibility, and severity, and privacy attributes such as privacy concern, self-disclosure, benefits, and privacy risk, which are more important than affecting the user’s security. Security and privacy can be improved by focusing on their attributes together.
3.2 Fuzzy AHP
A fuzzy set can be categorised via membership function. Fuzzy numbers can most commonly be used in two ways: the first is triangular numbers and the second is trapezoidal fuzzy numbers. The researchers considered fuzzy numbers to be triangular in this paper. Fig. 3 represents a triangular fuzzy number that is expressed as (f1, f2, f3) [35–37].
Several academics have started a project that provides precise numbers. Fuzzy AHP is valuable because of its weights, whereas AHP is worthy of examining a decision in a group [34–39]. AHP is an essential apparatus that is often adopted by decision-makers. To handle the ambiguity and uncertainties of social decisions, the researchers came up with AHP’s updated version, which is called Fuzzy AHP. The Fuzzy AHP, which includes a fuzzy theory with the AHP approach . Hence, the priorities of security-accessibility attributes are essential to calculate the essential attributes of these eight factors. Also, the present of every attribute in security-accessibility is intended. The prioritisation of security-accessibility attributes to increase the security of information accessibility is discussed here. Researchers have proved that for a small-scale MCDM problem, AHP is the best arrangement approach [15–17].
The main aim of the paper is to find out the priority of security-accessibility factors. A questionnaire has been planned for this purpose. Thus, to answer the questionnaire, it is necessary to have a number of users who consider the information accessibility and its security. To evaluate the significance of security-accessibility factors, Fuzzy AHP is used because it can control the fuzzy decision-related inputs given by participants . For the better valuation of security accessibility in the form of rankings and weights, then convert qualitative results into quantitative results [15,16]. Additionally, the pair-wise comparison matrix is arranged with the support of a questionnaire for the Fuzzy AHP approach. For calculating the weight of security-accessibility attributes, the applicant’s inputs are converted into numeric values. The numeric values have been converted into Triangular Fuzzy Numbers (TFN) with the help of Eqs. (1) and (2) [17,18] and indicated as (Lo, Mo, Up), where Lo is the lowest possible, Mo is the most probable, and Up is the highest possible action.
With the help of Eqs. (1)–(4) transfiguring the linguistic terms into Triangular Fuzzy Number (TFNs).
where Lo ≤ Mo ≤ Up
By the above situation, assumed two TFNs T1 and T2, T1 = (Lo1, Mo1, Up1) and T2 = (Lo2, Mo2, Up2). Further, Eqs. (5)–(7) help to aggregate TFN values.
Step 2: Pair-wise decision matrix designed after the receiving the expert’s response. The matrix is calculated by Eq. (8).
The average of the preferences of each expert is assessment by the Eq. (9).
Stage 3: In this stage authors update the pair-wise comparison matrixes for all factors in Fig. 3 on the behalf of results obtained from Eq. (10) with the help of Eq. (11).
We obtained the geometrical mean and fuzzy weights of all factors by giving below Eq. (11).
Stage 4: We assessment the fuzzy weight of all factors by the Eq. (12).
After that, average weights are calculated and normalized weight of criteria by the Eqs. (13), (14).
Stage 5: In this stage, we have done the de-fuzzification of the values to obtain again the crisp values. The BNP (Best Non-fuzzy Performance) values of the fuzzy weights of every measurement by the Eq. (15).
The values between two parameters have comparative significance from step 1 to step 6, where i and j represent the pair of norms determined by participants. For specific comparison, the value of aij is evaluated based on the geometric mean. The geometric mean can correctly collect and represent the values  and signify the lower and upper scores correspondingly for the comparative position among the two criteria. A pair-wise comparison matrix of fuzzy is evaluated as an n1 n2 matrix, after calculating the values of TFN for every pair-wise evaluation. The size of the comparison matrix is 4 4, with a size of group limit to attain an adequate reliability level.
In this evaluation, participants include online users such as academicians and students with both security and accessibility experience. These candidates were selected to confirm the reliability of the AHP test. After the qualitative calculation, the pair-wise membership of TFN and the decision matrix have been created after the function is performed. The matrix was arranged by the authors after evaluating the responses . In the third step, the TFN function and pair-wise comparison matrix are evaluated to generate a fuzzy decision matrix which has been established. Additionally, after creating the comparison matrix, to generate measurable values that depend on the TFN calculated values, de-fuzzification is performed. In this work, the de-fuzzification approach has been adopted from [15–19] as formulated in Eq. (15).
4 Result Evaluation
The researchers classify the security attributes (confidentiality, integrity, availability, and severity) as well as accessibility (privacy concerns, self-disclosure benefits, and privacy risk) attributes. Accessibility is a vital attribute for providing security at the time of disclosure of information. As a result, people and originators must understand the impact of both attributes in order to obtain secure information. So the researchers are using the Fuzzy Analytic Hierarchy Process (Fuzzy-AHP) method for the assessment in this paper. For the evaluation, a hierarchy needs to be established that describes the affected attributes. With the help of the hierarchy and Fuzzy AHP method, the security-accessibility of information has been calculated . Outcomes obtained from the evaluation can support the security of information at the time of the data generation/creation phase in the big data life cycle .
Tab. 1 represents the security pair-wise comparison matrix of Fuzzy. The values of CR (Consistency Ratio) are less than 0.1. Tab. 2 shows the local weight of the de-fuzzified level 1 and pair-wise comparison matrix characteristics. Through the results, it is clear security is more important than accessibility to balance the accessibility of information. Tab. 3 represents the level 2 attributes for security, independent weight, and the de-fuzzified pair-wise comparison matrix. Further, CR values and the security-accessibility attributes’ independent weight are represented in Tabs. 4–6, respectively. Fig. 4 represents the graphical representation of local weights at level 1. Fig. 5 also represents the weight levels 2 and 3. Tab. 7 represents the independent weights and attributes of level 2’s de-fuzzified pair-wise comparison matrix for accessibility. Tab. 8 represents the overall ranking and dependent weights of the hierarchy.
Insecurity accessibility, security has 0.725 and accessibility has 0.275, which means security is more essential than accessibility and needs stability between accessibility and security. After implementing the Fuzzy AHP approach, another method has been used, known as the AHP method, to prove the complete accuracy of the result and estimation. To prove the results’ accuracy, the security and accessibility effects of various types were evaluated by AHP. Data collection is a process in classical AHP, but there is only one difference in classical AHP is that no de-fuzzification is needed. Therefore, for classical AHP, the data comes in the form of crisp [16,17]. Fig. 5 depicts the graphical representation of the results.
A decision hierarchy is developed in the traditional AHP. Next, a pairwise matrix of participants’ opinions has been established, but in this technique, numeric values are used directly rather than TFN values. Another phase is to combine the comparison matrixes pairwise of participants’ opinions while checking CR. Tab. 6 shows the level 1’s independent weights and the combined comparison matrixes pairwise As a result, it is clear that security is more important than accessibility in improving overall security.
Further, Tab. 7 depicts the combined comparison matrix of pairwise and independent weights of the security attributes of level 2. Tab. 8 shows the level 2’s local weights and the combined comparison matrix pair-wise of security factors. The dependent weights and the overall ranking of the hierarchy are shown in Tab. 9. The proposed algorithm is also compared with the existing approach based on security and accessibility attributes such as confidentiality, integrity, availability, severity, privacy concerns, self-disclosure benefits, and privacy risk (Tab. 10). In Tab. 11, the comparison results of security and accessibility evaluation via both the fuzzy AHP and traditional methods of AHP have been represented. It can be seen that while using the Fuzzy-AHP, results are better and more efficient in comparison to the AHP. The difference between the results from both the methods, Fuzzy AHP and AHP, is shown in Fig. 5 and Tab. 11.
Nowadays, security is one of the essential quality attributes for all users. The purpose of this paper is to evaluate the security accessibility of information at the data creation phase of the big data security life cycle. The model proposed in this paper will help to calculate the security of information. In this contribution, the author has examined eight security and accessibility factors throughout the big data life cycle. In this research, the attributes of security and accessibility are recognized. Users want to access information as well as the security of their personal data. But providing access to security is a tedious task. In this paper, the calculation of security accessibility is a multi-purpose measure of decision difficulty. Fuzzy AHP has been used to calculate the security accessibility. Most essential characteristics have also been calculated. In the big data environment, if we want to provide a secure environment to any user, then we have to restrict access to information. For the assurance of security accessibility, at the data creation phase, developers need to restrict access to information and focus on the information’s security.
Acknowledgement: We deeply acknowledge Taif University for supporting this study through Taif University Researchers Supporting Project Number (TURSP-2020/150), Taif University, Taif, Saudi Arabia.
Funding Statement: Funding for this study was received from the Taif University, Taif, Saudi Arabia under the Grant No. TURSP-2020/150.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
|This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.