Classification is a data mining process used to accurately predict predetermined target classes by learning from data. This study discusses data classification using a fuzzy soft set method to predict target classes accurately, and aims to form a data classification algorithm based on fuzzy soft sets. In this study, the fuzzy soft set was calculated based on the normalized Hamming distance. Each parameter in this method is mapped to a fuzzy subset of the universe by a fuzzy approximate function. In the classification step, a generalized normalized Euclidean distance is used to determine the similarity between two fuzzy soft sets. The experiments used the University of California (UCI) Machine Learning dataset to assess the accuracy of the proposed data classification method. The dataset samples were divided into training (75% of samples) and test (25% of samples) sets. Experiments were performed in MATLAB R2010a software. The experiments showed that: (1) ordered from fastest to slowest, the techniques are the matching function, distance measure, similarity, and normalized Euclidean distance; (2) the proposed approach improves accuracy and recall by up to 10.3436% and 6.9723%, respectively, compared with baseline techniques. Hence, the fuzzy soft set method is well suited to classifying data.
Nowadays, Big Data appears in many domains: Tuberculosis (TB) patient data in healthcare, stock data in economics and business, and meteorological (BMKG) data containing weather, temperature, and rainfall records, among others. Data mining is the process of extracting knowledge from large amounts of data [1], and is done by extracting information and analyzing data patterns or relationships [2,3].
Classification is a data mining process used to accurately predict predetermined target classes by learning from data. Classification has been used in the health [4–6], economics, and agriculture fields [7,8]. Classifying data remains challenging and requires further research [9].
In 1965, Zadeh [10] introduced the fuzzy set, in which each element has a grade of membership between zero and one. Later, Molodtsov [11] introduced soft set theory to collect parameterized subsets of a universal set U. Soft set theory is widely used to handle uncertainty or doubt, such as that found in decision-making. Roy and Maji developed fuzzy soft set theory by combining soft set theory and fuzzy set theory; it has since been applied to decision-making problems [12,13]. Majumdar and Samanta [14] presented a similarity measure between two generalized fuzzy soft sets for decision-making.
The fuzzy soft set, an extension of the classical soft set, was introduced by Maji [15]. There have been many works on fuzzy soft set theory in decision-making. Ahmad et al. [16] defined arbitrary fuzzy soft unions and fuzzy soft intersections and proved De Morgan's laws in fuzzy soft set theory. Meanwhile, Aktaş and Çağman [17] studied fuzzy parameterized soft set theory, its related properties, and decision-making applications. Rehman et al. [18] studied some operations on fuzzy soft sets and established their fundamental properties. Finally, Celik et al. [19] researched applications of fuzzy soft sets in ring theory.
The critical issue in fuzzy soft sets is the similarity measure. In recent years, similarity measurement between two fuzzy soft sets has been studied from different aspects and applied to various fields, such as decision-making, pattern recognition, region extraction, coding theory, and image processing. For example, similarity measures for fuzzy soft sets based on distance, set-theoretic approaches, and matching functions have been researched in [20]. Sut [21] and Rajarajeswari [22] used the notion of the similarity measure in Majumdar and Samanta [20] to make decisions. Several similarity measures based on four types of quasi-metrics were introduced for fuzzy soft sets [23]. Sulaiman [24] researched a set-theoretic similarity measure for fuzzy soft sets and applied it to group decision-making. However, some distance-based similarity measures for fuzzy soft sets incur high computational costs [20,23]. Feng and Zheng [25] showed that similarity measures based on the Hamming distance and the normalized Euclidean distance are reasonable for fuzzy soft sets. Thus, in the present paper, a similarity based on the generalized normalized Euclidean distance is applied to fuzzy soft sets for classification; the similarity is used to assign class labels to data. The experimental results show that the proposed approach can improve classification accuracy.
The Proposed Method/Algorithm
This section presents the basic definitions of fuzzy set theory, soft set theory, and some useful definitions from Roy and Maji [12].
Fuzzy Set
Definition 2.1 [10] Let U be a universe. A fuzzy set A over U is a set defined by a function
μA:U→[0,1]
where μA is the membership function of A, and the value μA(x) is the membership value of x∈U. The value represents the degree to which x belongs to the fuzzy set A. Thus, a fuzzy set A over U can be represented as in (2).
A={μA(x)∣x∈U,μA(x)∈[0,1]}
The set of all fuzzy sets over U is denoted by F(U).
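As an illustration of Definition 2.1, a fuzzy set over a finite universe can be sketched as a membership map; the universe and values below are illustrative, not taken from the paper.

```python
# A fuzzy set A over a finite universe U, represented as a map
# u -> mu_A(u) with values in [0, 1] (illustrative values).
U = ["u1", "u2", "u3"]
A = {"u1": 0.5, "u2": 0.9}  # elements absent from the map have membership 0

def membership(fuzzy_set, x):
    """Return mu(x); unlisted elements of U have membership 0."""
    return fuzzy_set.get(x, 0.0)

print(membership(A, "u2"))  # -> 0.9
print(membership(A, "u3"))  # -> 0.0
```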
Definition 2.2 [10] Let A be a fuzzy set, where A∈F(U). Then, the complement of A is as in (3)
Ac={μAc(x)∣x∈U,μAc(x)=1−μA(x)}
Definition 2.3 [10] Let A, B be the fuzzy set, where A, B∈F(U). The membership degree of union of A and B is denoted by μA∪B(x):
μA∪B(x)=max{μA(x),μB(x)};
for all x∈U and μA∪B(x) ∈ [0,1].
Definition 2.4 [10] Let A,B be the fuzzy set, where A,B∈F(U). The membership degree of intersection of A and B is denoted by μA∩B(x):
μA∩B(x)=min{μA(x),μB(x)};
for all x∈U and μA∩B(x) ∈ [0,1].
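Definitions 2.2–2.4 translate directly into code. Below is a minimal sketch, representing each fuzzy set as a dict that maps elements to membership values (names and values are illustrative):

```python
def complement(A):
    # Definition 2.2: mu_Ac(x) = 1 - mu_A(x)
    return {x: 1.0 - m for x, m in A.items()}

def union(A, B):
    # Definition 2.3: mu_{A union B}(x) = max(mu_A(x), mu_B(x))
    return {x: max(A.get(x, 0.0), B.get(x, 0.0)) for x in set(A) | set(B)}

def intersection(A, B):
    # Definition 2.4: mu_{A intersect B}(x) = min(mu_A(x), mu_B(x))
    return {x: min(A.get(x, 0.0), B.get(x, 0.0)) for x in set(A) | set(B)}

A = {"x1": 0.5, "x2": 0.9}
B = {"x1": 0.7, "x2": 0.3}
print(union(A, B)["x1"], intersection(A, B)["x2"])  # -> 0.7 0.3
```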
Fuzzification
Fuzzification is a process that converts a crisp value into a fuzzy set (the reverse process, defuzzification, converts a fuzzy quantity into a crisp quantity) [26]. This process uses membership functions and fuzzy rules. The fuzzy rules can be formed as fuzzy implications, such as (x1 is A1) ∘ (x2 is A2) ∘ … ∘ (xn is An) then Y is B, with ∘ being the operator "AND" or "OR". B can be determined by combining all antecedent values [14].
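For instance, fuzzification of a crisp value against triangular membership functions can be sketched as follows; the linguistic terms and breakpoints are illustrative assumptions, not taken from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Fuzzify the crisp value 22 against three illustrative linguistic terms.
terms = {"cold": (0, 10, 20), "warm": (15, 22, 30), "hot": (25, 35, 45)}
fuzzified = {name: triangular(22, a, b, c) for name, (a, b, c) in terms.items()}
print(fuzzified["warm"])  # -> 1.0
```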
Fuzzy Soft Set (FSS)
Definition 2.5 [12] Let U be an initial universe set and E a set of parameters. Let P(U) denote the collection of all fuzzy subsets of U, and let A ⊆ E. ΓA is called a fuzzy soft set over U, where γA is a mapping γA:A→P(U) such that γA(e)=∅ if e∉A.
Here, the function γA is the approximate function of the fuzzy soft set ΓA, and the value γA(e) is called the e-element of the fuzzy soft set for all e∈A. A fuzzy soft set ΓA over U can thus be represented by the set of ordered pairs:
ΓA={(e,γA(e))|e∈A,γA(e)∈P(U)}.
Note that the set of all fuzzy soft sets over U is denoted by FS(U).
Example 1 [14] Let the fuzzy soft set ΓA describe the attractiveness of the shirts, with respect to the given parameters, that the authors are considering wearing. Let U={u1,u2,u3,u4,u5} be the set of all shirts under consideration and P(U) the collection of all fuzzy subsets of U. Let E = {e1 = "colorful", e2 = "bright", e3 = "cheap", e4 = "warm"} and A={e1,e2,e3}. The fuzzy approximate function takes the values
γA (e1) = {0.5|u1, 0.9|u2},
γA (e2) = {1|u1, 0.8|u2, 0.7|u3},
γA (e3) = {1|u2, 1|u5}.
The family {γA (ei); i = 1,2,3} of P(U) is then a fuzzy soft set ΓA. The tabular representation of the fuzzy soft set ΓA is shown in Tab. 1.

Tab. 1: The representation of the fuzzy soft set ΓA

U/A   e1    e2    e3
u1    0.5   1     0
u2    0.9   0.8   1
u3    0     0.7   0
u4    0     0     0
u5    0     0     1
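In code, the fuzzy soft set of Example 1 can be sketched as a map from parameters to fuzzy subsets of U (a minimal illustration of Definition 2.5):

```python
# Example 1 as a nested map: parameter e -> fuzzy subset of U.
U = ["u1", "u2", "u3", "u4", "u5"]
gamma_A = {
    "e1": {"u1": 0.5, "u2": 0.9},
    "e2": {"u1": 1.0, "u2": 0.8, "u3": 0.7},
    "e3": {"u2": 1.0, "u5": 1.0},
}

def row(fss, u):
    """One row of the tabular representation: memberships of u per parameter."""
    return [fss[e].get(u, 0.0) for e in fss]

print(row(gamma_A, "u2"))  # -> [0.9, 0.8, 1.0]
```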
Definition 2.6 [14] Let ΓA, ΓB∈FS(U). ΓA is a fuzzy soft subset of ΓB, denoted by ΓA ⊆ ΓB, if γA(e) ⊆ γB(e) for all e∈A, A ⊆ B.
Definition 2.7 [14] Let ΓA∈FS(U). The complement of the fuzzy soft set ΓA, denoted by ΓAc, is defined by μγAc(e)(x)=1−μγA(e)(x) for all e∈A and x∈U.
Definition 2.8 [14] Let ΓA, ΓB∈FS(U). The union of ΓA and ΓB, denoted by ΓA∪B, is defined by γA∪B(e) = γA(e) ∪ γB(e) for all e∈A∪B.
Definition 2.9 [14] Let ΓA, ΓB∈FS(U). The intersection of ΓA and ΓB, denoted by ΓA∩B, is defined by γA∩B(e) = γA(e) ∩ γB(e) for all e∈A∩B.
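Definitions 2.8 and 2.9 can be sketched over a nested-map representation (each fuzzy soft set is a dict: parameter -> fuzzy subset); again, the data below is a minimal illustration:

```python
def fss_union(FA, FB):
    # Definition 2.8: parameter-wise fuzzy union over the parameters in A or B
    return {e: {u: max(FA.get(e, {}).get(u, 0.0), FB.get(e, {}).get(u, 0.0))
                for u in set(FA.get(e, {})) | set(FB.get(e, {}))}
            for e in set(FA) | set(FB)}

def fss_intersection(FA, FB):
    # Definition 2.9: parameter-wise fuzzy intersection over parameters in both
    return {e: {u: min(FA[e].get(u, 0.0), FB[e].get(u, 0.0))
                for u in set(FA[e]) | set(FB[e])}
            for e in set(FA) & set(FB)}

FA = {"e1": {"u1": 0.5}, "e2": {"u1": 0.4}}
FB = {"e1": {"u1": 0.7}}
print(fss_union(FA, FB)["e1"]["u1"])         # -> 0.7
print(fss_intersection(FA, FB)["e1"]["u1"])  # -> 0.5
```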
Definition 2.10 [14] Let ΓA∈FS(U). The cardinal set of ΓA, denoted by cΓA, is defined by cΓA = {μcΓA(e)|e : e∈A}, where the membership function μcΓA of cΓA is defined by

μcΓA:E→[0,1],

μcΓA(e)=|γA(e)| / |U|,

where |U| is the cardinality of the universe U, and

|γA(e)|=∑u∈U μγA(e)(u).
The set of all cardinal sets of fuzzy soft sets over U is denoted by cFS(U).
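Definition 2.10 amounts to summing each parameter's memberships and dividing by |U|. A short sketch using the membership values from Example 1:

```python
def cardinal_set(fss, universe_size):
    """Definition 2.10: mu_cGammaA(e) = |gamma_A(e)| / |U|."""
    return {e: sum(memberships.values()) / universe_size
            for e, memberships in fss.items()}

gamma_A = {"e1": {"u1": 0.5, "u2": 0.9},
           "e2": {"u1": 1.0, "u2": 0.8, "u3": 0.7},
           "e3": {"u2": 1.0, "u5": 1.0}}
card = cardinal_set(gamma_A, 5)
print(card["e2"])  # approximately 0.5, since (1.0 + 0.8 + 0.7) / 5 = 0.5
```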
Classification
Classification involves learning a target function that maps each collection of data attributes to one of several predefined classes. The purpose of classification is to predict the target class of each case in the data as accurately as possible. A classification algorithm consists of two stages. In the training stage, the classifier is trained on predefined classes or data categories. A tuple X, represented by the n-dimensional attribute vector X={x1,x2,…,xn}, describes the measurements made on the tuple with n attributes A1,A2,…,An. Each tuple belongs to a class, as identified by its class-label attribute. Class-label attributes have discrete, unordered values, each of which acts as a category or class. The second stage is classification, in which the built classifier is used to classify data, judged by the classification algorithm's accuracy on the test data. If the training set itself were used to measure the classifier's accuracy, the estimate would be overly optimistic, because that data was used to build the classifier. Therefore, a test set (a set of tuples and their class labels, selected randomly from the dataset) is used. The test set is independent of the training set because it was not used to build the classifier.
Similarity Measurement
A measure of similarity or dissimilarity defines the relationship between samples or objects. Similarity measures are used to determine which patterns, signals, images, or sets are alike. For a similarity measure, the resemblance is stronger when its value increases; conversely, for a dissimilarity measure, the resemblance is stronger when its value decreases [27]. An example of a dissimilarity measure is a distance measure. Measuring the similarity or distance between two entities is crucial in various data mining and information discovery tasks, such as classification and clustering. A few researchers have measured the similarity between fuzzy sets, fuzzy numbers, and vague sets. Recently, the similarity measures of soft sets and fuzzy soft sets were studied in [14,20,28], where the similarity between two generalized fuzzy soft sets is explained as follows.
Let U={x1,x2,…,xn} be the universal set of elements and E={e1,e2,…,em} the universal set of parameters. Let Fρ and Gδ be two generalized fuzzy soft sets over the parameterized universe (U,E), so that Fρ={F(ei),ρ(ei), i=1,2,…,m} and Gδ={G(ei),δ(ei), i=1,2,…,m}. Thus, F={F(ei), i=1,2,…,m} and G={G(ei), i=1,2,…,m} are two families of fuzzy sets.
The similarity between F and G is denoted by M(F,G), and the similarity between the two fuzzy sets ρ and δ is denoted by m(ρ,δ). Then, the similarity between the two generalized fuzzy soft sets Fρ and Gδ is S(Fρ,Gδ) = M(F,G) × m(ρ,δ).
Here, M(F,G) = maxi Mi(F,G), where:
Mi(F,G)=1−(∑j=1n |Fij−Gij|)/(∑j=1n (Fij+Gij)).
Furthermore,
m(ρ,δ)=1−(∑i=1m |ρi−δi|)/(∑i=1m (ρi+δi)).
If we use the universal fuzzy soft set, then ρ=δ=1 and m(ρ,δ) = 1, and the similarity reduces to S(Fρ,Gδ) = M(F,G). For the example data, M1(F,G) ≅ 0.73, M2(F,G) ≅ 0.43, and M3(F,G) ≅ 0.50; thus, max[Mi(F,G)] ≅ 0.73.
Hence, the similarity between the two generalized fuzzy soft sets Fρ and Gδ is S(Fρ,Gδ) = M(F,G) × m(ρ,δ) = 0.73 × 0.82 ≅ 0.60, while for a universal fuzzy soft set, where ρ=δ=1 and m(ρ,δ) = 1, the similarity is S(Fρ,Gδ) ≅ 0.73.
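The quantities above share the same ratio form, so a compact sketch is possible. The membership matrices below are illustrative; each Mi compares the i-th pair of fuzzy sets element by element:

```python
def ratio_similarity(xs, ys):
    """The common building block: 1 - sum|x - y| / sum(x + y)."""
    num = sum(abs(x - y) for x, y in zip(xs, ys))
    den = sum(x + y for x, y in zip(xs, ys))
    return 1.0 if den == 0 else 1.0 - num / den

def gfss_similarity(F, G, rho, delta):
    """S(F_rho, G_delta) = M(F, G) * m(rho, delta), with M = max_i M_i."""
    M = max(ratio_similarity(Fi, Gi) for Fi, Gi in zip(F, G))
    return M * ratio_similarity(rho, delta)

F = [[0.5, 0.5], [0.1, 0.9]]
G = [[0.5, 0.5], [0.9, 0.1]]
# Universal fuzzy soft set: rho = delta = 1, so m(rho, delta) = 1 and S = M.
print(gfss_similarity(F, G, [1.0], [1.0]))  # -> 1.0
```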
Distance Measurement
In this study, the fuzzy soft set was calculated based on the normalized Hamming distance [25]. We assume fuzzy soft sets (F,A) and (G,B) have the same set of parameters, namely, A = B. The normalized Hamming distance and normalized distance in Fuzzy Soft Set (FSS) are obtained using Eqs. (13) and (14).
Example 3. As in Roy and Maji [12], let U = {u1, u2, u3} be a universe with parameter set A = {a1, a2, a3}. Two fuzzy soft sets (G,A) and (H,A) are represented in Tabs. 2 and 3, respectively.
Using Eqs. (13) and (14), respectively, the normalized Hamming distance and normalized distance in FSS between (G,A) and (H,A) can be calculated as follows:
d′ indicates the distance between the ith parameters of (F,A) and (G,B), and d1((F,A),(G,B)) indicates the distance over all parameters of (F,A) and (G,B).
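Since Eqs. (13) and (14) are referenced but not reproduced in this excerpt, the sketch below uses the standard normalized Hamming and normalized Euclidean distances for fuzzy soft sets over an m × n membership matrix (m parameters, n universe elements); treat the exact forms as assumptions consistent with [25]:

```python
import math

def normalized_hamming(F, G):
    """Assumed form of Eq. (13): mean absolute membership difference."""
    m, n = len(F), len(F[0])
    return sum(abs(f - g) for Fi, Gi in zip(F, G)
               for f, g in zip(Fi, Gi)) / (m * n)

def normalized_euclidean(F, G):
    """Assumed form of Eq. (14): root of the mean squared difference."""
    m, n = len(F), len(F[0])
    return math.sqrt(sum((f - g) ** 2 for Fi, Gi in zip(F, G)
                         for f, g in zip(Fi, Gi)) / (m * n))

F = [[1.0, 0.0], [0.5, 0.5]]
G = [[0.0, 0.0], [0.5, 0.5]]
print(normalized_hamming(F, G))    # -> 0.25
print(normalized_euclidean(F, G))  # -> 0.5
```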
Discussion
In this section, the proposed approach and experimental results of the Fuzzy Soft Set Classifier (FSSC) using the normalized Euclidean distance are discussed.
Proposed Approach
This study proposed a new classification algorithm based on the fuzzy soft set; we call it the Fuzzy Soft Set Classifier (FSSC). This algorithm used the normalized Euclidean distance of similarity between two fuzzy soft sets to classify unlabeled data. Before training and classification steps, we first conducted fuzzification and created a fuzzy soft set.
Training Step
The goal of training the algorithm is to determine the center of each existing class.
Let U={u1,u2,…,uN}, let E be the set of parameters, and let A⊆E with A={ei, i=1,2,…,M}. There are k classes with nr samples in each class, where r=1,2,…,k and ∑r=1k nr=N. Let Cr⊆U be the r-class data and ΓCr the fuzzy soft set of the r-class data. The center set of class Cr, denoted ΓPCr, is defined as in Eq. (17).
In the classification step, the class centers obtained in the training step are used to determine the class of new data; that is, by measuring the similarity between two fuzzy soft sets: the class center vector and the new data.
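Eq. (17) is not reproduced in this excerpt; the sketch below takes the class center as the parameter-wise mean of the fuzzified training samples of that class, which is an assumption consistent with FSSC-style classifiers [9]:

```python
def class_center(samples):
    """Center of one class: parameter-wise mean of its fuzzified samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

# Two illustrative classes of fuzzified 2-parameter samples.
training = {"c1": [[0.2, 0.8], [0.4, 0.6]], "c2": [[0.9, 0.1]]}
centers = {label: class_center(vs) for label, vs in training.items()}
print(centers["c2"])  # -> [0.9, 0.1]
```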
Given the class centers ΓPCr, r=1,2,…,k, and the fuzzy soft set of new data ΓG, the similarity is measured as:
similarity measure = 1 − distance measure.
For this distance, we use the generalized normalized Euclidean distance for fuzzy soft sets given in Eq. (15), rather than the normalized Euclidean distance for fuzzy sets.
After the similarity value for each class is obtained, the algorithm determines the appropriate class label for the new data ΓG by taking the maximum similarity:
prediction = arg[max r=1,…,k S∗(ΓPCr,ΓG)].
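The classification step can then be sketched as follows, computing similarity = 1 − distance against each class center and returning the label with maximum similarity. The normalized Euclidean distance here is an assumed stand-in for Eq. (15), and the centers and sample are illustrative:

```python
import math

def similarity(center, sample):
    """similarity = 1 - normalized Euclidean distance (assumed Eq. (15))."""
    m = len(center)
    d = math.sqrt(sum((c - s) ** 2 for c, s in zip(center, sample)) / m)
    return 1.0 - d

def predict(centers, sample):
    """Return the class label whose center is most similar to the sample."""
    return max(centers, key=lambda label: similarity(centers[label], sample))

centers = {"c1": [0.3, 0.7], "c2": [0.9, 0.1]}
print(predict(centers, [0.25, 0.75]))  # -> c1
```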
Experimental Results
We conducted experiments using the University of California (UCI) dataset to assess the accuracy of the proposed data classification method. The dataset samples were divided into training (75% of samples) and test (25% of samples) sets. Experiments were performed in MATLAB R2010a software. Figs. 1–4 show the classification results obtained by our fuzzy soft set method and other baseline techniques.
Fig. 1: Comparison of accuracy
Fig. 2: Comparison of precision
Fig. 3: Comparison of recall
Fig. 4: Comparison of computational time
As seen in Fig. 1, the normalized Euclidean distance method yields the highest accuracy. Fig. 2 shows that the normalized Euclidean distance method obtains the second-highest precision; the highest precision is obtained by the comparison table method in MATLAB.
Fig. 3 shows that the normalized Euclidean distance method produces the highest recall results, whereas Fig. 4 illustrates that the method has the highest computation time.
Ordered from fastest to slowest, the techniques are the matching function, distance measure, similarity, and normalized Euclidean distance. Comparisons are shown in Tab. 4.
Tab. 4: Improvement of accuracy and recall

Metric     Comparison Table   Similarity   Distance Measure   Matching Function   Normalized Euclidean Distance   Improvement
Accuracy   0.6580             0.6688       0.6671             0.6689              0.7380                          10.3436%
Recall     0.6986             0.7212       0.7221             0.7222              0.7725                          6.9723%
Conclusions
In this study, a new classification algorithm based on fuzzy soft set theory was proposed. Experimental results show that the normalized Euclidean distance method improves accuracy by 10.3436% and recall by 6.9723%, compared with baseline techniques. We also find that all similarity measures proposed in this paper are reasonable.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.
References
[1] J. Han, M. Kamber and J. Pei, "Data mining trends and research frontiers," in Data Mining: Concepts and Techniques, 3rd ed., Boston: Morgan Kaufmann, pp. 585–631, 2012.
[2] Y. Cheng, K. Chen, H. Sun, Y. Zhang and F. Tao, "Data and knowledge mining with big data towards smart production," vol. 9, no. 9, pp. 1–13, 2018.
[3] M. Azarafza, M. Azarafza and H. Akgün, "Clustering method for spread pattern analysis of corona-virus (COVID-19) infection in Iran," vol. 3, no. 1, pp. 1–6, 2021.
[4] D. E. Lumsden, H. Gimeno and J.-P. Lin, "Classification of dystonia in childhood," vol. 33, pp. 138–141, 2016.
[5] M. Zheng, "Classification and pathology of lung cancer," vol. 25, no. 3, pp. 447–468, 2016.
[6] A. Ojugo and O. D. Otakore, "Forging an optimized Bayesian network model with selected parameters for detection of the coronavirus in Delta State of Nigeria," vol. 3, no. 1, pp. 37–45, 2021.
[7] X. Li and Y. Tang, "Two-dimensional nearest neighbor classification for agricultural remote sensing," vol. 142, no. 10–12, pp. 182–189, 2014.
[8] Y. Tang and X. Li, "Set-based similarity learning in subspace for agricultural remote sensing classification," vol. 173, no. 10–12, pp. 332–338, 2016.
[9] B. Handaga, T. Herawan and M. M. Deris, "FSSC: An algorithm for classifying numerical data using fuzzy soft set theory," vol. 2, no. 4, pp. 29–46, 2012.
[10] L. A. Zadeh, "Fuzzy sets," Information and Control, vol. 8, no. 3, pp. 338–353, 1965.
[11] D. Molodtsov, "Soft set theory—first results," Computers & Mathematics with Applications, vol. 37, no. 4–5, pp. 19–31, 1999.
[12] A. R. Roy and P. K. Maji, "A fuzzy soft set theoretic approach to decision making problems," Journal of Computational and Applied Mathematics, vol. 203, no. 2, pp. 412–418, 2007.
[13] P. K. Maji, A. R. Roy and R. Biswas, "An application of soft sets in a decision making problem," Computers & Mathematics with Applications, vol. 44, no. 8–9, pp. 1077–1083, 2002.
[14] P. Majumdar and S. K. Samanta, "Generalised fuzzy soft sets," Computers & Mathematics with Applications, vol. 59, no. 4, pp. 1425–1432, 2010.
[15] P. K. Maji, R. Biswas and A. R. Roy, "Fuzzy soft sets," Journal of Fuzzy Mathematics, vol. 9, no. 3, pp. 589–602, 2001.
[16] B. Ahmad and A. Kharal, "On fuzzy soft sets," Advances in Fuzzy Systems, vol. 2009, 586507, 2009.
[17] H. Aktaş and N. Çağman, "Soft sets and soft groups," Information Sciences, vol. 177, no. 13, pp. 2726–2735, 2007.
[18] A. Rehman, S. Abdullah, M. Aslam and M. S. Kamran, "A study on fuzzy soft set and its operations," vol. 6, no. 2, pp. 339–362, 2013.
[19] Y. Celik, C. Ekiz and S. Yamak, "Applications of fuzzy soft sets in ring theory," vol. 5, no. 3, pp. 451–462, 2013.
[20] P. Majumdar and S. K. Samanta, "On similarity measures of fuzzy soft sets," International Journal of Advance Soft Computing and Applications, vol. 3, no. 2, pp. 1–8, 2011.
[21] D. K. Sut, "An application of similarity of fuzzy soft sets in decision making," vol. 3, no. 2, pp. 742–745, 2012.
[22] P. Rajarajeswari and P. Dhanalakshmi, "An application of similarity measure of fuzzy soft set based on distance," vol. 4, no. 4, pp. 27–30, 2012.
[23] H. Li and Y. Shen, "Similarity measures of fuzzy soft sets based on different distances," in Proc., vol. 1, Hangzhou, China, pp. 527–529, 2012.
[24] N. H. Sulaiman and D. Mohamad, "A set theoretic similarity measure for fuzzy soft sets and its application in group decision making," in Proc. 20th National Symposium on Mathematical Sciences (AIP Conference Proceedings), Putrajaya, Malaysia, vol. 1522, pp. 237–244, 2012.
[25] Q. Feng and W. Zheng, "New similarity measures of fuzzy soft sets based on distance measures," vol. 7, no. 4, pp. 669–686, 2014.
[26] L. Baccour, A. M. Alimi and R. I. John, "Some notes on fuzzy similarity measures and application to classification of shapes, recognition of Arabic sentences and mosaic," vol. 41, no. 2, pp. 81–90, 2014.
[27] S. Chowdhury and R. Kar, "Evaluation of approximate fuzzy membership function using linguistic input – an approach based on cubic spline," vol. 1, no. 2, pp. 53–59, 2020.
[28] P. Majumdar and S. K. Samanta, "Similarity measure of soft sets," New Mathematics and Natural Computation, vol. 4, no. 1, pp. 1–12, 2008.