Information Theoretic Weighted Fuzzy Clustering Ensemble

Abstract: To improve the performance and robustness of clustering, clustering ensemble techniques generate and aggregate a number of primary clusterings. Fuzzy clustering ensemble approaches attempt to improve the performance of fuzzy clustering tasks. However, these approaches have paid little attention to cluster (or clustering) reliability, which makes them weak when dealing with low-quality base clusterings. In this paper, we utilize cluster unreliability estimation and a local weighting strategy to propose a new fuzzy clustering ensemble method, which introduces a Reliability Based weighted co-association matrix Fuzzy C-Means (RBFCM) consensus algorithm. The performance and robustness of the proposed method are experimentally evaluated on several benchmark datasets. The experimental results demonstrate the efficiency and suitability of the proposed method.


Introduction
Data mining involves many tasks, one of the most important of which is clustering [25][26][27][28][29]. According to the different similarity criteria implemented by various clustering algorithms in the context of unsupervised learning, different objective functions are targeted [30]. According to the "no free lunch" theorem, there is no universally dominant clustering method [22,23]. Therefore, the idea of combining clusterings, also called cluster (or clustering) ensemble, emerged. In a clustering ensemble, several basic partitions are combined to acquire a better solution capable of managing all objectives of the different partitions, which are sometimes contradictory [28][29][30]. Clustering ensembles offer many advantages, among which the following can be mentioned: robustness to noise [29], capability of producing novel results [31][32][33], quality enhancement [29], knowledge reusability [30], multi-view clustering [33], stability, parallel/distributed data processing [30], finding the number of real clusters, adaptability, and heterogeneous data clustering.
The process of a clustering ensemble consists of two related phases, as shown in Fig. 1a. The first phase produces diverse base clusterings using the basic clustering algorithm(s), and the second phase extracts the final clustering from the primary basic partitions using a consensus function. To improve the performance of the clustering ensemble in the first phase, special attention should be paid to ensemble diversity, which determines the quality of the combination. If the quality of each voter is greater than that of a random voter, the quality of the combination improves as ensemble diversity increases, where the combination can be any committee such as a clustering ensemble [33]. Many approaches are used to produce primary basic partitions with the desired diversity, such as: different parameter initializations [30], heterogeneous ensemble clustering methods [28], different subsets of the features [30], different subsets of objects [28], projection of object subsets [28], and hybrid methods. In our proposed method, a basic clustering algorithm is run with different parameters to achieve the desired diversity. In the second phase, a consensus function is utilized to obtain the final clustering. Selecting high-quality clusters is an NP-hard problem because clustering is unsupervised [33]. Several consensus functions have been proposed to address this issue, each of which uses a specific approach and different information from the primary basic partitions obtained in phase one, and sometimes considers the initial characteristics of the data. Consensus ensemble methods are divided into the following classes: 1. intermediate-space clustering ensemble methods [28], 2. methods based on the co-association matrix [33], 3. hypergraph-based methods [30], 4. expectation-maximization clustering ensemble methods [32], 5. mathematical modeling (median partition) approaches [34], and 6. voting-based methods [35].
There are two types of clustering algorithms: (a) hard clustering algorithms, in which a data object either is definitely assigned to a cluster or is not assigned to it at all, and (b) fuzzy clustering algorithms, in which data points are not allocated to a single specific cluster but rather to all clusters with different membership degrees (for a data point, the sum of its membership degrees over all clusters should be one). The basis of fuzzy clustering algorithms is the basic fuzzy c-means (FCM) clustering algorithm [36]. Although soft (or fuzzy) clustering is more general than crisp clustering, research in soft clustering is at an early stage and fuzzy clustering ensemble approaches have not been widely developed. Fuzzy clustering ensembles have been proposed by a few researchers (e.g., Punera et al. [37] have proposed soft versions of the Cluster-based Similarity Partitioning Algorithm (CSPA), Meta CLustering Algorithm (MCLA), and Hybrid Bipartite Graph Formulation (HBGF), respectively named soft CSPA (sCSPA), soft MCLA (sMCLA), and soft HBGF (sHBGF)), but crisp clustering is more mature. Some existing fuzzy clustering ensemble methods convert fuzzy clusters to hard clusters through simple existing schemes, and then hard consensus functions are used to compute the final partition. This conversion loses uncertainty information, which creates challenges in deriving an efficient fuzzy consensus clustering from multiple fuzzy basic partitions.
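Since FCM underlies the whole ensemble, it may help to see how fuzzy membership degrees arise and why they sum to one per data point. The following is a minimal sketch of the standard FCM membership update (assuming Euclidean distance and fuzzifier m = 2; the data and centers are illustrative, not from the paper):

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """Standard FCM membership update:
    u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
    Each row of the result sums to one, reflecting the fuzzy
    partition constraint mentioned in the text."""
    # Pairwise Euclidean distances, shape (M points, n centers)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)  # avoid division by zero for points on a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
centers = np.array([[0.5, 0.0], [10.0, 10.0]])
U = fcm_memberships(X, centers)
# Each row of U sums to 1; the point near (10, 10) belongs
# almost entirely to the second cluster.
```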
The consensus process is highly dependent on the quality of the primary partitions, so low-quality or even noisy primary partitions could adversely affect it. Improving the performance of consensus functions through quality evaluation and weighting of the primary partitions has been pursued to confront low-quality primary partitions, based on the implicit assumption that all clusters in the same base clustering have the same reliability [38]. This is usually done by assigning a weight to each primary partition, which is treated as a whole, without considering the quality of its individual clusters. We have to confront different reliability values of clusters within the same clustering because real-world datasets are inherently complex and noisy. Some methods need access to the data features, such as the one proposed by Zhong et al. [33]. Their method investigates the reliability values of the clusters using the Euclidean distances between the objects of the clusters, and its efficiency is highly dependent on the distribution of the data in the dataset. However, in a clustering ensemble in its general formulation, we assume that we do not have access to the original data features.
In this work, based on the estimation of ensemble-driven cluster unreliability and a local weighting strategy, a new fuzzy clustering ensemble approach is proposed, as shown in Fig. 1b. To increase consensus performance, a locally weighted plan is achieved by integrating the validity and unreliability of each cluster. An entropic criterion is used to estimate the unreliability of each fuzzy cluster based on the relation of the cluster's acceptability to the entire ensemble. A new metric is defined to estimate fuzzy cluster unreliability, and the reliability values of the clusters are then determined using an index named RDCI. Each cluster is assessed and weighted by its RDCI, which evaluates the cluster using an effective indication provided by the crowd of diverse clusters in the ensemble. Then, a fuzzy weighted co-association matrix, with weights calculated from the reliability values, incorporates local acceptability into the conventional co-association (Co) matrix and is treated as a summary of the ensemble of diverse clusters. After that, new consensus algorithms are proposed to achieve the final clustering: (a) RBFCM and (b) RBGP, which consider cluster reliabilities, and (c) RBHC, which considers pairwise cluster acceptability.
The main contributions of the paper are summarized as follows. (I) We estimate an unreliability value for each fuzzy cluster in relation to the other clusterings using an entropic criterion. The entropic criterion considers only the distribution of membership degrees of all objects in each primary fuzzy cluster; access to the original data features is not needed, and no assumptions on the data distribution are made. (II) The primary fuzzy clusters in the ensemble are assessed and weighted by the proposed RDCI, which provides a cluster-level reliability indication contributing to the local weighting plan. (III) A new approach for computing the fuzzy co-association matrix in a fuzzy cluster ensemble is proposed. (IV) New consensus functions are proposed to construct the final clustering based on the estimation of reliability-driven fuzzy cluster unreliability and the local weighting strategy. (V) Finally, the experimental results demonstrate the superior performance and robustness of the proposed fuzzy clustering ensemble approach compared to state-of-the-art approaches.
The rest of the paper is organized as follows. In Section 2, the related literature is reviewed. Section 3 provides background knowledge about entropy and clustering ensembles. The proposed fuzzy clustering ensemble approach, based on cluster unreliability estimation and a local weighting strategy, is described in Section 4. The experimental results are provided in Section 5, and finally the paper is concluded in Section 6.

Related Work
Some of the most important ensemble clustering methods include k-means-based consensus clustering [28], spectral ensemble clustering [39], and infinite ensemble clustering [40]. The following works are also considered important research in fuzzy clustering ensembles: sCSPA and sMCLA, introduced in [37], are fuzzy extensions of the CSPA and MCLA algorithms proposed in [30]; sHBGF is likewise a fuzzy version of HBGF [31].
To extract the final fuzzy clustering from a fuzzy clustering ensemble, an explicit objective function based on a new contingency matrix is proposed in [41]. To accurately and efficiently extract the final clustering in this approach, which is parallelizable and capable of handling big data clustering, a flexible utility function is employed to transform fuzzy consensus clustering into a weighted piecewise FCM-like iterative process.
The vote-based merging algorithm (VMA), a fuzzy clustering ensemble method proposed in [42], calculates the final clustering by averaging the membership matrices of the clusterings. Since all the base clusters in this method have to be relabeled, VMA is among the more time-consuming algorithms. An information-theoretic k-means (ITK) was introduced by Dhillon et al. [43].
In [34], the cluster labels are represented as a 0-1 bit string, and the goal is to obtain the final fuzzy clustering as a membership matrix whose entries give the membership degree of each data object to each cluster. Accordingly, the authors introduce the Fuzzy String Objective Function (FSOF), which tries to minimize the summation of distances between the centers of the initial clusters and the final clusters. Since the final clustering is fuzzy, some constraints were added to the objective function, creating a non-linear NP-hard optimization problem, which they solved with a genetic algorithm. Note that this approach is applicable only when the base clusterings are crisp. Two consensus functions, Fuzzy String Cluster Ensemble Optimized by Genetic Algorithm numbers 1 and 2 (FSCEOGA1 and FSCEOGA2), were also proposed based on crossover and mutation operators.
To enhance the stability of fuzzy cluster analysis, a heterogeneous clustering ensemble has been proposed [44] in which basic fuzzy clustering algorithms are first applied and the final clustering is then obtained using an FCM algorithm, so that all clusters in the co-association matrix have equal participation weights. To achieve consensus clustering from fuzzy clustering ensembles, a voting mechanism is used in [35]. The work includes disambiguation and voting procedures. Disambiguation is a phase in which the Hungarian algorithm [45], with time complexity O(n³) (where n is the (average) number of clusters in each clustering), is used for the relabeling problem.
The voting phase is implemented to achieve the final consensus clustering. There are many voting schemes, such as confidence-based voting methods (including the sum voting rule and product voting rule) and position-based voting methods (including the Borda voting rule and Copeland voting rule). The time complexity of these voting rules is O(nMβ), where n, M, and β are the (average) number of clusters in each clustering, the number of data objects, and the number of base clusterings, respectively. Many different consensus functions exist depending on the direct or repeated combination of the relabeling and voting phases.
Based on particle swarm optimization (PSO), which is capable of finding fuzzy and crisp clusters [46,47], a method for constructing a fuzzy clustering ensemble is proposed in [46]. It creates initial clusterings through parameter changes; then, using a pruning process, n clusters are chosen from the β initial clusterings such that n < c, where c is the total number of clusters in all of the β initial clusterings. For the pruning process, one of the internal cluster validity indices, such as Ball-Hall, Caliński et al. [48], the Dunn index [49], the Silhouette index [50], or Xie-Beni [51], is selected to evaluate the fitness of the primary basic clusters; then, one of the genetic selection mechanisms, such as tournament or roulette wheel, is used to choose the elite clusters. The final clustering is subsequently achieved by a consensus function based on the PSO algorithm. Unlike other PSO-based methods in which each particle represents a clustering, in this method each particle represents a cluster.

Preliminaries
Through the following definitions, we introduce the general formulation of data, fuzzy clustering ensembles, and entropy used in this paper. The notation used in this work is presented here. An object (or data point) is a tuple denoted by a vector x_i, where x_i^j represents the j-th attribute of the i-th data point. Also, x^j is defined as the j-th attribute of the dataset, N denotes the number of dataset dimensions (i.e., the number of attributes), and M denotes the dataset size (i.e., the number of data points). π(x) is an M × n matrix representing a fuzzy partition defined on dataset x, where n is an integer indicating the number of clusters, and we have Eq. (1):

π(x_i)_1 + π(x_i)_2 + ... + π(x_i)_n = 1, for i = 1, ..., M,    (1)

where π(x_i)_j is a real number indicating how much the i-th data object belongs to the j-th cluster of partition π = {C_1, C_2, ..., C_n}. Π denotes a clustering ensemble that includes β primary partitions, i.e., Π = {π^1, ..., π^β}, in which π^m = {C^m_1, ..., C^m_{n_m}}, where π^m represents the m-th primary partition in Π, and C^m_i and n_m are the i-th cluster and the number of clusters in π^m, respectively. The set of all clusters in the ensemble is denoted by C, where C^j_i is the i-th cluster of partition π^j. Therefore, the total number of clusters in the base clusterings is denoted by c and defined as c = n_1 + ... + n_β. Let z be a discrete random variable. Entropy is a measure of the unreliability associated with a random variable. For a discrete random variable z whose domain, i.e., the set of values of z, is denoted by Z, entropy is defined according to Eq. (2):

H(z) = −Σ_{z∈Z} ρ(z) log₂ ρ(z),    (2)
where ρ(z) is the probability mass function of z and 0 log₂ 0 is always taken to be 0 here. H(z, w) is the joint entropy, a measure of the unreliability associated with a pair of discrete random variables z and w. It is defined according to Eq. (3):

H(z, w) = −Σ_{z∈Z} Σ_{w∈W} ρ(z, w) log₂ ρ(z, w),    (3)
where ρ(z, w) is the joint probability of the two discrete random variables z and w. When z and w are independent random variables, the joint entropy is H(z, w) = H(z) + H(w). Thus, for q independent random variables z^1, z^2, ..., z^q, the joint entropy is H(z^1, ..., z^q) = Σ_{q'=1}^{q} H(z^{q'}).
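The entropy definitions of Eqs. (2) and (3), including the 0 log₂ 0 = 0 convention and the additivity for independent variables, can be sketched as follows (the probability values are illustrative):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(z) = -sum p log2 p, with 0*log2(0) := 0 (Eq. 2)."""
    p = np.asarray(p, dtype=float)
    nz = p > 0               # drop zero-probability entries (0*log2(0) = 0)
    return -np.sum(p[nz] * np.log2(p[nz]))

def joint_entropy(pzw):
    """Joint entropy H(z, w) over a joint probability table (Eq. 3)."""
    return entropy(np.ravel(pzw))

# For independent z and w, H(z, w) = H(z) + H(w).
pz = np.array([0.5, 0.5])
pw = np.array([0.25, 0.75])
pzw = np.outer(pz, pw)       # joint table under independence
```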

Proposed Approach
The block diagram of the proposed method is presented in Fig. 1b, in which a new fuzzy clustering ensemble approach is proposed based on the estimation of ensemble-driven cluster unreliability and a local weighting strategy. The proposed algorithm includes the following stages. First, we compute the acceptability of each cluster. Then, the unreliability of each cluster is estimated, after which the weight of each cluster is determined. Next, the weighted co-association matrix is produced, and finally the consensus clustering is achieved.

Cluster Acceptability Computation
According to Fig. 1b and Eq. (4), the acceptability of each cluster over the other clusters is computed, which is equivalent to computing the agreement probability between two clusters in different clusterings. The acceptability of cluster C^s_i (cluster C_i ∈ π^s) over cluster C^r_j (cluster C_j ∈ π^r), when s ≠ r, is computed using Eq. (4).
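Eq. (4) itself is not reproduced in this excerpt, so the following is only a hypothetical sketch of what such an agreement measure between two fuzzy clusters could look like: the product-overlap of the two clusters' membership vectors, normalized by the membership mass of the first cluster. The function name and the exact normalization are assumptions, not the paper's formula.

```python
import numpy as np

def acceptability(u_i, u_j):
    """Hypothetical sketch of an Eq. (4)-style agreement of C_i^s with C_j^r.

    u_i, u_j are the membership vectors of the two clusters over the M
    data objects. Agreement is modeled here as membership-product overlap
    normalized by the total membership mass of C_i^s -- an assumption,
    since the paper's exact Eq. (4) is not available in this excerpt."""
    u_i = np.asarray(u_i, dtype=float)
    u_j = np.asarray(u_j, dtype=float)
    return float(np.dot(u_i, u_j) / u_i.sum())

u_i = np.array([0.9, 0.8, 0.1, 0.0])
u_j = np.array([1.0, 0.7, 0.2, 0.1])
p = acceptability(u_i, u_j)   # high when the two clusters agree
```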

Cluster Unreliability Estimation
As depicted in Fig. 1b, the next stage of the algorithm is the unreliability estimation of the clusters. Using the concept of entropy as a measure for discrete random variables, unreliability is computed from the cluster labels and used in the reliability computation of the clusters. This is a reasonable strategy because no information about the original data features and their distribution is available. If cluster C_i does not exactly belong to the base clustering π^r ∈ Π, then the clustering π^r will partition the cluster; this means that the membership degrees of the data objects of C_i differ from their membership degrees in the clusters of π^r, and the clustering π^r may not accept the membership degrees of the data objects of C_i. Therefore, depending on how the data objects of C_i are clustered in π^r, the unreliability (or entropy) of C_i with regard to π^r is estimated via the entropy concept presented in Eq. (2). The unreliability of cluster C^s_i (i.e., the i-th cluster of the s-th clustering) with respect to clustering π^r of the ensemble, where s ≠ r, is obtained using Eq. (5).
where ρ̂(C^s_i, C^r_j) is the probability vector obtained by normalizing ρ, and the term log₂ n_r is added to guarantee that υ(C^s_i, π^r) lies in the range [0, 1], where n_r is the number of clusters in π^r, C^r_j is the j-th cluster of π^r, and ρ(C^s_i, C^r_j) is obtained according to Eq. (4). Since we have assumed that the partitions of the ensemble are independent [52], the unreliability of a cluster C^s_i with respect to the β base clusterings of the ensemble Π is computed by Eq. (6),
where the division by β − 1 averages over the β − 1 other base clusterings and keeps υ(C^s_i, Π) within the range [0, 1].
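The two normalizations described above, dividing the entropy by log₂ n_r in Eq. (5) and averaging over the β − 1 other clusterings in Eq. (6), can be sketched as follows. The acceptability vector passed in is assumed to be already normalized to a probability vector:

```python
import numpy as np

def cluster_unreliability(p_hat, n_r):
    """Unreliability of a cluster w.r.t. one base clustering (Eq. 5 sketch).

    p_hat is the normalized acceptability vector of the cluster over the
    n_r clusters of pi^r; dividing the entropy by log2(n_r) keeps the
    value in [0, 1], as stated for Eq. (5)."""
    p = np.asarray(p_hat, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])) / np.log2(n_r))

def ensemble_unreliability(per_clustering_values, beta):
    """Average unreliability over the beta - 1 other clusterings (Eq. 6)."""
    return float(sum(per_clustering_values) / (beta - 1))

# A cluster split evenly across the 4 clusters of pi^r is maximally
# unreliable; a cluster matching a single cluster of pi^r is fully reliable.
u_even = cluster_unreliability([0.25, 0.25, 0.25, 0.25], 4)
u_pure = cluster_unreliability([1.0, 0.0, 0.0, 0.0], 4)
```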

Cluster Reliability Computation
The weight of each cluster represents its reliability value, which is computed from the derived unreliability (entropy) of that cluster in the clustering ensemble via the RDCI. For cluster C^s_i, the RDCI, i.e., the weight of each cluster in a clustering ensemble with β base clusterings, is defined in Eq. (7),
where the impact of the unreliability on the clustering ensemble weight is adjusted by the non-negative parameter ∅. The best results are obtained when ∅ is 0.4. When the unreliability value of a cluster C^s_i is minimized, i.e., when υ(C^s_i, Π) is zero, its RDCI is maximized, i.e., RDCI_∅(C^s_i) is one.
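Eq. (7) is not reproduced in this excerpt, so the following is only a hypothetical sketch consistent with the stated properties: an exponential decay gives RDCI = 1 at zero unreliability and lets the non-negative parameter (here named phi, standing in for the paper's symbol ∅) control how quickly the weight falls as unreliability grows. The exact functional form is an assumption.

```python
import math

def rdci(unreliability, phi=0.4):
    """Hypothetical sketch of the RDCI weight (Eq. 7).

    exp(-u / phi) matches the stated properties: the weight is 1 when
    the unreliability u is 0, and phi (an assumed stand-in for the
    paper's parameter, with the reported best value 0.4) adjusts the
    impact of unreliability on the weight."""
    return math.exp(-unreliability / phi)
```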

Cluster-Wise Weighted Co-Association Matrix
In this stage, a fuzzy co-association matrix is derived with regard to the reliability values of the clusters in the ensemble. Methods based on the co-association matrix are among the most common methods for combining base clusterings. The Evidence Accumulation Clustering (EAC) method, proposed in [32], projects the individual data object clusterings of a clustering ensemble into a new pairwise similarity metric. However, this method cannot suitably derive the co-association matrix from fuzzy clusters. Thus, Evidence Accumulation Fuzzy Clustering (EAFC) is introduced as a new method to derive the co-association matrix. Eq. (8) is used to derive the fuzzy co-association clustering ensemble matrix, where x_i and x_j are data objects, and inf(x, y) and sup(x, y) are taken to be xy and x + y, respectively. In Eq. (9), the weighted fuzzy co-association matrix is obtained by using the RDCI as a weight in the calculation of the co-association matrix, so as to consider the reliability of each cluster. The weighted fuzzy co-association clustering ensemble matrix (WFCA) is computed using Eq. (9).
The matrix ẆFCA_{i,j} is then normalized to obtain WFCA_{i,j}. Toy example. Tab. 1a represents two fuzzy clusterings π^1 and π^2 (i.e., β = 2) on an assumptive dataset x with 6 data objects (i.e., M = 6). Tab. 1b presents the ρ values of the fuzzy clusters in Tab. 1a. Tab. 1c shows the ρ̂ values of the fuzzy clusters in Tab. 1a and their unreliability values; Tab. 1c also contains the corresponding RDCI values of the fuzzy clusters. Using Eq. (8), the co-association matrix FCA of π^1 and π^2 from Tab. 1a is derived and shown in Tab. 1d. Tab. 1e exhibits the weighted co-association matrix of the fuzzy clustering ensemble presented in Tab. 1a with regard to the calculated RDCIs.
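Since Eqs. (8) and (9) are not reproduced here, the following sketch only illustrates the general idea of a reliability-weighted fuzzy co-association matrix: pairwise evidence for two objects is accumulated as the product of their memberships in each cluster (matching the inf(x, y) = xy choice mentioned in the text), scaled by that cluster's RDCI weight, averaged over the clusterings, and normalized. The exact aggregation of the paper's Eqs. (8)-(9) is an assumption.

```python
import numpy as np

def weighted_fuzzy_coassociation(U_list, weights_list):
    """Sketch of a weighted fuzzy co-association matrix (Eq. 9 idea).

    U_list[m] is the M x n_m membership matrix of base clustering pi^m,
    and weights_list[m] holds the RDCI weight of each of its clusters.
    The result is a symmetric M x M similarity matrix in [0, 1]."""
    M = U_list[0].shape[0]
    W = np.zeros((M, M))
    for U, w in zip(U_list, weights_list):
        # sum over clusters of w_k * u_ik * u_jk for all object pairs (i, j)
        W += (U * np.asarray(w)) @ U.T
    W /= len(U_list)                 # average over the beta clusterings
    return W / max(W.max(), 1e-12)   # normalize into [0, 1]

U1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
W = weighted_fuzzy_coassociation([U1], [[1.0, 0.5]])
```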

Consensus Functions
The final clustering computation, which is the last stage in Fig. 1b, is performed through three proposed consensus algorithms in this section, based on the following approaches: (a) using the co-association matrix followed by hierarchical clustering, (b) computing the reliability of each cluster of the ensemble followed by graph clustering, and (c) computing the acceptability of each cluster over the other clusters of the ensemble and subsequently applying a metaheuristic algorithm.

WFCA-Based Consensus Function
When a cluster's unreliability with respect to the base clusterings is large, we understand that the cluster is divided into fragments of data objects in the partitions of the ensemble. Hence, a natural conclusion is to keep the clusters with large reliability values with respect to the clustering ensemble in the final ensemble of elite clusters. By considering the reliability values of the clusters in the co-association matrix computation, and viewing the result as a new similarity matrix between the data object pairs of the dataset, the consensus partition can be achieved by applying a simple FCM clustering algorithm or a hierarchical clustering algorithm to the new similarity matrix.
We need to calculate the weight of each cluster using Eq. (7) to achieve the final clustering from the base clusterings based on each cluster's local reliability value in the ensemble. Therefore, to compute the entropy of each cluster in Π, the following steps must be followed: the acceptability of each cluster (denoted by p) in the ensemble is obtained by Eq. (4) and then normalized. Then, the unreliability of each cluster with respect to clustering π^m is computed by Eq. (5). Subsequently, the entropy of each cluster in the ensemble is calculated according to Eq. (6). After computation of the RDCI, the weighted fuzzy co-association matrix of the clustering ensemble is acquired using Eq. (9) and then normalized. As the consensus function, a hierarchical clustering algorithm is used; it is a widespread clustering technique whose typical input is a distance matrix d, obtained here as d = 1 − WFCA. Alternatively, a simple FCM clustering algorithm can be employed instead of the hierarchical clustering algorithm by treating the distance matrix d as an intermediate feature space. The RBFCM algorithm, with two inputs (the primary clustering ensemble Π and the number of clusters n* in the consensus partition), is presented in detail in Algorithm 1.

Algorithm 1: RBFCM (Reliability Based Weighted Co-association Matrix Algorithm)
Input: Π, n*; Output: π*
// Π stands for a pool of fuzzy primary partitions
// n* stands for the number of final clusters
// π* stands for the final clustering
1: According to Eq. (4), the acceptability of each cluster over the other clusters in clustering π^r (π^r ∈ Π) is calculated.
2: According to Eq. (5), the unreliability of each cluster with respect to clustering π^m in the ensemble is computed.
3: Using Eq. (6), the unreliability of the clusters in the ensemble is derived.
4: Using Eq. (7), the RDCI values of the clusters in the ensemble are calculated.
5: According to Eq. (9), the WFCA matrix is created.
6: Using d = 1 − WFCA, the d matrix is computed.
7: The final clustering with n* clusters is achieved using FCM(d, n*) // π* = FCM(d, n*)
Output: Consensus clustering π*.
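The final consensus step (d = 1 − WFCA followed by a clusterer) can be sketched as follows. The text allows either FCM on d treated as an intermediate feature space or a hierarchical clusterer; this sketch uses the hierarchical variant, and the choice of average linkage is an assumption:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def consensus_from_wfca(WFCA, n_final):
    """Steps 6-7 of Algorithm 1, hierarchical variant (average linkage
    is an assumption; the text also allows FCM on d instead)."""
    d = 1.0 - WFCA                       # similarity -> distance
    np.fill_diagonal(d, 0.0)             # each object is at distance 0 from itself
    Z = linkage(squareform(d, checks=False), method='average')
    return fcluster(Z, t=n_final, criterion='maxclust')

# Two tight similarity blocks should split into two consensus clusters.
WFCA = np.array([[1.0, 0.9, 0.1, 0.1],
                 [0.9, 1.0, 0.1, 0.1],
                 [0.1, 0.1, 1.0, 0.9],
                 [0.1, 0.1, 0.9, 1.0]])
labels = consensus_from_wfca(WFCA, 2)
```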

Reliability Based Graph Clustering Algorithm
Based on the definition of the bipartite graph and its clustering, in which all clusters and data points are considered to be its nodes, the RBGP consensus function is proposed. For an edge to exist between two vertices, they must be of different types; thus, if one node is a data object and the other is a cluster, an edge can be created. The weight of an edge between a data object x_i ∈ X and a cluster v_j ∈ C is given by Eq. (10),
where RDCI_∅(v_j) is given by Eq. (7) and v_j(x_i) is a real number in [0, 1] indicating how much data object x_i belongs to cluster v_j. By applying the RDCIs, in addition to capturing the belongs-to relationship between objects and clusters through the bipartite graph, the reliability values of the clusters are also reflected. To obtain the consensus clustering, after the weighted bipartite graph is constructed, a basic graph partitioning algorithm such as METIS [53] is utilized to divide the weighted bipartite graph into n* clusters, where n* is the number of desired clusters in the consensus partition; the resulting graph partition is taken as the consensus partition. Algorithm 2 shows the RBGP algorithm in detail.

Algorithm 2:: RBGP (Reliability Based Graph Partitioning Algorithm)
Input: Π, n*; Output: π*
// Π stands for a pool of fuzzy primary partitions
// n* stands for the number of final clusters
// π* stands for the final clustering
1: Using Eq. (4), the acceptability of each cluster over the other clusters in clustering π^m of the ensemble Π is calculated.
2: Eq. (5) is applied to compute the unreliability of each cluster with respect to partition π^m in the ensemble Π.
3: The unreliability values of the clusters in Π are derived by Eq. (6).
4: The RDCI values of the clusters in the ensemble are obtained according to Eq. (7).
5: The weighted bipartite graph is constructed using Eq. (10).
6: The METIS algorithm is applied to the weighted bipartite graph to obtain n* clusters as the consensus partition.
Output: The consensus partition is taken as the consensus clustering π*.
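Step 5 of Algorithm 2 can be sketched as building an M × c edge-weight matrix between objects and base clusters. Eq. (10) is read here as the product of the cluster's RDCI and the object's membership, which matches the description in the text, though the product form is an assumption since the equation itself is not reproduced:

```python
import numpy as np

def bipartite_edge_weights(U_list, rdci_list):
    """Build the weighted bipartite graph of Algorithm 2 as an M x c
    matrix: rows are data objects, columns are the c base clusters.
    Eq. (10) is assumed to be w(x_i, v_j) = RDCI(v_j) * v_j(x_i),
    i.e., membership scaled by cluster reliability."""
    cols = []
    for U, rdci in zip(U_list, rdci_list):
        cols.append(U * np.asarray(rdci))   # scale each cluster column by its RDCI
    return np.hstack(cols)

U1 = np.array([[0.9, 0.1], [0.2, 0.8]])
B = bipartite_edge_weights([U1], [[1.0, 0.5]])
# B[i, j] is the edge weight between object i and cluster j; METIS (or
# any graph partitioner) would then cut this bipartite graph into n* parts.
```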

Reliability Based Hyper Clustering Algorithm
In the RBHC algorithm (presented in Algorithm 3), with the idea of merging similar clusters in an approach that is faster than the other two methods (as shown experimentally in Section 5.4), the acceptability of each cluster over the other clusters is taken as the pairwise cluster similarity. That is, the pairwise similarity of two clusters is the acceptability of one fuzzy cluster over the other.

Algorithm 3:: RBHC (Reliability Based Hyper Clustering Algorithm)
Input: Π, n*, SType; Output: π*
// Π stands for a pool of fuzzy primary partitions
// n* stands for the number of final clusters
// π* stands for the final clustering
// SType determines the membership scheme and can be max, min, or sum
1: Using Eq. (4), the acceptability (p) of each cluster over the other clusters in Π is derived, yielding the pairwise cluster acceptability matrix.
2: The k-means clustering algorithm is applied to extract a partition of the primary clusters into n* hyper-clusters.
3: The degree to which each data point x_l belongs to each hyper-cluster hc_i is calculated according to the membership scheme SType.

In step 1, the acceptability of each cluster over the other clusters in the ensemble is obtained using Eq. (4), producing a pairwise cluster acceptability matrix that can be clustered into a number of hyper-clusters by a basic clustering algorithm such as k-means. After that, depending on the membership scheme (membership value function), the degree to which each data point belongs to each hyper-cluster is obtained from the membership values of the data object in the primary clusters of that hyper-cluster. There are three cases for the membership scheme. First, when the membership scheme is max, the membership degree of an arbitrary data object x_l in each hyper-cluster is the maximum of the data object's membership degrees in all clusters of the hyper-cluster. Second, when the membership scheme is min, the minimum membership degree of an arbitrary data point x_l over the base clusters of a hyper-cluster is taken as the membership degree of the data object in that hyper-cluster. Third, when the membership scheme is sum, the membership degree of an arbitrary data object x_l in each hyper-cluster is the average of the data object's membership degrees in all clusters of the hyper-cluster. Finally, the membership degree of a data object in each hyper-cluster is divided by the sum of its membership degrees over all hyper-clusters so that this sum equals 1.
The following notation is used in this paper: hc_i is a hyper-cluster, |hc_i| represents the number of base clusters in hc_i, the membership degree of x_l in base cluster C_j is denoted by C_j(x_l), and the membership degree of x_l in hyper-cluster hc_i is denoted by µ_{hc_i}(x_l), which is obtained according to Eq. (11),
where S(µ_{hc_i}(x_l)) is the membership degree of the data object x_l in hyper-cluster hc_i as calculated by the scheme function. Depending on the application, different basic clustering algorithms can be used to perform the clustering of the primary clusters. For example, the k-means clustering algorithm can be replaced by a kernelized FCM clustering algorithm when clusters of different shapes need to be extracted. This algorithm can be used to generate the final fuzzy partition and benefits from a smaller computation time compared with the other consensus algorithms, as analyzed in Section 5.4.
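The three membership schemes and the final row-normalization described above can be sketched as follows (Eq. (11) itself is not reproduced, so this follows the textual description: max, min, or averaged sum over the base clusters of each hyper-cluster, then normalization so each object's hyper-cluster memberships sum to one):

```python
import numpy as np

def hypercluster_memberships(U, assignment, scheme='max'):
    """Membership of each object in each hyper-cluster (Eq. 11 sketch).

    U is the M x c matrix of object memberships in the c base clusters;
    assignment[j] gives the hyper-cluster index of base cluster j.
    The max / min / sum schemes follow the text (sum is averaged);
    the final normalization makes each row sum to one."""
    assignment = np.asarray(assignment)
    n_hc = assignment.max() + 1
    mu = np.zeros((U.shape[0], n_hc))
    for h in range(n_hc):
        block = U[:, assignment == h]       # base clusters of hyper-cluster h
        if scheme == 'max':
            mu[:, h] = block.max(axis=1)
        elif scheme == 'min':
            mu[:, h] = block.min(axis=1)
        else:                               # 'sum' scheme: average
            mu[:, h] = block.mean(axis=1)
    return mu / mu.sum(axis=1, keepdims=True)

U = np.array([[0.9, 0.8, 0.1, 0.2],
              [0.1, 0.2, 0.9, 0.8]])
mu = hypercluster_memberships(U, [0, 0, 1, 1], scheme='max')
```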

Benchmark
In this paper, several datasets are selected to evaluate the robustness and performance of the proposed fuzzy clustering ensemble approach. For robustness and performance evaluation, we have selected several datasets from the UCI machine learning repository [54], the "Galaxy" dataset described in [55], and a handmade experimental dataset derived from the well-known HalfRing dataset, as described in Tab. 2.

Performance Evaluation Criteria
Clustering performance is evaluated using the accuracy (AC) and normalized mutual information (NMI) criteria, which operate on crisp clusterings, and the XB and FS criteria, which operate on fuzzy clusterings. A general description of these criteria follows.
The accuracy measure [28], used for clustering performance evaluation, is one of the most widespread evaluation criteria; it provides a sound indication of the agreement between the final clustering and the ground-truth labels (the prior labeling information) of the examined dataset. Larger AC values indicate better clustering results. NMI is another criterion for evaluating clustering performance; it is the normalized mutual information between a pair of partitions [30]. As with the AC metric, a larger NMI value indicates a better clustering result. A fuzzy partition must be converted to a crisp partition before being given to the AC and NMI metrics, because these metrics are designed to operate on crisp partitions. Another criterion for evaluating a fuzzy partition is the XB criterion, proposed in [47] and modified in [51]. A further criterion for evaluating fuzzy clusterings is FS; a good clustering result obtains a small FS value [51].
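The AC criterion requires matching predicted cluster labels to ground-truth labels, since cluster numbering is arbitrary. A common way to compute it, sketched below under the assumption that this matches the paper's AC, is to find the best one-to-one label matching with the Hungarian algorithm and then count the fraction of correctly matched objects (NMI is available in standard libraries, e.g., scikit-learn's normalized_mutual_info_score):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(labels_true, labels_pred):
    """AC sketch: find the best one-to-one mapping of predicted clusters
    to ground-truth labels (Hungarian algorithm on the negated
    contingency counts), then return the fraction of matched objects."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    t, p = np.unique(labels_true), np.unique(labels_pred)
    cost = np.zeros((p.size, t.size))
    for i, pi in enumerate(p):
        for j, tj in enumerate(t):
            cost[i, j] = -np.sum((labels_pred == pi) & (labels_true == tj))
    rows, cols = linear_sum_assignment(cost)
    return -cost[rows, cols].sum() / labels_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [2, 2, 0, 0, 1, 1]    # same grouping, permuted labels
ac = clustering_accuracy(y_true, y_pred)   # permutation-invariant: 1.0
```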

Base Clustering Generation
The base clusterings are constructed with the FCM clustering algorithm, and consensus performance is evaluated over various ensembles. The number of clusters for each FCM run is chosen randomly in the interval [2, √M], where M is the number of data objects in the dataset under test; this yields a diverse set of base clusterings. For each dataset, the base partitions are constructed once, and all methods are applied to the same set of base partitions. For the considered methods, performance evaluation, robustness assessment and execution time are measured with ensemble sizes β = 50, 20 and 10, respectively. The proposed approach and the state-of-the-art fuzzy clustering ensembles are assessed using the performance criteria and the average robustness criterion (in terms of AC) over 100 runs in the simulation environment (MATLAB) to provide a confident and fair comparison.
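The base-clustering generation procedure can be sketched as follows. This is an illustrative minimal FCM, not the paper's exact setup (the experiments were run in MATLAB); the function names fcm and generate_base_clusterings are our own.

```python
import numpy as np

def fcm(X, k, m=2.0, max_iter=100, tol=1e-5, rng=None):
    """Minimal Fuzzy C-Means: returns the membership matrix U (M x k)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    U = rng.dirichlet(np.ones(k), size=n)            # random fuzzy start
    for _ in range(max_iter):
        W = U ** m                                    # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))        # standard FCM update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return U_new
        U = U_new
    return U

def generate_base_clusterings(X, beta, rng=None):
    """Run FCM beta times with k drawn uniformly from [2, sqrt(M)]."""
    rng = np.random.default_rng(rng)
    M = X.shape[0]
    k_max = max(2, int(np.sqrt(M)))
    return [fcm(X, int(rng.integers(2, k_max + 1)), rng=int(rng.integers(10**9)))
            for _ in range(beta)]
```

Randomizing k per run is what produces the diversity the ensemble relies on; each returned membership matrix is one base partition.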

Experimental Results
Achieving a more robust and consistent consensus clustering is the main purpose of a clustering ensemble combining several primary partitions. We have compared the three proposed consensus algorithms RBFCM, RBGP and RBHC (the latter in three versions: (a) RBHC-max, (b) RBHC-sum and (c) RBHC-min) with the state-of-the-art methods, i.e., the ITK, sCSPA, sHGPA, sMCLA, sHBGF and FSCEOGA1 clustering ensemble methods. The performance of the resultant consensus partitions of the proposed and baseline methods is determined by the AC, NMI, XB and FS criteria. Note that for each dataset, the number of clusters is set to the number of classes in the predefined ground-truth labels.
Here, each of the proposed and baseline methods is executed 100 times. Tabs. 3-6 respectively exhibit the average values over 100 runs for the AC, NMI, XB and FS criteria, where the bolded value in each row is the best performance achieved on that dataset among all algorithms. The average performance of all algorithms is shown in the last column for each dataset, and the last row shows the performance of each algorithm averaged over all datasets. Because FSCEOGA1 and ITK are computationally expensive, they cannot handle large datasets within a reasonable execution time. For this reason, the performance results of FSCEOGA1 on the datasets with large numbers of data objects, including Satimage, Yeast, Vehicle and Vowel, are averaged over 100 different subsamples; likewise, the results of ITK on the Satimage and Yeast datasets are averaged over 100 different subsamples. The results of the proposed and other consensus algorithms on the different datasets are provided in Tabs. 3-6. From Tab. 3, we observe that RBGP, RBHC-min and RBFCM outperform the other algorithms on five, one and nine datasets, respectively, while no other algorithm is best on any dataset. We can also see that the proposed RBFCM outperforms the other cluster ensemble algorithms in terms of the averaged AC value.
From Tab. 4, we can see that RBGP, RBHC-min and RBFCM outperform the other algorithms on three, four and five datasets, respectively. It is also evident that RBHC-sum, sMCLA and ITK outperform the other algorithms on one, two and one datasets, respectively. The performance values averaged over all datasets show that RBFCM achieves the best and RBGP the second-best performance in terms of the averaged NMI value.
Tab. 5 shows that RBHC-sum, RBHC-max and RBHC-min outperform the other algorithms on eight, five and four datasets, respectively, while RBFCM outperforms the other algorithms on one dataset. It is also observable that the best averaged XB is achieved by RBHC-max, and that the proposed RBHC algorithm outperforms the other algorithms on most of the datasets. From Tab. 6, we can see that, among all algorithms on all datasets, the RBHC methods achieve the best results.
Robustness is a fundamental property of machine learning algorithms, measuring their tolerance against perturbations (i.e., noise). To assess the robustness of the proposed methods in terms of accuracy, five datasets are selected from the KEEL-dataset repository, namely Glass, Ionosphere, Iris, SAHeart and Wine [56], and infected with noise ratios of 5% to 20%; the ensemble size for this evaluation is β = 20. Fig. 2 shows the accuracies of the proposed methods against the baseline methods sCSPA, sMCLA, sHBGF and ITK, averaged over all of the infected datasets. We can conclude from Fig. 2 that RBGP, RBFCM and sHBGF are, in that order, the most robust methods against the noise in the infected datasets. We can also see that, compared with these three methods, RBHC-sum has a lower accuracy but higher robustness. In Tab. 7, the execution times of the proposed methods are compared with those of the other related methods for an ensemble size of β = 20 over all 15 datasets; the execution time per consensus clustering is reported for all algorithms. It is obvious that, among the baseline algorithms, FSCEOGA1 and ITK have the longest execution times, in that order. Since their time complexity is quadratic in the number of data objects, the co-association-based algorithms (sCSPA, RBFCM) have the next longest execution times after FSCEOGA1 and ITK. Considering both performance and computational cost, RBHC with the sum membership scheme is preferable to the other algorithms, as it achieves a shorter average execution time per consensus on most of the datasets.
Here, the performance assessment of the proposed method is supported by statistical analysis to ensure that the results are not accidental. Since the data cannot be assumed to have a normal distribution or homogeneous variance, and multiple variables are involved (similar to [57]), the non-parametric Friedman test [58] is used. The last row of Tab. 3 shows that the hypothesis of equal mean ranks (in terms of accuracy) is rejected, due to the significant difference indicated by a p-value of 3.82E-09. We can also observe that RBFCM, RBGP, sHBGF and RBHC-min achieve the highest accuracy scores, in that order, while ITK has the lowest score. The last rows of Tabs. 4-7 exhibit the Friedman tests of the experimental results for the data of Tabs. 4-7, respectively. The equal-mean-rank null hypothesis for NMI (and for XB, FS and the consumed time) is rejected due to the significant difference indicated by a p-value of 1.59E-05 (8.47E−19, 3.02E−16 and 3.21E−22, respectively).
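The mechanics of this test can be reproduced with SciPy's friedmanchisquare; the per-dataset accuracy scores below are synthetic, purely to illustrate how consistent rank differences across 15 datasets lead to rejection of the equal-mean-rank hypothesis:

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Hypothetical accuracy scores over 15 datasets for three algorithms;
# algorithm A is consistently best and C consistently worst on every dataset.
base = np.linspace(0.50, 0.90, 15)
acc_a, acc_b, acc_c = base + 0.05, base, base - 0.05

stat, p = friedmanchisquare(acc_a, acc_b, acc_c)
if p < 0.05:
    print("equal-mean-rank hypothesis rejected")
```

The test ranks the algorithms within each dataset (row) and asks whether the mean ranks differ more than chance would allow, which is exactly the comparison reported in the last rows of Tabs. 3-7.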
The time complexity of the RBFCM algorithm is analyzed according to Algorithm 1, which results in O(c(c + M^2) + Mn*t), where c is the total number of fuzzy clusters in the primary partitions, n* is the number of consensus clusters, t is the number of iterations FCM needs to converge, and M is the dataset size. The term c^2, corresponding to lines 1 to 4 of Algorithm 1, accounts for the co-association matrix computation; the term cM^2 comes from line 5, and the term Mn*t is the time complexity of FCM. For RBGP, the time complexity is obtained as O(c(c + M) + Mn*β) from Algorithm 2, where the terms c^2 and Mc relate to the bipartite graph construction in lines 1 to 4 and line 5, respectively. The term Mn*β is the time complexity of METIS with ensemble size β, corresponding to line 6. Algorithm 3 yields the time complexity of RBHC as O(c^2 + cn*t + Mn*), where the terms Mn*, c^2 and cn*t (the time complexity of k-means) correspond to lines 3, 1 and 2, respectively. In practice, RBHC is more efficient than the other algorithms because its dominant term, Mn*, grows only linearly with M, provided that M ≫ c and M ≫ n*.

Conclusions and Future Work
Based on fuzzy cluster unreliability estimation, a novel fuzzy clustering ensemble approach was proposed in this paper. Cluster unreliability was estimated according to an entropic criterion using the cluster labels in the entire ensemble. Then, based on cluster unreliability and a local weighting strategy, a new RDCI measure was proposed that is independent of the features of the original data and makes no pre-assumption about the data distribution. To improve the conventional co-association matrix, a local weighting scheme was proposed in which the RDCI weight determines how much each primary fuzzy cluster should contribute to the construction of the weighted fuzzy co-association (WFCA) matrix, in accordance with its reliability in the ensemble. Three algorithms, RBFCM, RBGP and RBHC (the latter suitable for large datasets), were proposed; they respectively obtain the final clustering from the WFCA matrix, with respect to the reliability of each cluster, and by computing the acceptability of each cluster over the other clusters of the ensemble. In comparison with other fuzzy clustering ensemble methods, the proposed approach showed performance improvement and greater robustness against noisy datasets in experiments over fifteen datasets. We concluded that the RBHC and RBGP algorithms are appropriate for large datasets because they are linear in the number of data points and perform well compared with other fuzzy clustering ensemble methods. It was shown that RBHC is the best method in terms of time complexity and performance, while RBGP is the best in terms of performance and robustness; RBFCM was also relatively the best in terms of performance alone. For future work, handling challenging high-dimensional datasets in RBFCM and datasets with missing values remain open problems.
As further challenges for future studies, we can address the impact of diversity on the proposed method and its performance under different sampling mechanisms.
Author Contributions: LPY and YXW designed the study and contributed equally to the paper as first authors. HG supervised the hypothesis and experiments. LY and YW wrote the manuscript; LY, YW, HP and KHP edited the manuscript with help from HG and ZM; AB carried out the analyses, implemented the code, and performed the statistical analyses. HP and KHP generated all figures and tables. All authors have read and approved the final version of the paper.

Funding Statement:
The author(s) received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.