Fuzzy C-means (FCM) is an unsupervised machine-learning clustering method. The main issues plaguing this algorithm are that the number of clusters within a particular dataset is unknown and that it is sensitive to the initialization of the cluster centres. The Artificial Bee Colony (ABC) is a swarm algorithm that iteratively improves the quality of its members' solutions using particular kinds of randomness. However, ABC has some weaknesses, such as balancing exploration and exploitation. To improve the exploration process within the ABC algorithm, the mean artificial bee colony (MeanABC), whose modified search equation depends on the mean of the previous and global best solutions, is used. Furthermore, to solve the main issues of FCM, an automatic clustering algorithm based on the mean artificial bee colony, called AC-MeanABC, is proposed. It uses MeanABC's capability of balancing exploration and exploitation and its capacity to explore the positive and negative directions of the search space to find the best number of clusters and centroid values. A few benchmark datasets and a set of natural images were used to evaluate the effectiveness of AC-MeanABC. The experimental findings are encouraging and indicate considerable improvements over other state-of-the-art approaches in the same domain.

Data clustering is a statistical approach used for managing large data volumes. It is a multivariate analytical approach that identifies patterns and relationships among data. By clustering, a user can divide data into relatively homogeneous groups; by reorganizing these groups, the user may be able to utilize the original data volume efficiently. Clustering accuracy is crucial because, according to [

Clustering algorithms consist of two types, namely, partitional clustering, which generates a single partition of the data, and hierarchical clustering, which generates a nested sequence of partitions [

Over the last few years, clustering methods have been demonstrated to be effective, particularly in categorization tasks requiring semi- or full automation [

Clustering has many useful features, making it one of the most widely used techniques in fields such as image segmentation, pattern recognition, machine learning, and data mining [

However, fuzzy clustering-based approaches still have significant weaknesses; for instance, they cannot operate automatically without prior knowledge of the number of clusters and the centroid locations. Furthermore, determining the number of clusters within a particular dataset is a significant challenge, and the unavailability of experts, operators, or any prior knowledge contributes to this challenge as well. Accordingly, many researchers have worked in recent years on clustering methods that find the appropriate number of clusters within a particular dataset without experts or operators.

Metaheuristic optimization search algorithms such as artificial bee colony (ABC) [

A fuzzy clustering method based on FCM was proposed by Ouadfel et al. [

An automatic fuzzy clustering method named AFDE for the image segmentation problem was proposed by [

This paper proposes an automatic fuzzy clustering approach based on the mean artificial bee colony, called AC-MeanABC. The proposed method uses the MeanABC algorithm to determine the appropriate cluster number and the initial locations of the centroids. The method modifies the search equation, depending on the mean of the previous and global best solutions, to reach the best balance between exploration and exploitation. The remainder of the paper is organized as follows. Section 2 describes the type-2 fuzzy set, Sections 3 and 4 present related background on the employed techniques, Section 5 presents the proposed method, Section 6 presents the experiments and results, and Section 7 concludes the paper.

FCM is an unsupervised method that partitions objects based on their similarity, tending to increase the similarity of entities within the same cluster and decrease the similarity of entities across different clusters [

FCM minimizes the weighted within-cluster sum of squared distances:

$J_m = \sum_{k=1}^{n} \sum_{i=1}^{c} u_{ik}^{m} \, \lVert x_k - v_i \rVert^2$

where $u_{ik}$ is the degree of membership of the data point $x_k$ in the $i$th cluster, $v_i$ is the $i$th cluster centre, and $m > 1$ is the fuzzifier.
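For concreteness, the standard FCM alternating updates (memberships, then centroids) can be sketched in Python. This is an illustrative implementation of the classical algorithm, not the authors' code; the fuzzifier `m` and the tiny `eps` guard are the usual conventions.

```python
import numpy as np

def fcm_step(X, V, m=2.0, eps=1e-9):
    """One FCM iteration: update memberships U from centroids V, then
    recompute V from U. X is (n, d) data, V is (c, d) centroids."""
    # Point-to-centroid distances, shape (n, c); eps avoids division by zero
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + eps
    # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    # v_i = sum_k u_ik^m x_k / sum_k u_ik^m
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, V

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
V = np.array([[0.0, 0.1], [5.0, 5.1]])
for _ in range(20):
    U, V = fcm_step(X, V)
# The two tight groups end up with near-crisp memberships
```

Because the updates alternate, the objective decreases monotonically, but the result still depends on the initial centroids, which is exactly the sensitivity the paper targets.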

The standard ABC algorithm was proposed by Karaboga et al. [

where $x_j^{\min}$ and $x_j^{\max}$ are the lower and upper bounds of the $j$th parameter of the search space.
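The random initialisation of food sources described above can be sketched as follows; the names `lower` and `upper` for the per-dimension bound arrays are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_food_sources(sn, lower, upper):
    """x_ij = x_j_min + rand(0, 1) * (x_j_max - x_j_min) for each of the
    sn food sources; lower/upper hold the per-dimension bounds."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return lower + rng.random((sn, lower.size)) * (upper - lower)

foods = init_food_sources(20, lower=[-5.0, -5.0], upper=[5.0, 5.0])
# Every food source lies inside the box [-5, 5] x [-5, 5]
```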

where $v_i$ is the new candidate food source generated in the neighbourhood of the current solution $x_i$.

Each onlooker bee selects a solution probabilistically: the probability of choosing a food source is proportional to its fitness, so better sources attract more onlookers. The selection probability is computed as in
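The fitness-proportional (roulette-wheel) selection used by onlookers can be sketched as follows, assuming the standard ABC fitness mapping $fit_i = 1/(1 + f_i)$ for non-negative objective values $f_i$ (and $1 + |f_i|$ otherwise); the exact mapping in the paper may differ.

```python
import numpy as np

def selection_probabilities(objective_values):
    """Fitness-proportional probabilities for onlooker bees.
    Assumes the standard ABC fitness mapping: fit_i = 1/(1 + f_i) when
    the objective f_i >= 0, and 1 + |f_i| otherwise."""
    f = np.asarray(objective_values, dtype=float)
    fit = np.where(f >= 0, 1.0 / (1.0 + f), 1.0 + np.abs(f))
    return fit / fit.sum()

p = selection_probabilities([0.0, 1.0, 3.0])
# fit = [1.0, 0.5, 0.25], so p = [4/7, 2/7, 1/7]
```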

In this regard, when a solution fails to improve over several trials, the "limit" is reached, and the corresponding employed bee becomes a scout bee and abandons its solution. Accordingly,
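The limit/scout mechanism can be sketched as follows; the per-source counter `trials` of unsuccessful improvement attempts is an assumed bookkeeping variable, not a name from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def scout_phase(foods, trials, limit, lower, upper):
    """Replace any food source whose improvement-failure counter reached
    `limit` with a fresh random solution (employed bee -> scout bee)."""
    exhausted = trials >= limit
    n_new = int(exhausted.sum())
    if n_new:
        dim = foods.shape[1]
        foods[exhausted] = lower + rng.random((n_new, dim)) * (upper - lower)
        trials[exhausted] = 0
    return foods, trials

foods = np.zeros((3, 2))
trials = np.array([0, 12, 3])  # source 1 has exceeded a limit of 10
foods, trials = scout_phase(foods, trials, limit=10, lower=-1.0, upper=1.0)
# Only the exhausted source (index 1) is re-initialised and its counter reset
```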

The ABC has some disadvantages in terms of unbalanced search behaviour, a limitation it shares with other optimization algorithms. The authors used an improved ABC based on the mean global best (MeanABC) [

where $\phi_{i,j}$ is a uniformly distributed random number in $[-1, 1]$ that controls the size of the search step.

The second concerns switching the present mean position value of (

Based on the above:
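Since the exact modified search equation is not reproduced in this text, the following sketch assumes a MeanABC-style step driven by the mean of the current solution and the global best, with a step factor in $[-1, 1]$ so both positive and negative directions are explored; the specific combination shown is an assumption, not the paper's published formula.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_abc_candidate(foods, i, best):
    """Sketch of a MeanABC-style candidate: perturb one randomly chosen
    parameter of solution i using the mean of the current solution and the
    global best, relative to a random neighbour k != i. The exact published
    equation may differ; phi in [-1, 1] gives both search directions."""
    sn, dim = foods.shape
    j = rng.integers(dim)                              # parameter to perturb
    k = rng.choice([s for s in range(sn) if s != i])   # random neighbour
    phi = rng.uniform(-1.0, 1.0)
    v = foods[i].copy()
    v[j] = foods[i, j] + phi * ((best[j] + foods[i, j]) / 2.0 - foods[k, j])
    return v

foods = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
v = mean_abc_candidate(foods, 0, best=foods[1])
# Only one coordinate of solution 0 is changed
```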

AC-MeanABC is an unsupervised data clustering method that employs the ability of MeanABC to seek the best solution, which comprises the number of clusters and the locations of the centroids. The pseudo-code for the AC-MeanABC algorithm is available in
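The paper's exact solution encoding is given in its pseudo-code; as an illustration, a common encoding for automatic clustering concatenates `k_max` activation genes with `k_max` candidate centroids, and the decoded cluster count is the number of active centroids. The threshold value and helper below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def decode_solution(vector, k_max, dim, threshold=0.5):
    """Hypothetical decoding of one food source for automatic clustering:
    k_max activation genes followed by k_max flattened candidate centroids.
    A centroid is active when its activation gene exceeds `threshold`;
    the decoded number of clusters is the count of active centroids."""
    act = vector[:k_max]
    cents = vector[k_max:].reshape(k_max, dim)
    active = act > threshold
    if active.sum() < 2:
        # Keep at least two clusters active
        active[np.argsort(act)[-2:]] = True
    return cents[active]

vec = np.concatenate([rng.random(5), rng.random(5 * 2)])  # k_max=5, dim=2
centroids = decode_solution(vec, k_max=5, dim=2)
# Between 2 and 5 centroids survive decoding
```

Under such an encoding, the optimizer searches simultaneously over the cluster count and the centroid positions, which is what the automatic variant requires.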

The AC-MeanABC algorithm begins by initializing the

The values of

This phase of the AC-MeanABC clustering algorithm includes finding the best

Here, $x_{k,j}$ is a randomly chosen $j$th parameter of the $k$th individual, where $k$ is a randomly selected index different from $i$. The fitness values of the current solution and the candidate are then compared: if $fit_i < fit_j$, the food source vector encoding is updated; otherwise, it is kept as it is.

In the onlooker bee stage, each onlooker selects a source of food with a probability determined by the amount of nectar (the fitness value $fit_i$) of the corresponding solution $x_i$.

When a solution does not change over successive trials and the "limit" is reached, the employed bee becomes a scout bee and abandons its solution. The scout bees then begin new searches, generating new solutions randomly via

The degree of appropriateness or goodness of each MeanABC solution is measured by its fitness value. Each data point is allocated to one or more clusters using fuzzy membership values, which are determined by the fuzzy membership equation. Consequently, a cluster validity index is used as an indicator of the appropriateness of the clustering. This index is typically used to establish the quality of different solutions obtained using different settings in a particular clustering algorithm (or of solutions given by different algorithms).
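The paper's specific validity index is not reproduced in this text; as an illustration of how a validity index can serve as the fitness of a candidate solution, the following sketch uses the well-known Xie-Beni index as a stand-in (lower is better).

```python
import numpy as np

def xie_beni(X, V, U, m=2.0):
    """Validity-index fitness sketch. The paper's exact index is not
    reproduced here; the Xie-Beni index is a stand-in: fuzzy
    within-cluster compactness divided by the minimum squared
    separation between centroids (lower is better)."""
    # Squared distances from each point to each centroid, shape (n, c)
    d2 = np.sum((X[:, None, :] - V[None, :, :]) ** 2, axis=2)
    compactness = np.sum((U ** m) * d2)
    # Pairwise squared distances between centroids
    cd = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=2)
    separation = np.min(cd[np.triu_indices_from(cd, k=1)])
    return compactness / (X.shape[0] * separation)

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
V = np.array([[0.05, 0.0], [5.05, 5.0]])
U = np.array([[0.99, 0.01], [0.99, 0.01], [0.01, 0.99], [0.01, 0.99]])
score = xie_beni(X, V, U)
# Compact, well-separated clusters give a small index value
```

Because such an index penalizes both loose clusters and centroids that sit too close together, minimizing it simultaneously discourages too few and too many clusters, which is why it is a natural fitness function for automatic clustering.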

In this paper, the cluster validity index of

where s is a constant number, and _{n}

where _{p}_{p}_{k}_{n}

The AC-MeanABC fuzzy clustering algorithm is conducted as a fully automatic data clustering method to find the number of clusters in datasets. The experiments and results are presented in three parts. Firstly, the AC-MeanABC parameter settings and the most appropriate values are chosen. Secondly, AC-MeanABC is run on 11 benchmark clustering datasets selected from the UCI repository, namely Iris, Ecoli, Wisconsin, Wine, Dermatology, Glass, Aggregation, R15, D31, Libras movement and Wholesaler customers. Some details about these datasets are given in

Datasets | Number of instances | Number of attributes | Number of clusters | Datasets | Number of instances | Number of attributes | Number of clusters |
---|---|---|---|---|---|---|---|

Iris | 150 | 4 | 3 | Aggregation | 788 | 2 | 7 |

Ecoli | 327 | 8 | 5 | R15 | 600 | 2 | 15 |

Wisconsin | 683 | 9 | 2 | D31 | 3100 | 2 | 31 |

Wine | 178 | 13 | 3 | Libras movement | 360 | 28 | 15 |

Dermatology | 358 | 34 | 6 | Wholesaler customers | 440 | 8 | 3/4/5 |

Glass | 214 | 9 | 6 | | | | |

The third part includes AC-MeanABC being conducted with five natural images obtained from the Berkeley segmentation dataset [

To obtain the best outcomes from any clustering optimization algorithm, the suitable selection of parameters is critical because these parameters are essential to the algorithm's performance and accuracy. The parameters of AC-MeanABC (i.e., population size SN, limit, maximum cycle number (MCN), nonnegative constant parameter C, and termination criterion TC) therefore have to be determined. The values of these parameters were set in

Approach | Population size SN (Np) | Limit | Maximum cycle number (MCN) | Nonnegative constant parameter C | Termination criterion TC |
---|---|---|---|---|---|

Benchmark data clustering (part1) | 20 | 50 | 500 | 1.43 | 2 |

Benchmark data clustering (part2) | 30 | 100 | 9000 | 1.43 | 2 |

Image clustering | 50 | 100 | 10000 | 1.43 | 2 |

Datasets | Np | | | Datasets | Np | | |
---|---|---|---|---|---|---|---|

Iris | 5 | 0 | 1 | D31 | 56 | 1 | 2 |

Ecoli | 20 | 2 | 1 | Libras movement | 19 | 1 | 2.5 |

Wisconsin | 13 | 0 | 1 | Wholesaler customers | 1 | 0 | 0.8 |

Wine | 15 | 0 | 1 | Lena | 20 | 2 | 1 |

Dermatology | 20 | 2 | 1 | Jet | 20 | 2 | 1 |

Glass | 15 | 1 | 1.5 | MorroBay | 15 | 1 | 1 |

Aggregation | 28 | 1 | 2 | Mandril | 20 | 2 | 1 |

R15 | 24 | 1 | 2.5 | Peppe | 20 | 3 | 1 |

In this experiment, AC-MeanABC was applied to 11 benchmark clustering datasets chosen from the UCI repository. The benchmark datasets were divided into two parts: part 1 included Glass, Aggregation, R15, D31, Libras movement and Wholesaler customers, while part 2 included Iris, Ecoli, Wisconsin, Wine and Dermatology.

In this part, the AC-MeanABC outcomes were compared with the standard ABC algorithm and other related works such as iABC, AKC-BCO, AKC-MEPSO, DCPG, DCPSO and DCGA [

Datasets | Measure | ACMeanABC | iABC | ABC | AKC-BCO | AKC-MEPSO | DCPG | DCPSO | DCGA |
---|---|---|---|---|---|---|---|---|---|

Glass | Average | 0.3644 | 0.3752 | 2.6411 | 2.1352 | 0.4700 | 0.5400 | 0.60 | |

SD | 0.0020 | 0.0017 | 0.0053 | 1.1412 | 1.0154 | 0.0500 | 0.0800 | 0.13 | |

Aggregation | Average | 0.3628 | 0.3706 | 2.3986 | 0.7543 | 0.6431 | 0.6311 | 0.52 | |

SD | 0.0121 | 0.0117 | 0.0159 | 0.7359 | 0.2752 | 0.0368 | 0.0445 | 0.00 | |

R15 | Average | 0.2753 | 0.2164 | 0.9885 | 0.3271 | 0.2800 | 0.2700 | 0.40 | |

SD | 0.0011 | 0.0013 | 0.0013 | 1.5429 | 0.0549 | 0.0500 | 0.0400 | 0.06 | |

D31 | Average | 0.3188 | 0.3056 | 3.8074 | 0.5769 | 0.4700 | 0.8670 | 0.42 | |

SD | 0.0030 | 0.0004 | 0.0004 | 2.6032 | 0.2293 | 0.1300 | 0.1900 | 0.01 | |

Libras movement | Average | 0.6270 | 0.6610 | 2.0972 | 1.8171 | 0.6262 | 0.6380 | 0.79 | |

SD | 0.0024 | 0.0079 | 0.0136 | 0.3840 | 0.1956 | 0.0311 | 0.0357 | 0.03 | |

Wholesaler customers | Average | 0.2040 | 0.2041 | 0.5104 | 0.4622 | 0.3010 | 0.2419 | 0.29 | |

SD | 0.0005 | 0.0008 | 0.0002 | 0.0535 | 0.0894 | 0.0665 | 0.0324 | 0.05 |

Datasets | #OC | Measure | AC-MeanABC | iABC | ABC | AKC-BCO | AKC-MEPSO | DCPG | DCPSO | DCGA |
---|---|---|---|---|---|---|---|---|---|---|

Glass | 6 | Average | 6.22 | 7.03 | 6.53 | 3 | 3.98 | 5.93 | 5.43 | 4.97 |

SD | 0.14 | 0.18 | 0.51 | 0 | 1.24 | 0.77 | 0.61 | 0.57 | ||

Aggregation | 7 | Average | 7.21 | 7.23 | 7.9 | 11.73 | 8.37 | 6.43 | 6.07 | 6.00 |

SD | 0.00 | 1.28 | 2.44 | 2.6 | 4.63 | 0.90 | 0.74 | 0.00 | ||

R15 | 15 | Average | 15 | 15 | 15 | 9.57 | 11.13 | 8.03 | 7.13 | 8.00 |

SD | 0 | 0 | 0 | 1.31 | 1.25 | 1.37 | 1.11 | 1.46 | ||

D31 | 31 | Average | 31.20 | 31.17 | 31.4 | 28.13 | 22.63 | 28.06 | 21.00 | 10.41 |

SD | 0.44 | 0.38 | 0.5 | 4.15 | 2.72 | 6.24 | 3.16 | 0.91 | ||

Libras movement | 15 | Average | 15 | 15.73 | 15.37 | 4.2 | 4.13 | 6.87 | 6.33 | 8.55 |

SD | 0.12 | 0.87 | 1.1 | 0.61 | 0.35 | 0.97 | 1.24 | 0.78 | ||

Wholesaler customers | 3 | Average | 3.28 | 6.87 | 6 | 3.37 | 3.43 | 7.73 | 6.30 | 3.66 |

SD | 0.001 | 0.57 | 0 | 0.49 | 0.57 | 2.21 | 2.02 | 0.72 |

In

Variable | Mean rank |
---|---|

iABC | 2 |

ABC | 2.8 |

DCPG | 3.62 |

DCGA | 3.8 |

DCPSO | 4 |

AKC-MEPSO | 5.92 |

AKC-BCO | 6.97 |

In this part, the AC-MeanABC outcomes were compared with other related works such as the discrete binary artificial bee colony (DisABC), GA-based clustering algorithms, and the improved discrete binary artificial bee colony (IDisABC) [

Datasets | AC-MeanABC | IDisABC | DisABC | GA | DCPSO |
---|---|---|---|---|---|

Average (SD) | Average (SD) | Average (SD) | Average (SD) | Average (SD) | |

Iris | 0.0974 (0.010) | 0.0982 (0.010) | 0.1182 (0.020) | 0.1042 (0.013) | |

Ecoli | 0.3073 (0.057) | 0.3841 (0.069) | 0.5351 (0.176) | 0.3691 (0.086) | |

Wisconsin | 0.135 (0.022) | 0.135 (0.022) | 0.1424 (0.039) | 0.1368 (0.018) | |

Wine | 0.3251 (0.035) | 0.3365 (0.029) | 0.4426 (0.145) | 0.3518 (0.055) | |

Dermatology | 0.3968 (0.043) | 0.4328 (0.048) | 0.5717 (0.138) | 0.4600 (0.045) |

Datasets | #OC | AC-MeanABC | IDisABC | DisABC | GA | DCPSO |
---|---|---|---|---|---|---|

Average (SD) | Average (SD) | Average (SD) | Average (SD) | Average (SD) | ||

Iris | 3 | |||||

Ecoli | 5 | 5.166 (0.379) | 5.333 (0.606) | 7.7 (1.600) | 5.866 (0.776) | |

Wisconsin | 2 | 2.133 (0.681) | 2.066 (0.253) | |||

Wine | 3 | 3.3 (0.534) | 3.4 (0.498) | 4.3 (1.557) | 3.9 (0.712) | |

Dermatology | 6 | 5.56 (0.504) | 5.533 (0.571) | 6.966 (1.790) |

To show the significance of the improvement of AC-MeanABC, the Friedman test was performed, and the results are shown in

Variable | Mean rank |
---|---|

(2) IDisABC | 2 |

(3) DisABC | 3 |

(4) DCPSO | 4 |

(5) GA | 5 |

In this experiment, five natural images were obtained from the Berkeley segmentation dataset [

Images | AC-MeanABC | IDisABC | DisABC | GA | DCPSO |
---|---|---|---|---|---|

Average (SD) | Average (SD) | Average (SD) | Average (SD) | Average (SD) | |

Lena | 0.0982 (0.0118) | 0.1032 (0.0141) | 0.1395 (0.0259) | 0.1126 (0.0130) | |

Jet | 0.0922 (0.0209) | 0.0959 (0.0305) | 0.1517 (0.0595) | 0.1366 (0.1183) | |

MorroBay | 0.0789 (0.0093) | 0.0875 (0.0125) | 0.1268 (0.0395) | 0.1053 (0.0339) | |

Mandrill | 0.1043 (0.0109) | 0.1045 (0.0111) | 0.1419 (0.0324) | 0.1077 (0.0115) |

Pepper | 0.1081 (0.0142) | 0.1201 (0.0150) | 0.1662 (0.0595) | 0.1323 (0.0460) |

Images | #OC | AC-MeanABC | IDisABC | DisABC | GA | DCPSO |
---|---|---|---|---|---|---|

Average (SD) | Average (SD) | Average (SD) | Average (SD) | Average (SD) | ||

Lena | 6 | 5.9 (0.922) | 5.666 (0.546) | 7.1 (1.516) | 6.695 (1.063) | |

Jet | 6 | 5.733 (0.868) | 5.5 (0.861) | 6.733 (1.680) | 6.033 (0.964) | |

MorroBay | 4 | 4.333 (0.479) | 4.333 (0.479) | 4.621 (1.146) | 4.333 (0.546) | |

Mandrill | 6 | 5.950 (0.520) | 6.166 (1.0199) | 7.4 (1.652) | 6.9 (1.471) | |

Pepper | 7 | 6.733 (0.691) | 6.566 (0.568) | 7.466 (1.105) | 7.333 (1.212) |

In

Variable | Mean rank |
---|---|

(2) IDisABC | 2 |

(3) DisABC | 3 |

(4) DCPSO | 4 |

(5) GA | 5 |

In this paper, an automatic fuzzy clustering method based on the MeanABC search method, called AC-MeanABC, was proposed to solve the challenges of determining the number of clusters (regions) and the cluster centroids. The AC-MeanABC clustering method uses the capability of the MeanABC algorithm to explore the search space in positive and negative directions to search for the near-optimal number of clusters and centroid values. The experiments and results were obtained using 11 benchmark datasets and 5 natural images. These experiments compared AC-MeanABC with other clustering methods such as iABC, ABC, AKC-BCO, AKC-MEPSO, DCPG, DCGA, IDisABC, DisABC, DCPSO, and GA. In conclusion, the clustering results of AC-MeanABC are better than those of the state-of-the-art techniques in determining the optimal number of clusters and the value of the validity index VI.

Thanks to our families and colleagues who supported us morally.