Weighted Particle Swarm Clustering Algorithm for Self-Organizing Maps

The traditional K-means clustering algorithm has difficulty determining the number of clusters, is sensitive to the initialization of the cluster centers, and easily falls into a local optimum. This paper proposes SOM&WPSO (Self-Organizing Map and Weight Particle Swarm Optimization), a clustering algorithm based on a self-organizing map network and weight particle swarm optimization. First, the algorithm uses the competitive learning mechanism of the self-organizing map network to divide the data samples into coarse clusters and obtain the cluster centers. The obtained cluster centers are then used as the initialization parameters of the weight particle swarm optimization algorithm. In WPSO, the particle position is no longer defined by the cluster centers, as in traditional PSO clustering, but by the sample weights, and the cluster centers act as the “food” of the particle swarm: each particle moves toward its nearest cluster center. Each iteration optimizes the particle positions and velocities and uses K-means and K-medoids to recalculate the cluster centers and cluster partitions, until the algorithm converges or the iteration limit is reached. Extensive experiments on commonly used UCI data sets show that the proposed algorithm not only reduces the dependence of K-means on the initial cluster centers and improves clustering accuracy, but also avoids falling into a local optimum and has good global convergence.


Introduction
With the advent of the era of big data and artificial intelligence, information has been growing exponentially. We are faced with a huge amount of text, video, picture, and audio data. How to extract information of real value from such massive data has gradually become one of the research topics in computer science. Traditional data analysis relies on personal experience and teamwork to identify and determine results; it is not only unsatisfactory but also a waste of time and human resources. Data mining technology was therefore developed to help people extract valuable information from massive data. As an effective tool of data mining, clustering has been widely applied in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and computer vision, and its efficient data mining ability has attracted more and more attention [1].
Clustering is a common technique in the field of data mining, and scholars at home and abroad have proposed many classic clustering algorithms, such as k-means [2] and k-medoids [3]. These algorithms are simple to compute and converge quickly. However, they have two inherent disadvantages: (I) determination of the cluster number K, since the index used to select K directly affects the clustering accuracy; (II) selection of the initial cluster centers, since the clustering effect depends on their initialization.
In recent years, domestic and foreign scholars have proposed improvement schemes for the shortcomings of the K-means algorithm. For example, the x-means algorithm proposed by Pelleg [4] builds on K-means and addresses the K value problem with the help of the Bayesian information criterion (BIC). Park [5] chose a new center point and calculated the distance relationship between objects, which improved the efficiency of the algorithm but failed to improve the clustering accuracy. Merwe et al. [6] proposed a clustering algorithm that fuses particle swarm optimization (PSO) and K-means, which improved the convergence speed and the effectiveness of clustering to a certain extent.

Related Research
As an important branch of data mining, clustering has been studied for many years, and a large number of clustering algorithms have emerged. However, since each data set has its own form and dimension, no algorithm is universal, and every kind of algorithm has some defects. For example, some clustering algorithms achieve significant results on low- and medium-dimensional data but do not perform well on high-dimensional data. Some clustering algorithms can only deal with data of a special distribution structure and cannot deal well with data of other distributions.
Overcoming these defects requires an algorithm that is scalable, can process different types of data, can discover clusters of various shapes, and can handle "noise" and outliers. Traditional clustering algorithms have been unable to solve these problems. Some scholars have conducted clustering by integrating swarm intelligence optimization algorithms and found that a better clustering effect can be achieved.

Traditional Clustering Algorithm
Traditional clustering algorithms are generally divided into partition-based, hierarchical, density-based, grid-based, and model-based algorithms.
A partition-based clustering algorithm divides the sample set into several disjoint clusters according to a distance rule and iterates until the objective function reaches its minimum. This method is easy to implement and converges quickly, and its complexity is linearly related to the sample size, the sample dimension, and the number of cluster centers. Representative algorithms include K-means [2], K-medoids [3], and CLARANS [7].
Hierarchical clustering algorithms can be divided into agglomerative hierarchical clustering and divisive hierarchical clustering. Early algorithms include AGNES and DIANA, proposed by Kaufman et al. [8]. Afterward, the BIRCH algorithm proposed by Zhang et al. [9] made use of clustering features and clustering feature trees for hierarchical clustering. The CURE [10] and ROCK [11] algorithms of Guha et al. and the CHAMELEON [12] algorithm of Karypis et al. are three other well-known hierarchical clustering algorithms.
Compared with partition clustering and hierarchical clustering, density-based clustering is not only applicable to convex sample sets but can also find clusters of various shapes and sizes in noisy data. DBSCAN [13] is a typical density-based clustering algorithm. It requires two parameters, a distance parameter and a density threshold parameter, and groups the "density reachable" samples in space into one class. OPTICS [14], an extension of DBSCAN, reduces the sensitivity of DBSCAN to its parameter settings.
A grid-based clustering algorithm quantizes the object space into a finite number of units, which form the grid structure on which all clustering operations are carried out. STING [15] is a typical representative of grid-based clustering. CLIQUE [16] combines the ideas of grid and density clustering and can cluster large-scale high-dimensional data.
A model-based clustering algorithm uses a statistical model or a neural network to obtain the clustering distribution information of the data. Statistical methods include the COBWEB algorithm; neural network methods include the self-organizing map (SOM) algorithm.

Integrated Clustering Research
In recent years, the shortcomings of traditional clustering algorithms have gradually been exposed. Sensitivity to the selection of the initial cluster centers, the difficulty of determining the number of clusters, and strict requirements on data format are major challenges for the clustering field. Integrated clustering algorithms improve accuracy, avoid falling into a local optimum, and converge better, and therefore provide stronger robustness and stability across different fields and data. Because the K-means algorithm optimizes the objective function based on the extreme value of the fitness function, it is prone to falling into a local optimum and is inefficient in processing massive data. Improving this algorithm has been a hot research topic in the field of clustering in recent years, and clustering algorithms that combine K-means with the genetic algorithm, the PSO algorithm, the artificial immune algorithm, the ant colony algorithm, and their improved variants have emerged.
In 2002, Omran et al. [17] proposed an unsupervised image classification algorithm based on particle swarm optimization, which is the origin of the PSO clustering algorithm. PSO optimization can mitigate some of the disadvantages of traditional clustering algorithms. For example, Gehad Ismail Sayed et al. [18] proposed an algorithm based on hybrid particle swarm optimization and K-means to remove residual stains and interphase cells from metaphase chromosome images so that the segmentation concentrated only on chromosomes; the segmentation accuracy reached 95%. Liu et al. [19] proposed a new particle swarm clustering algorithm with good global convergence, which not only overcomes the tendency of the traditional K-means algorithm to fall into a local minimum and its sensitivity to the initial value, but also converges quickly. Literature [20] proposed a PSO hybrid K-means clustering algorithm and implemented an MPI-based parallelization of it to improve execution efficiency. Literature [21] adopted the classical particle swarm optimization algorithm to improve the initial cluster centers of the K-means algorithm and thereby the accuracy of the clustering results. Literature [22] studied a K-means clustering algorithm based on an improved particle swarm optimization algorithm and processed particles trapped in local extreme values to make them jump out of the local optimal solution. Although that algorithm inherited the global search ability of the PSO algorithm, it did not fully and effectively utilize the local search ability of the K-means algorithm.

SOM-WPSO Model
This paper studied the traditional PSO-Kmeans clustering algorithm and found that the particle position is composed of the cluster centers and that the particle swarm size is set manually. Each particle move is therefore a step in optimizing the cluster centers, and the optimization of the sample weights is ignored, which results in low efficiency and no obvious improvement in the clustering effect.
Based on PSO-Kmeans, this paper proposes the weight particle swarm clustering algorithm, WPSO. It integrates the self-organizing and adaptive characteristics of SOM, which not only addresses the selection of the cluster number and the initial cluster centers but also avoids falling into a local optimum, and thus achieves relatively high accuracy.
The SOM algorithm performs dimensionality reduction and can effectively deal with outliers without complex operations such as differentiation and integration. However, it also has disadvantages, such as a long training time and possible "dead neurons" in competitive learning.
The WPSO algorithm still needs an initial set of cluster centers and a cluster number. In traditional clustering the cluster number is selected manually, which is a very thorny problem, and the initial cluster centers tend to be selected at random or according to density or distance. The initial cluster centers may even be isolated points or boundary points, so the clustering algorithm easily falls into a local optimum or even produces empty clusters.
The combined algorithm works as follows. SOM first performs a coarse clustering of the data, leaving the clustering in an unsaturated state; as SOM continues to iterate, the network learns the data distribution. When SOM reaches the set number of iterations, the network returns the trained weights, which are obtained by competitive learning on the data. Comparing the original data against these weights yields the winning neuron for each sample. The cluster centers and sample weights are then used to initialize the WPSO parameters: the number of particles equals the number of samples, the particle position is the sample weight, and the fitness of each particle is the Euclidean distance between the particle and the center of the cluster it belongs to. The goal is to minimize the fitness value; a minimal fitness value means that the particle is very close to its cluster center. After WPSO updates the weights and velocities, K-means is used to partition the clusters again, and, following the nearest-neighbor idea, each cluster mean is mapped to the nearest sample point, which becomes the cluster center. This greatly reduces the effect of noise on the algorithm, lowers the probability of producing empty clusters, and accelerates convergence. The flow chart of the algorithm is shown in Fig. 1.

Self-Organizing Map
The SOM (Self-Organizing Map) was obtained by simulating the self-organizing mapping of the human cerebral cortex to signals. On the one hand, SOM maps an input pattern of any dimension to a low-dimensional space, which not only reduces the vector dimension but also reduces the computational complexity of iterative training, while maintaining the original topological structure of the samples. On the other hand, the text feature and its neighborhood features are adjusted by means of the self-organizing mapping. The SOM network structure is shown in Fig. 2. The input layer is mainly responsible for receiving external information. Each neuron of the input layer is connected to the neurons of the competition layer through weights and transmits the external information to the competition layer. The competition layer is mainly responsible for analyzing the input information, obtaining the winning neuron through competitive learning, and inhibiting the excitation of neighboring neurons. The core of the SOM algorithm is competitive learning and neighborhood weight adjustment. Formula (1) computes the inner product of the input text vector and each neuron's weight vector; the index of the largest inner product identifies the winning neuron. Formula (2) gives the neighborhood radius of the winning neuron. Formula (3) updates the weights of the winning neuron and its neighbors.
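The competitive-learning step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's Formulas (1) to (3): the 1-D competition layer, the linearly shrinking radius and learning rate, and the names `winner`, `radius`, and `som_step` are all assumptions made for the example. Selecting the winner by the largest inner product is meaningful mainly when the input and weight vectors are normalized.

```python
def winner(x, weights):
    """Index of the neuron whose weight vector has the largest inner product with x."""
    scores = [sum(xi * wi for xi, wi in zip(x, w)) for w in weights]
    return max(range(len(weights)), key=scores.__getitem__)


def radius(t, n_iter, r0):
    """Assumed schedule: the neighborhood radius shrinks linearly from r0 toward 0."""
    return r0 * (1.0 - t / n_iter)


def som_step(x, weights, t, n_iter, lr0=0.5, r0=1.0):
    """One training step on a 1-D competition layer; updates `weights` in place."""
    win = winner(x, weights)
    r = radius(t, n_iter, r0)
    lr = lr0 * (1.0 - t / n_iter)  # assumed linearly decaying learning rate
    for j, w in enumerate(weights):
        if abs(j - win) <= r:  # the winner and its neighbors on the 1-D grid
            weights[j] = [wi + lr * (xi - wi) for xi, wi in zip(x, w)]
    return win
```

Calling `som_step` repeatedly over the samples for `n_iter` steps moves the weight vectors toward the data; the final weights can then serve as coarse cluster centers.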

WPSO Algorithm
The traditional PSO clustering algorithm takes the cluster centers as the particle position, calculates the fitness value of every particle from the sample weights, and then updates each particle's individual optimal position and the global optimal position. The algorithm ends when the iteration limit or the fitness threshold is reached. However, this way of clustering depends strongly on the data preprocessing. If the data are not of the same kind, the clustering result is meaningless, and the iterative process merely moves the cluster centers. Because the sample weights are never optimized, the computational efficiency is not improved.
Inspired by the PSO clustering algorithm, this paper proposes a new clustering algorithm, WPSO (Weight Particle Swarm Optimization), which uses the sample weight as the particle position. The original cluster centers act as attractors that pull each particle toward its nearest cluster center. In the iterative process, the velocity of a particle is affected by the global optimal particle and by its own individual optimum, and the iteration stops when the optimal fitness value is reached.
WPSO particles are encoded by sample weight: the position of a particle is no longer composed of cluster centers; instead, each sample is represented by one particle, so the size of the particle swarm is determined by the number of samples. Besides a position, each particle also has a velocity and a fitness value. The particle encoding is as follows: $x_1, x_2, x_3, \ldots, x_n$ represent the weights of the samples, i.e., the particle positions; $v_1, v_2, v_3, \ldots, v_n$ represent the velocities of the samples, i.e., the particle velocities; and $f(x_i, c_j)$ represents the fitness value of particle $i$ with respect to its nearest cluster center $c_j$. The velocity is updated according to Eq. (4), and the position is updated as
$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{5}$$
The choice of the fitness function directly affects the convergence speed of the clustering algorithm and whether it can find the optimal solution. It also reflects the overall quality of the clustering and serves as a basis for judging the correctness of the clustering results. When the WPSO algorithm is introduced into the K-means algorithm, the criterion function for evaluating the clustering quality can be used as the fitness function of the particle swarm. The intra-class tightness MSE is used to indicate the quality of a cluster: the smaller the MSE, the better the clustering effect.
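To make the update rule concrete, the following Python sketch shows one particle update. The position step follows Eq. (5). The velocity step uses the standard PSO form (inertia term plus attraction toward the individual best and the global best), which is an assumption here and may differ from the paper's Eq. (4); the names `update_particle` and `nearest_center_fitness` and the default coefficients `inertia`, `c1`, and `c2` are likewise hypothetical.

```python
import math
import random


def update_particle(x, v, pbest, gbest, inertia=0.7, c1=1.5, c2=1.5, rng=random):
    """Return (new_x, new_v) for one particle; all vectors are equal-length lists."""
    new_v = []
    for xi, vi, pi, gi in zip(x, v, pbest, gbest):
        r1, r2 = rng.random(), rng.random()
        # Assumed standard PSO velocity rule: inertia + cognitive + social terms.
        new_v.append(inertia * vi + c1 * r1 * (pi - xi) + c2 * r2 * (gi - xi))
    # Position update as in Eq. (5): x(t+1) = x(t) + v(t+1).
    new_x = [xi + vi for xi, vi in zip(x, new_v)]
    return new_x, new_v


def nearest_center_fitness(x, centers):
    """Fitness of a particle: Euclidean distance to its nearest cluster center."""
    return min(math.dist(x, c) for c in centers)
```

A smaller value of `nearest_center_fitness` means the particle lies closer to a cluster center, which matches the minimization goal stated in the text.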
The fitness value of a particle represents the similarity between the data objects within each class. The smaller the fitness value, the more tightly the data objects within the class are bound together, and the better the clustering effect. The fitness function is given by Eq. (7).
Although the particles are pulled toward the cluster centroids, the cluster mean obtained at the end of an iteration does not necessarily coincide with a specific sample point, which makes the cluster centroids harder to select. To reduce the influence of noise points, this paper uses the idea of K-medoids: after each K-means clustering is completed, the sample point closest to the cluster mean is selected as the cluster centroid, which not only accelerates convergence but also prevents empty clusters.
$$d\left(\bar{c}_i(t), c_i(t)\right) = \min_{j}\left\{ d\left(\bar{c}_i(t), x_j\right) \right\} \tag{8}$$
Here $\bar{c}_i(t)$ is the mean of the $i$-th cluster in the $t$-th iteration and $d(\bar{c}_i(t), x_j)$ is its distance to sample $x_j$; the nearest sample, $c_i(t)$, is taken as the centroid of the cluster.
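The projection of a cluster mean onto its nearest sample can be sketched as follows. This is a minimal illustration of the idea in Eq. (8), assuming Euclidean distance and a non-empty cluster; the function names `cluster_mean` and `nearest_sample_centroid` are hypothetical.

```python
import math


def cluster_mean(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def nearest_sample_centroid(points):
    """Return the member of `points` that is closest to the cluster mean."""
    mean = cluster_mean(points)
    return min(points, key=lambda p: math.dist(mean, p))
```

Because the returned centroid is always an actual sample, an outlier can shift the mean but cannot drag the centroid to an empty region of the space.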

K-means Algorithm
In 1967, MacQueen proposed K-means, a classical partition-based clustering algorithm that is simple to compute and converges quickly. The algorithm randomly selects K sample points as the initial cluster centroids and then assigns every other sample to the cluster of its nearest centroid according to the nearest-neighbor principle. After each iteration, each cluster centroid, namely the mean of all the samples in the cluster, is recalculated. The algorithm stops when the nearest cluster of every sample in the data set no longer changes.
The steps of the K-means algorithm are as follows.
Input: the data set D containing n data objects, and the number of clusters k.
Output: a set of k clusters that satisfies the convergence of the clustering criterion function.
Step 1: Randomly select k samples from the data set D as the initial cluster centroids $c_j$, $j = 1, 2, 3, \ldots, k$.
Step 2: Calculate the distance $d(x_i, c_j)$, $i = 1, 2, 3, \ldots, n$, from each sample of the data set to the k cluster centroids, and assign each sample to the cluster of its nearest centroid.
Step 3: Recalculate the cluster centroids.
Step 4: Calculate the class cohesion.
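The K-means steps above can be sketched in plain Python as follows. This is an illustrative sketch only: Euclidean distance, the sum of squared distances to the assigned centroid as the class cohesion, the `max_iter` cap, and the `seed` parameter are assumptions, since the paper's formulas for centroid recalculation and cohesion are not reproduced in the text.

```python
import math
import random


def kmeans(data, k, max_iter=100, seed=0):
    """Return (labels, centroids, cohesion) for a list of equal-length points."""
    rng = random.Random(seed)
    # Randomly pick k samples as the initial centroids.
    centroids = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every sample to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(x, centroids[j]))
                      for x in data]
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Assumed cohesion: sum of squared distances to the assigned centroid.
    cohesion = sum(math.dist(x, centroids[lab]) ** 2 for x, lab in zip(data, labels))
    return labels, centroids, cohesion
```

The random initialization in the first step is exactly what makes plain K-means sensitive to its starting centroids; in the proposed method those centroids come from the SOM stage instead.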

Experimental Data
To verify the effectiveness and feasibility of the algorithm, the Iris, Wine, and Glass data sets from UCI were used for the experiments in this paper. Their basic information is shown in Tab. 1.

Evaluation Criteria
Different clustering algorithms have different application scenarios, so a variety of evaluation criteria are needed to analyze the merits of the algorithms. To verify that the accuracy of the proposed algorithm is higher than that of the other algorithms, purity, fitness value, the Davies-Bouldin index, Dunn's index, and the Silhouette coefficient were adopted as the evaluation criteria for the clustering results.
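As an illustration of the first criterion, purity can be computed by crediting each cluster with its most frequent true class and dividing the total by the number of samples. The sketch below uses this common definition; the function name `purity` and the argument names are hypothetical.

```python
from collections import Counter


def purity(true_labels, cluster_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    per_cluster = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(counts.most_common(1)[0][1]
                         for counts in per_cluster.values())
    return majority_total / len(true_labels)
```

Purity ranges from 0 to 1, and a value of 1 means that every cluster contains samples of a single class.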

Experimental Parameters
The clustering algorithm is not universally applicable to all data, so it is necessary to select the appropriate algorithm according to the data and set different parameters for different data. The specific parameters are shown in Tab. 2.

Result Analysis
To verify the cluster purity of the algorithm, the SOM-WPSO model was compared with the SOM, PSO/K-means, and K-means algorithms in terms of purity. Each algorithm was run 20 times on each data set, and the highest accuracy of each was taken for the comparison, as shown in Fig. 3.
It can be seen from Fig. 3 that, among the compared models, SOM worked best on low-dimensional data. On the medium-dimensional data set, the accuracy of the three algorithms is not much different. On the high-dimensional data set, K-means has the lowest accuracy. Combining the advantages of the three models, this paper improved the PSO algorithm into WPSO and used the PAM idea so that the cluster centroids fall on specific samples rather than on arbitrary points, and found that the accuracy was greatly improved.
It can be seen from Tab. 3 that K-means has the lowest accuracy on the low-dimensional Iris data set, the medium-dimensional Wine data set, and the high-dimensional Glass data set. This is because the K-means algorithm is very sensitive to the selection of the initial cluster centroids: the choice is random and directly affects the clustering results. When the PSO optimization algorithm is introduced into the K-means algorithm, the resulting pso-km algorithm can eliminate the influence of the initial cluster centroids on the clustering result to some extent, because it determines the cluster centroids by searching for the global optimal position of the particle swarm. Based on the idea of pso-km, the algorithm in this paper optimizes the sample weights to improve the clustering accuracy, and it was found to be superior to the other algorithms on the three data sets.
Tab. 4 shows that the fitness value is relatively small on the Iris and Wine data sets and relatively large on the Glass data set. The fitness value fluctuates across dimensions because both the K-means algorithm and the pso-km algorithm are designed to optimize the cluster centroids, whereas the algorithm in this paper optimizes the sample weights. In addition, in this paper the cluster centroids are selected by K-means re-clustering, and the idea of K-medoids is used to project each cluster mean onto the nearest sample, so that some samples end up farther from the cluster centroid. This also shows that the algorithm of this paper has a better clustering effect on low- and medium-dimensional data.
The algorithms are also compared in terms of the Davies-Bouldin index (DB), Dunn's index (DI), and the Silhouette coefficient. A smaller DB means a smaller distance within classes and a greater distance between classes, that is, the smaller the DB, the better the clustering effect. A larger DI means that the distance within classes is smaller, the distance between classes is larger, and the clustering effect is better. The Silhouette coefficient lies in [−1, 1], and a value closer to 1 means that the cohesion and separation are relatively better.
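The two indices can be illustrated with a short Python sketch. It assumes Euclidean distance, at least two clusters, and at least one cluster with two or more samples, and it uses one common variant of each index: for DI, the smallest between-cluster point distance divided by the largest cluster diameter; for DB, the average over clusters of the worst scatter-to-separation ratio. The function names are hypothetical.

```python
import math
from itertools import combinations


def _groups(data, labels):
    """Group the samples by cluster label."""
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    return list(groups.values())


def dunn_index(data, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    groups = _groups(data, labels)
    min_between = min(math.dist(a, b)
                      for g, h in combinations(groups, 2) for a in g for b in h)
    max_diameter = max(math.dist(a, b)
                       for g in groups for a, b in combinations(g, 2))
    return min_between / max_diameter


def davies_bouldin(data, labels):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    groups = _groups(data, labels)
    cents = [[sum(col) / len(g) for col in zip(*g)] for g in groups]
    scatter = [sum(math.dist(x, c) for x in g) / len(g)
               for g, c in zip(groups, cents)]
    k = len(groups)
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k
```

For two compact, well-separated clusters the sketch yields a large DI and a small DB, which is the direction of "better" described in the text.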
Tab. 5 and Tab. 6 show that the proposed algorithm is superior to the other clustering algorithms on the three evaluation indices, although it performs worse on the Glass data set, which is consistent with the earlier comparison of fitness values. The algorithm in this paper does improve the clustering accuracy, and its clustering effect is good on low- and medium-dimensional data sets. However, its evaluation indices on high-dimensional data are not very good and its clustering results on the high-dimensional data set are not very stable, so the algorithm is not well suited to high-dimensional data.
To verify the convergence of the algorithm, this paper compared it with the pso-km algorithm on the Iris data set, as shown in Fig. 4.
It can be seen from Fig. 4 that the convergence curve of the algorithm fluctuates greatly: the particles move randomly in the global space until the position with the smallest fitness value is found. The fitness is then compared against a set threshold range, and once the number of comparisons exceeds the set limit, the algorithm is considered to have converged. Compared with the pso-km algorithm in Fig. 5, this algorithm converges quickly rather than slowly through stepwise iteration, which is due to the characteristics of the algorithm. The goal of this algorithm is to find the best accuracy through the fitness value and finally determine the positions of the cluster centroids.

Summary
To address the sensitivity of the K-means clustering algorithm to the initial cluster centroids, this paper proposes a combined clustering algorithm that integrates SOM with particle swarm optimization of sample weights. The algorithm overcomes several shortcomings of traditional clustering algorithms. Although it is flawed on high-dimensional data, its clustering accuracy is higher than that of the K-means and pso-km algorithms, which verifies that the algorithm is feasible and has a good clustering effect on low- and medium-dimensional data. This paper also has some shortcomings: because it aims to improve the accuracy of the algorithm, it neglects the fitness value and the stability of the convergence curve. Future work will focus on how to choose good cluster centroids so as to reduce the fitness value and keep the convergence curve from fluctuating greatly.

Funding Statement:
The author(s) received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.