Open Access


Analysis of CLARANS Algorithm for Weather Data Based on Spark

Jiahao Zhang, Honglin Wang*

College of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, 210044, China

* Corresponding Author: Honglin Wang. Email: email

Computers, Materials & Continua 2023, 76(2), 2427-2441.


With the rapid development of technology, processing the explosively growing volume of meteorological data on traditional standalone machines has become increasingly time-consuming and can no longer meet the demands of scientific research and business. This paper therefore implements a parallel version of the Clustering Large Applications based upon RANdomized Search (CLARANS) clustering algorithm on the Spark cloud computing platform and uses it to cluster China’s climate regions from meteorological data covering 1988 to 2018, with the aim of making clustering algorithms practical on large datasets. The morphological similarity distance replaces Euclidean distance as the similarity measure, which improves clustering accuracy. The local optima caused by poorly chosen initial cluster centers are addressed by selecting the initial centers with the max-distance criterion. Compared with the k-means clustering algorithm already implemented on the Spark platform, the proposed algorithm is robust to outliers, reducing their interference with the clustering results, and it achieves higher performance through parallelism than the frequently used serial algorithms, which improves the efficiency of big data analysis. The experiment compares the cluster centroids with the annual average meteorological data of representative cities in China’s five typical meteorological regions, and the clustering results agree well with the meteorological data obtained from the National Meteorological Science Data Center. The algorithm is effective for clustering analysis of massive meteorological data and merits attention in scientific research.
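The two algorithmic ideas named in the abstract, max-distance selection of initial centers and CLARANS's randomized medoid swapping, can be illustrated with a minimal serial sketch. It is not the authors' Spark implementation. The function names, the farthest-point reading of the "max-distance criterion", the choice of the first seed point, and the parameters `num_local` and `max_neighbor` are assumptions made for illustration. Euclidean distance is used only as a stand-in, since the abstract does not give the formula for the morphological similarity distance; any other distance can be passed in through `dist`.

```python
import math
import random


def euclidean(a, b):
    """Stand-in distance (assumption); the paper uses a morphological similarity distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_distance_init(points, k, dist=euclidean):
    """Pick k initial medoid indices with a farthest-point rule (assumed reading of 'max-distance')."""
    chosen = [0]  # assumption: seed with the first point
    while len(chosen) < k:
        # Add the point whose distance to its nearest chosen medoid is largest.
        farthest = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: min(dist(points[i], points[m]) for m in chosen),
        )
        chosen.append(farthest)
    return chosen


def total_cost(points, medoids, dist=euclidean):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, num_local=2, max_neighbor=100, dist=euclidean, seed=0):
    """Serial CLARANS sketch: randomized medoid swaps with restarts. Requires len(points) > k."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, math.inf
    for restart in range(num_local):
        # First restart uses max-distance seeding; later restarts start from random medoids.
        if restart == 0:
            current = max_distance_init(points, k, dist)
        else:
            current = rng.sample(range(len(points)), k)
        current_cost = total_cost(points, current, dist)
        failures = 0
        while failures < max_neighbor:
            # A random neighbor replaces one medoid with one non-medoid.
            position = rng.randrange(k)
            candidate = rng.choice([i for i in range(len(points)) if i not in current])
            neighbor = current[:position] + [candidate] + current[position + 1:]
            neighbor_cost = total_cost(points, neighbor, dist)
            if neighbor_cost < current_cost:
                current, current_cost = neighbor, neighbor_cost
                failures = 0  # improvement found: restart the neighbor count
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    return best_medoids, best_cost


if __name__ == "__main__":
    # Toy data: two well-separated groups of 2-D points (made up for illustration).
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    medoids, cost = clarans(sample, k=2)
    print(medoids, cost)
```

In this sketch nearly all of the work is in `total_cost`, a sum over independent per-point terms. That makes it the natural candidate for distribution in a parallel setting, for example as a map over partitions followed by a sum, although the abstract does not describe how the authors partition the computation.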



This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.