Vol.35, No.1, 2023, pp.553-566, doi:10.32604/iasc.2023.027579
OPEN ACCESS
ARTICLE
P-ROCK: A Sustainable Clustering Algorithm for Large Categorical Datasets
Ayman Altameem1, Ramesh Chandra Poonia2, Ankit Kumar3, Linesh Raja4, Abdul Khader Jilani Saudagar5,*
1 Department of Computer Science and Engineering, College of Applied Studies and Community Services, King Saud University, Riyadh, 11533, Saudi Arabia
2 Department of Computer Science, CHRIST (Deemed to be University), Bangalore, 560029, India
3 Department of Computer Engineering and Applications, GLA University, Mathura, UP, India
4 Department of Computer Application, Manipal University Jaipur, Rajasthan, 303007, India
5 Information Systems Department, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh, 11432, Saudi Arabia
* Corresponding Author: Abdul Khader Jilani Saudagar. Email:
Received 20 January 2022; Accepted 02 March 2022; Issue published 06 June 2022
Abstract
Data clustering is central to data processing and analytics, and it helps overcome the challenge of evaluating and extracting information from big data. Both numerical and categorical data can be clustered, but existing clustering methods favor numerical data and largely neglect categorical data. Until recently, the only way to cluster categorical data was to convert it to a numeric representation and then apply existing numeric clustering methods. However, these algorithms could not exploit the categorical nature of the data. Extensions of traditional methods to handle categorical data were subsequently suggested, and several new clustering methods have also been proposed in recent years. ROCK is an adaptable and straightforward algorithm that clusters data by calculating the similarity between data points. This paper modifies the algorithm by creating a parameterized version that takes specific algorithm parameters as input and outputs satisfactory cluster structures. The modified algorithm is called the parameterized ROCK algorithm (P-ROCK). The proposed modification makes the original algorithm more flexible by using user-defined parameters. A detailed hypothesis was developed and later validated with experimental results obtained by running the proposed P-ROCK algorithm on real-world datasets. A comparison with the original ROCK algorithm is also provided. Experimental results show that the proposed algorithm is on par with the original ROCK algorithm, with an accuracy of 97.9%, while improving runtime and offering greater flexibility and scalability.
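To make the similarity idea concrete, the following is a minimal sketch of the neighbor and link computation commonly associated with the original ROCK algorithm. It is not the authors' P-ROCK implementation, and the abstract does not list P-ROCK's parameters. The function names, the toy records, the use of Jaccard similarity, and the threshold name `theta` are all assumptions made for illustration, as is the choice not to count a record as its own neighbor.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of categorical values."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty records are treated as identical (illustrative convention).
    return len(a & b) / len(union) if union else 1.0


def neighbor_sets(records, theta):
    """For each record index, collect the other indices whose
    similarity to it is at least the user-supplied threshold theta."""
    neighbors = [set() for _ in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(records[i], records[j]) >= theta:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def link_counts(neighbors):
    """Link count of a pair = number of neighbors the two records share."""
    n = len(neighbors)
    return {
        (i, j): len(neighbors[i] & neighbors[j])
        for i, j in combinations(range(n), 2)
    }


# Hypothetical toy records, each a set of categorical attribute values.
records = [
    {"red", "round", "small"},
    {"red", "round", "large"},
    {"red", "square", "small"},
    {"blue", "flat", "huge"},
]

neighbors = neighbor_sets(records, theta=0.5)
links = link_counts(neighbors)
```

In this sketch, records 1 and 2 are not neighbors of each other, yet they share record 0 as a common neighbor and therefore receive a link count of 1. Changing `theta` changes which records count as neighbors, which is the kind of user-defined control the abstract refers to in general terms.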
Keywords
ROCK; K-means algorithm; clustering approaches; unsupervised learning; K-histogram
Cite This Article
A. Altameem, R. Chandra Poonia, A. Kumar, L. Raja and A. Khader Jilani Saudagar, "P-ROCK: a sustainable clustering algorithm for large categorical datasets," Intelligent Automation & Soft Computing, vol. 35, no. 1, pp. 553–566, 2023.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.