Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review

Nur Laila; Izzatdin Aziz; Said AbdulKadir

doi:10.32604/cmc.2023.035987

Open Access

REVIEW

Nur Laila Ab Ghani^1,2,*, Izzatdin Abdul Aziz^1,2, Said Jadid AbdulKadir^1,2

1 Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, 32610, Perak, Malaysia
2 Centre for Research in Data Science (CeRDAS), Universiti Teknologi PETRONAS, Seri Iskandar, 32610, Perak, Malaysia

* Corresponding Author: Nur Laila Ab Ghani.

Computers, Materials & Continua 2023, 75(2), 4649-4668. https://doi.org/10.32604/cmc.2023.035987

Received 13 September 2022; Accepted 12 November 2022; Issue published 31 March 2023

Abstract

Clustering high-dimensional data is challenging because increasing dimensionality increases the distances between data points, producing sparse regions that degrade clustering performance. Subspace clustering is a common approach for processing high-dimensional data: it finds the relevant features for each cluster in the data space. Subspace clustering methods extend traditional clustering to account for the constraints imposed by data streams, which are not only high-dimensional but also unbounded and evolving. This necessitates subspace clustering algorithms that can handle high dimensionality and adapt to the unique characteristics of data streams. Although many review articles have covered data stream clustering, there is currently no review dedicated to subspace clustering algorithms for high-dimensional data streams. Therefore, this article systematically reviews the existing literature on subspace clustering in high-dimensional streaming environments. The review follows a systematic methodological approach and includes 18 articles in the final analysis. The analysis focuses on two research questions: one concerning the general clustering process and one concerning how algorithms handle the unbounded and evolving characteristics of data streams. The main findings relate to six elements: clustering process, cluster search, subspace search, synopsis structure, cluster maintenance, and evaluation measures. Most algorithms use a two-phase clustering approach consisting of an initialization stage, a refinement stage, a cluster maintenance stage, and a final clustering stage. The density-based, top-down subspace clustering approach is more widely used than the others because it can distinguish true clusters from outliers using projected micro-clusters. Most algorithms adapt implicitly to the evolving nature of the data stream by using a time fading function that is sensitive to outliers.
Future work can focus on the clustering framework, parameter optimization, subspace search techniques, memory-efficient synopsis structures, explicit cluster change detection, and intrinsic performance metrics. This article can serve as a guide for researchers interested in high-dimensional subspace clustering methods for data streams.
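The abstract credits projected micro-clusters for the popularity of the density-based, top-down approach, and credits a time fading function for adaptation to evolving streams. The Python sketch below shows how these two ideas can fit together in one synopsis. It is a minimal sketch under stated assumptions, not an algorithm taken from the reviewed articles: it assumes the exponential fading function f(Δt) = 2^(−λΔt) used by density-based stream clusterers such as DenStream, and a PreDeCon-style rule that places a dimension in the micro-cluster's subspace when its faded variance is at most a threshold δ. The class name ProjectedMicroCluster and the default values lam = 0.25 and delta = 0.05 are hypothetical.

```python
from typing import List


class ProjectedMicroCluster:
    """Hypothetical projected micro-cluster with time-faded statistics.

    Keeps a faded weight w plus a per-dimension linear sum LS and squared
    sum SS. All three are multiplied by f(dt) = 2 ** (-lam * dt) before
    each update, so older points contribute less to the summary.
    """

    def __init__(self, dim: int, lam: float = 0.25, delta: float = 0.05) -> None:
        self.dim = dim
        self.lam = lam        # hypothetical decay rate (lambda)
        self.delta = delta    # hypothetical variance threshold for preferred dimensions
        self.weight = 0.0
        self.last_update = 0.0
        self.ls = [0.0] * dim
        self.ss = [0.0] * dim

    def fade(self, now: float) -> None:
        """Decay weight, LS and SS from last_update to `now`."""
        factor = 2.0 ** (-self.lam * (now - self.last_update))
        self.weight *= factor
        self.ls = [v * factor for v in self.ls]
        self.ss = [v * factor for v in self.ss]
        self.last_update = now

    def insert(self, point: List[float], now: float) -> None:
        """Fade the existing summary, then absorb `point` with weight 1."""
        if len(point) != self.dim:
            raise ValueError("point has wrong dimensionality")
        self.fade(now)
        self.weight += 1.0
        for j, x in enumerate(point):
            self.ls[j] += x
            self.ss[j] += x * x

    def center(self) -> List[float]:
        """Recency-weighted mean of the absorbed points."""
        if self.weight == 0.0:
            raise ValueError("empty micro-cluster")
        return [v / self.weight for v in self.ls]

    def variances(self) -> List[float]:
        """Recency-weighted variance per dimension: SS/w - (LS/w)^2."""
        if self.weight == 0.0:
            raise ValueError("empty micro-cluster")
        # Clip at 0 to absorb floating-point rounding noise.
        return [max(0.0, sq / self.weight - (lin / self.weight) ** 2)
                for lin, sq in zip(self.ls, self.ss)]

    def preferred_dimensions(self) -> List[int]:
        """Indices of dimensions whose faded variance is at most delta."""
        return [j for j, v in enumerate(self.variances()) if v <= self.delta]


# Usage: points tight along dimensions 0 and 1, spread out along dimension 2.
mc = ProjectedMicroCluster(dim=3)
for point, t in [([1.0, 2.0, 0.0], 0.0), ([1.1, 2.0, 5.0], 1.0), ([0.9, 2.1, -4.0], 2.0)]:
    mc.insert(point, t)
print(mc.preferred_dimensions())  # prints [0, 1]
```

Because the weight, the linear sums, and the squared sums are all scaled by the same factor, the variances are recency-weighted: with lam = 0.25, a point's contribution halves every 4 time units. In the usage example, dimensions 0 and 1 stay below delta while dimension 2 does not, so the micro-cluster is projected onto the subspace {0, 1}.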

Keywords

Clustering; subspace clustering; projected clustering; data stream; stream clustering; high dimensionality; evolving data stream; concept drift

Cite This Article

APA Style

Ab Ghani, N.L., Aziz, I.A., AbdulKadir, S.J. (2023). Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review. Computers, Materials & Continua, 75(2), 4649–4668. https://doi.org/10.32604/cmc.2023.035987

Vancouver Style

Ab Ghani NL, Aziz IA, AbdulKadir SJ. Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review. Comput Mater Contin. 2023;75(2):4649–4668. https://doi.org/10.32604/cmc.2023.035987

IEEE Style

N. L. Ab Ghani, I. A. Aziz, and S. J. AbdulKadir, “Subspace Clustering in High-Dimensional Data Streams: A Systematic Literature Review,” Comput. Mater. Contin., vol. 75, no. 2, pp. 4649–4668, 2023. https://doi.org/10.32604/cmc.2023.035987

Copyright © 2023 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.