Vol.41, No.3, 2022, pp.1027-141, doi:10.32604/csse.2022.020634
OPEN ACCESS
ARTICLE
Clustering Gene Expression Data Through Modified Agglomerative M-CURE Hierarchical Algorithm
E. Kavitha1,*, R. Tamilarasan2, N. Poonguzhali3, M. K. Jayanthi Kannan4
1 A Constituent College of Anna University, University College of Engineering, Villupuram, 605103, India
2 A Constituent College of Anna University, University College of Engineering, Pattukkottai, 614701, India
3 Department of Computer Science and Engineering, Manakula Vinayagar Institute of Technology, Puducherry, 605107, India
4 Department of Computer Science Engineering, Faculty of Engineering and Technology, JAIN (Deemed-To-Be University), Bangalore, 562112, India
* Corresponding Author: E. Kavitha. Email:
Received 01 June 2021; Accepted 11 July 2021; Issue published 10 November 2021
Abstract
Gene expression is the process by which the information in a gene is used to synthesize a functional gene product. Genes encode the proteins that in turn dictate the functionality of the cell. The first step in a gene expression study involves clustering, because biological networks are very complex and the sheer volume of genes makes the data harder to comprehend and interpret; the data themselves exhibit vagueness, noise and imprecision. For a biological system to function, the essential cellular molecules must interact with their surroundings, including RNA, DNA, metabolites and proteins. Clustering methods help expose the structures and patterns in the original data to support further decisions. Traditional clustering techniques include hierarchical, model-based, partitioning, density-based, grid-based and soft clustering methods. Although many of these methods cluster reliably, they do not cope with very large volumes of gene expression data. There are also statistical issues in choosing the right method and the dissimilarity matrix when dealing with gene expression data. In this work we propose a modified clustering using representatives algorithm (M-CURE), which is more robust to outliers than K-means clustering and can also find clusters of varying size.
Keywords
Clustering; gene identifiers; representatives; dimension reduction
Cite This Article
Kavitha, E., Tamilarasan, R., Poonguzhali, N., Jayanthi Kannan, M. K. (2022). Clustering Gene Expression Data Through Modified Agglomerative M-CURE Hierarchical Algorithm. Computer Systems Science and Engineering, 41(3), 1027–141.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.