Improved KNN Imputation for Missing Values in Gene Expression Data

Phimmarin Keerin; Tossapon Boongoen

doi:10.32604/cmc.2022.020261

Open Access



Phimmarin Keerin¹, Tossapon Boongoen²,*

1 Faculty of Science and Technology, Pibulsongkram Rajabhat University, Thailand
2 Center of Excellence in Artificial Intelligence and Emerging Technologies, School of Information Technology, Mae Fah Luang University, Chiang Rai 57100, Thailand

* Corresponding Author: Tossapon Boongoen. Email: email

(This article belongs to the Special Issue: Digital Technology and Artificial Intelligence in Medicine and Dentistry)

Computers, Materials & Continua 2022, 70(2), 4009-4025. https://doi.org/10.32604/cmc.2022.020261

Received 17 May 2021; Accepted 12 July 2021; Issue published 27 September 2021

Abstract

The problem of missing values has long been studied by researchers in data science and bioinformatics, especially in the analysis of gene expression data, which supports the early detection of cancer. Many studies report improvements from excluding samples with missing information from the analysis, while others fill the gaps with plausible values. The former is simple, but the latter avoids information loss. Among imputation techniques, the neighbor-based (KNN) approach has proven more effective than global estimators. This paper extends it further by introducing a new summarization method to the KNN model. It is the first study to apply the ordered weighted averaging (OWA) operator to this problem context. In particular, two variations of OWA aggregation are proposed and evaluated against their baseline and other neighbor-based models. Using missing-value ratios from 1% to 20% and six published gene expression datasets, the experimental results suggest that the new methods usually provide more accurate estimates than the compared methods. At missing rates of 5% and 20%, the best NRMSE scores averaged across datasets are 0.65 and 0.69, respectively, whereas the best scores obtained by the existing techniques included in this study are 0.80 and 0.84 (lower NRMSE is better).
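The abstract combines three technical ingredients: KNN imputation, OWA aggregation of the neighbors' values, and NRMSE scoring. The sketch below shows one way they could fit together. It is not the authors' method: the abstract does not specify the two OWA variants, so the weights here come from an illustrative quantifier Q(r) = r**alpha (Yager's regular increasing monotone form), the gene-to-gene distance is a root-mean-square difference over commonly observed samples, and NRMSE is taken as RMSE divided by the standard deviation of the true values, one common definition in this literature. The function names `quantifier_weights`, `owa`, `knn_owa_impute` and `nrmse`, and the toy matrix, are hypothetical.

```python
import math


def quantifier_weights(k, alpha):
    """OWA weights from an illustrative quantifier Q(r) = r**alpha.

    w_i = Q(i/k) - Q((i-1)/k); the weights are non-negative and sum to 1.
    """
    return [(i / k) ** alpha - ((i - 1) / k) ** alpha for i in range(1, k + 1)]


def owa(values, weights):
    """Ordered weighted averaging: weights apply to the values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def knn_owa_impute(data, k=3, alpha=1.0):
    """Fill None entries in a genes-by-samples matrix (a list of rows).

    For each missing entry (i, j), find the k genes closest to gene i that
    have column j observed, then aggregate their column-j values with OWA.
    Distances are root-mean-square differences over commonly observed columns.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    weights = quantifier_weights(k, alpha)
    result = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                sq = [(a - b) ** 2 for a, b in zip(row, other)
                      if a is not None and b is not None]
                if sq:
                    candidates.append((math.sqrt(sum(sq) / len(sq)), other[j]))
            if len(candidates) < k:
                raise ValueError(f"fewer than {k} usable neighbors for gene {i}")
            candidates.sort(key=lambda c: c[0])  # nearest genes first
            neighbor_values = [v for _, v in candidates[:k]]
            result[i][j] = owa(neighbor_values, weights)
    return result


def nrmse(true_values, imputed_values):
    """RMSE of the imputed entries divided by the standard deviation of the true entries."""
    n = len(true_values)
    mse = sum((t - p) ** 2 for t, p in zip(true_values, imputed_values)) / n
    mean = sum(true_values) / n
    std = math.sqrt(sum((t - mean) ** 2 for t in true_values) / n)
    if std == 0:
        raise ValueError("true values have zero variance")
    return math.sqrt(mse) / std


# Toy genes-by-samples matrix (made-up numbers) with one missing value.
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, None, 4.1],
    [0.9, 1.9, 2.9, 3.9],
    [1.2, 2.2, 3.2, 4.2],
    [5.0, 1.0, 0.0, 2.0],
]
filled = knn_owa_impute(expression, k=3, alpha=1.0)
print(filled[1][2])  # about 3.033, the plain mean of 3.0, 3.2 and 2.9
```

With alpha = 1 every weight equals 1/k, so the OWA step reduces to the unweighted mean of the k neighbor values; alpha > 1 shifts weight toward the smaller neighbor values and alpha < 1 toward the larger ones. In an evaluation like the one the abstract describes, known entries would be masked at a chosen missing rate, imputed, and compared with `nrmse`.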

Keywords

Gene expression; missing value; imputation; KNN; OWA operator

Cite This Article

APA Style

Keerin, P., Boongoen, T. (2022). Improved KNN Imputation for Missing Values in Gene Expression Data. Computers, Materials & Continua, 70(2), 4009–4025. https://doi.org/10.32604/cmc.2022.020261

Vancouver Style

Keerin P, Boongoen T. Improved KNN Imputation for Missing Values in Gene Expression Data. Comput Mater Contin. 2022;70(2):4009–4025. https://doi.org/10.32604/cmc.2022.020261

IEEE Style

P. Keerin and T. Boongoen, “Improved KNN Imputation for Missing Values in Gene Expression Data,” Comput. Mater. Contin., vol. 70, no. 2, pp. 4009–4025, 2022. https://doi.org/10.32604/cmc.2022.020261



Copyright © 2022 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.