Open Access iconOpen Access



A Work Review on Clinical Laboratory Data Utilizing Machine Learning Use-Case Methodology

Uma Ramasamy*, Sundar Santhoshkumar

Department of Computer Science, Alagappa University, Karaikudi, Tamil Nadu, India

* Corresponding Author: Uma Ramasamy. Email: email

Journal of Intelligent Medicine and Healthcare 2024, 2, 1-14.


More than 140 autoimmune diseases have distinct autoantibodies and symptoms, and it makes it challenging to construct an appropriate model using Machine Learning (ML) for autoimmune disease. Arthritis-related autoimmunity requires special attention. Although many conventional biomarkers for arthritis have been established, more biomarkers of arthritis autoimmune diseases remain to be identified. This review focuses on the research conducted using data obtained from clinical laboratory testing of real-time arthritis patients. The collected data is labelled the Arthritis Profile Data (APD) dataset. The APD dataset is the retrospective data with many missing values. We undertook a comprehensive APD dataset study comprising four key steps. Initially, we identified suitable imputation techniques for the APD dataset. Subsequently, we conducted a comparative analysis with different benchmark disease datasets. We determined the most effective ML model for the APD dataset. Finally, identified the hidden biomarkers in the APD dataset. We applied various imputation techniques to handle these missing data on the APD dataset, and the best imputation techniques were determined using the degree of proximity (DoP) and degree of residual (DoR) procedure. The random value imputer and mode imputer are the suitable imputation techniques identified. Different benchmark disease datasets were compared using different hold-out (HO) methods and cross-validation (CV) folds, which highlights that the dataset properties significantly impact the performance of ML models. Random Forest (RndF) and XGBoost (XGB) are the best performing ML algorithms for most diseases, with accuracy consistently above 80%. The appropriate ML model for the APD dataset is the XGB (Extreme Gradient Boosting). Moreover, using the XGB feature importance concept significant features were identified for the APD dataset. The substantial and hidden biomarkers identified were Erythrocyte Sedimentation Rate (ESR), Antistreptolysin O (ASO), C-Reactive Protein (CRP), Rheumatoid Factor (RF), Lymphocytes (L), Absolute Eosinophil count (Abs), Uric_Acid, Red Blood Cell count (RBC), and Blood for Total Count (TC).


Cite This Article

Ramasamy, U., Santhoshkumar, S. (2024). A Work Review on Clinical Laboratory Data Utilizing Machine Learning Use-Case Methodology. Journal of Intelligent Medicine and Healthcare, 2(1), 1–14.

cc This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 216


  • 121


  • 0


Share Link