Ensemble Machine Learning Framework for PFAS Risk Screening in Public Water Systems

Menahil Rahman¹, Waqas Ishtiaq², Amerah Alabrah³,*, Arif Mehmood⁴, Rana Faraz Ahmed⁴, Iqra Khalid⁵, Farhan Amin⁶,*
1 College of Medicine, University of Cincinnati, Cincinnati, OH, USA
2 Lindner College of Business, University of Cincinnati, Cincinnati, OH, USA
3 Department of Information Systems, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
4 Department of Information Security, The Islamia University of Bahawalpur, Bahawalpur, Punjab, Pakistan
5 Department of Biochemistry & Biotechnology, The Islamia University of Bahawalpur, Bahawalpur, Punjab, Pakistan
6 School of Computer Science and Engineering, Yeungnam University, Gyeongsan, Republic of Korea
* Corresponding Authors: Amerah Alabrah. Email: email; Farhan Amin. Email: email
(This article belongs to the Special Issue: Explainable AI, Digital Twin, and Hybrid Deep Learning Approaches for Urban–Regional Hydrology, Water Quality, and Risk Modeling under Uncertainty)

Computer Modeling in Engineering & Sciences https://doi.org/10.32604/cmes.2026.078549

Received 03 January 2026; Accepted 30 March 2026; Published online 27 April 2026

Abstract

Access to safe drinking water is a fundamental determinant of global health, and contaminated water directly harms the health of the people who consume it. Per- and polyfluoroalkyl substances (PFAS), often referred to as "forever chemicals", pose a persistent and growing threat to drinking water. Machine learning methods have been used in the literature to identify PFAS in water; however, traditional methods are neither efficient nor scalable. To address this issue, this study develops a large-scale machine learning framework for PFAS risk screening in US public water systems. The proposed framework incorporates data ingestion, preprocessing, and feature engineering, and uses the Synthetic Minority Over-sampling Technique (SMOTE) to correct class imbalance. We evaluate an ensemble-based framework that integrates gradient boosting, bagging, and meta-learning strategies. The framework achieves a maximum ROC-AUC of 0.9574, with the best-performing stacking ensemble achieving a precision of 0.75, a recall of 0.68, and an F1-score of 0.71. These results show that the proposed ensemble learning framework is useful for screening public water systems and identifying those at risk of PFAS contamination.
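
As a consistency check on the reported metrics, the F1-score can be recomputed from the stated precision and recall. This is a worked sketch that assumes the standard definition of F1 as the harmonic mean of precision $P$ and recall $R$; the symbols $P$ and $R$ are introduced here for illustration and do not appear in the abstract.

```latex
\[
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.75 \times 0.68}{0.75 + 0.68}
    = \frac{1.02}{1.43}
    \approx 0.71
\]
```

Under that assumption, the recomputed value agrees with the reported F1-score of 0.71 to two decimal places.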