Open Access
ARTICLE
A Hybrid Feature Selection and Clustering-Based Ensemble Learning Approach for Real-Time Fraud Detection in Financial Transactions
1 Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa, 31982, Saudi Arabia
2 Department of Network Technology, College of Information Engineering, Jinhua University of Vocational Technology, Jinhua, 321017, China
3 Department of Network Technology, Jinhua Polytechnic Wintec International College, Waikato Institute of Technology (Wintec), Hamilton, 3204, New Zealand
* Corresponding Author: Naif Almusallam. Email:
(This article belongs to the Special Issue: Advanced Algorithms for Feature Selection in Machine Learning)
Computers, Materials & Continua 2025, 85(2), 3653-3687. https://doi.org/10.32604/cmc.2025.067220
Received 27 April 2025; Accepted 23 July 2025; Issue published 23 September 2025
Abstract
This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabelled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset of one million financial transactions, 5% of which are labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model improved significantly on baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared with the individual models, it yielded a 9% gain in overall detection accuracy and reduced the false positive rate to below 3.5%, thereby minimising the operational cost of manually reviewing false alerts. The model's interpretability is enhanced by integrating SHapley Additive exPlanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
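The three-layer structure described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a small `make_classification` dataset stands in for the one-million-transaction synthetic data, Mutual Information alone stands in for the RFE/PCA/MI stage, K-Means alone stands in for the clustering stage, the distance to the nearest centroid is appended as an anomaly feature (the abstract does not say how cluster outputs reach the classifier), and soft voting is used (the abstract says only "voting-based"). All hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import (confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: a small imbalanced toy dataset (about 5% positives)
# standing in for the paper's synthetic transaction data.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=6,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Layer 1 (feature selection): keep the 10 features with the highest
# mutual information with the label, then standardise them.
mi = mutual_info_classif(X_train, y_train, random_state=0)
top = np.argsort(mi)[-10:]
scaler = StandardScaler().fit(X_train[:, top])
Xtr = scaler.transform(X_train[:, top])
Xte = scaler.transform(X_test[:, top])

# Layer 2 (unsupervised clustering): fit K-Means without labels and use
# the distance to the nearest centroid as an extra anomaly-score feature.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Xtr)

def add_anomaly_score(Z):
    """Append each row's distance to its nearest K-Means centroid."""
    return np.column_stack([Z, km.transform(Z).min(axis=1)])

Xtr_aug, Xte_aug = add_anomaly_score(Xtr), add_anomaly_score(Xte)

# Layer 3 (ensemble): soft-voting combination of SVM, RF, and GBC.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gbc", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(Xtr_aug, y_train)
y_pred = ensemble.predict(Xte_aug)

# Metrics of the kind reported in the abstract, for the fraud class.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
metrics = {
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "false_positive_rate": fp / (fp + tn),
}
print(metrics)
```

The numbers printed by this sketch depend entirely on the toy data and are not comparable with the results reported in the abstract. On imbalanced data of this kind, precision, recall, and the false positive rate are more informative than accuracy, because a classifier that never flags fraud already scores about 95% accuracy.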
Copyright © 2025 The Author(s). Published by Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

