Open Access


Data and Ensemble Machine Learning Fusion Based Intelligent Software Defect Prediction System

Sagheer Abbas1, Shabib Aftab1,2, Muhammad Adnan Khan3,4, Taher M. Ghazal5,6, Hussam Al Hamadi7, Chan Yeob Yeun8,*

1 School of Computer Science, National College of Business Administration & Economics, Lahore, 54000, Pakistan
2 Department of Computer Science, Virtual University of Pakistan, Lahore, 54000, Pakistan
3 Department of Software, Faculty of Artificial Intelligence and Software, Gachon University, Seongnam, 13120, Korea
4 Riphah School of Computing & Innovation, Faculty of Computing, Riphah International University, Lahore Campus, Lahore, 54000, Pakistan
5 School of Information Technology, Skyline University College, University City Sharjah, Sharjah, UAE
6 Center for Cyber Security, Faculty of Information Science and Technology, UKM, Bangi, Selangor, 43600, Malaysia
7 College of Engineering and IT, University of Dubai, 14143, UAE
8 EECS Department, Center for Cyber Physical Systems, Khalifa University, Abu Dhabi, 127788, UAE

* Corresponding Author: Chan Yeob Yeun. Email:

Computers, Materials & Continua 2023, 75(3), 6083-6100.


The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development lets quality assurance engineers concentrate on problematic modules rather than on every module. Identifying defective modules early allows for early corrections and supports the timely delivery of a high-quality product that satisfies customers and gives the development team greater confidence. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that uses data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules with high accuracy. In the first step, three supervised machine learning techniques (Decision Tree, Support Vector Machines, and Naïve Bayes) are used to detect faulty modules. The second step integrates the predictive accuracy of these classification techniques through three ensemble machine learning methods: Bagging, Voting, and Stacking. In the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system outperformed other advanced techniques for predicting software defects, achieving an accuracy of 92.08%.
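The second and third steps of the nested approach can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the membership functions or the fuzzy rule base, so the `high` ramp (0.3 to 0.7), the rule "defective if at least two ensembles are high", and the function names `majority_vote` and `fuzzy_fuse` are all assumptions made for illustration.

```python
from itertools import combinations


def majority_vote(predictions):
    """Hard voting over base-classifier labels (1 = defective, 0 = clean)."""
    return int(sum(predictions) * 2 > len(predictions))


def high(p):
    """Membership of a defect probability in the fuzzy set 'high'.

    Assumed shape: linear ramp from 0 at p = 0.3 to 1 at p = 0.7.
    """
    return min(1.0, max(0.0, (p - 0.3) / 0.4))


def fuzzy_fuse(bagging_p, voting_p, stacking_p):
    """Fuse three ensemble outputs with one illustrative fuzzy rule.

    Assumed rule: a module is defective if at least two ensembles rate it
    'high'. Fuzzy AND is min, fuzzy OR is max.
    Returns (label, degree), where degree is the strength of 'defective'.
    """
    memberships = [high(bagging_p), high(voting_p), high(stacking_p)]
    degree = max(min(a, b) for a, b in combinations(memberships, 2))
    return int(degree > 0.5), degree


# Hypothetical outputs for one module.
base_labels = [1, 0, 1]  # Decision Tree, SVM, Naive Bayes
label, degree = fuzzy_fuse(0.9, 0.8, 0.2)  # Bagging, Voting, Stacking
print(majority_vote(base_labels), label, degree)
```

In this sketch the hard vote stands in for the Voting ensemble only; Bagging and Stacking would each need trained models, which are replaced here by made-up probabilities.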


Cite This Article

S. Abbas, S. Aftab, M. A. Khan, T. M. Ghazal, H. A. Hamadi et al., "Data and ensemble machine learning fusion based intelligent software defect prediction system," Computers, Materials & Continua, vol. 75, no. 3, pp. 6083–6100, 2023.

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.