Open Access

ARTICLE

Cross-Project Software Defect Prediction Based on SMOTE and Deep Canonical Correlation Analysis

Xin Fan1,2, Shuqing Zhang1,2,*, Kaisheng Wu1,2, Wei Zheng1,2, Yu Ge1,2

1 School of Software, Nanchang Hangkong University, Nanchang, 330063, China
2 Software Testing and Evaluation Center, Nanchang Hangkong University, Nanchang, 330063, China

* Corresponding Author: Shuqing Zhang.

Computers, Materials & Continua 2024, 78(2), 1687-1711. https://doi.org/10.32604/cmc.2023.046187

Abstract

Cross-Project Defect Prediction (CPDP) uses historical data from other (source) projects to train models that predict defects in a target project. However, existing CPDP methods consider only linear correlations between the features (indicators) of the source and target projects. They cannot capture non-linear correlations between features when these exist, for example, when the data distributions of the source and target projects differ, and their performance is compromised as a result. This paper proposes a novel CPDP method based on the Synthetic Minority Oversampling Technique (SMOTE) and Deep Canonical Correlation Analysis (DCCA), referred to as S-DCCA. DCCA is employed to address non-linear correlations between the features of the source and target projects: S-DCCA extends Canonical Correlation Analysis (CCA) by incorporating the MlpNet model for feature extraction from the dataset. Redundant features are then eliminated by selecting the maximally correlated feature subset with the CCA loss function. Finally, the SMOTE data sampling technique is applied and cross-project defect prediction is performed. Area Under Curve (AUC) and F1 score (F1) are used as evaluation metrics. Experiments on 27 projects from four public datasets validate the proposed method. The results show that, on average, the method outperforms all baseline approaches by at least 1.2% in AUC and 5.5% in F1 score.
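The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the neighbour count `k`, and the regularization term `reg` are assumptions made for illustration, and the MlpNet feature extractors are omitted, so the correlation function operates directly on whatever paired feature matrices it is given. The first function shows SMOTE-style interpolation between minority-class (defective) samples; the second computes the total canonical correlation between two views, the quantity a DCCA-style loss would maximize.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (needs >= 2 minority rows).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


def total_canonical_correlation(H1, H2, reg=1e-4):
    """Sum of canonical correlations between two views.

    Rows of H1 and H2 are assumed to be paired samples. A DCCA-style
    objective would minimise the negative of this value with respect to
    the networks that produce H1 and H2.
    """
    H1 = H1 - H1.mean(axis=0)
    H2 = H2 - H2.mean(axis=0)
    m = H1.shape[0]
    # Regularised covariance and cross-covariance estimates.
    S11 = H1.T @ H1 / (m - 1) + reg * np.eye(H1.shape[1])
    S22 = H2.T @ H2 / (m - 1) + reg * np.eye(H2.shape[1])
    S12 = H1.T @ H2 / (m - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is SPD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    # Singular values of T are the canonical correlations.
    return np.linalg.svd(T, compute_uv=False).sum()
```

In a pipeline of the kind the abstract describes, the oversampled minority rows would be appended to the training data before a classifier is fitted, and AUC and F1 would then be computed on the target project; those steps are left out here because the abstract does not specify the classifier or how source and target samples are paired.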

This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.