Open Access

ARTICLE

Predicting Software Security Bugs Using Machine Learning and Quality Metrics: An Empirical Study

Mohamed Diouf1, Elisée Toe1,*, Manel Grichi2, Haïfa Nakouri1,3, Fehmi Jaafar1

1 Department of Computer Science and Mathematics (DIM), University of Quebec at Chicoutimi (UQAC), Chicoutimi, QC, Canada
2 Research and Development Department, VibroSystM, Longueuil, QC, Canada
3 LARODEC, Institut Supérieur de Gestion (ISG), University of Tunis, Tunis, Tunisia

* Corresponding Author: Elisée Toe. Email: email

Computers, Materials & Continua 2026, 87(3), 12 https://doi.org/10.32604/cmc.2026.077139

Abstract

Software security bugs pose significant risks to modern systems, leading to unauthorized access, data breaches, and severe operational and financial consequences. Early prediction of such vulnerabilities is therefore essential for strengthening software reliability and reducing remediation costs. This study investigates the extent to which static software quality metrics can identify vulnerable code and evaluates the effectiveness of machine learning models for large-scale security-bug prediction. We analyze a dataset of 338,442 source files, including 33,294 buggy files, collected from seven major open-source ecosystems: GitHub Security Advisories (GHSA), Python Package Index (PyPI), OSS-Fuzz (Google’s open-source fuzzing service), Node Package Manager (npm), Packagist (the PHP package repository), Apache Maven, and NuGet (the .NET package manager). Using the Open Source Vulnerabilities (OSV) platform, we identify 7,685 confirmed security bugs and extract 25 static software quality metrics per file with the Understand analysis tool. We apply five complementary feature-importance techniques and evaluate eleven machine-learning classifiers under a time-series cross-validation protocol. Our analysis reveals three key findings. First, six core metrics consistently show strong associations with the presence of security bugs across all feature-importance techniques. Second, buggy files exhibit substantially higher metric values, with medians approximately three times those of non-buggy files, a pattern we term the “3× rule”; Mann–Whitney U tests confirm that these differences are statistically significant. Third, the machine-learning models achieve strong predictive performance, with XGBoost providing the best results (recall = 0.82, precision = 0.95, Receiver Operating Characteristic–Area Under the Curve (ROC-AUC) = 0.91). Based on these findings, we propose data-driven warning and critical thresholds for the most influential metrics to support proactive security assessment. Overall, this work provides large-scale empirical evidence that software quality metrics, combined with machine learning, offer actionable signals for detecting security bugs and for integrating automated vulnerability prediction into software development workflows.
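The median comparison behind the “3× rule” and the accompanying Mann–Whitney U test can be illustrated with a minimal sketch. It is not the authors' pipeline: the metric values are invented toy data, the function names `median_ratio` and `mann_whitney_u` are hypothetical, and the p-value uses the large-sample normal approximation without a tie correction, which is only rough on samples this small.

```python
from math import sqrt
from statistics import NormalDist, median


def median_ratio(buggy, clean):
    """Ratio of the buggy-file median to the non-buggy median for one metric."""
    return median(buggy) / median(clean)


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns the U statistic for sample x and an approximate p-value.
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Toy values for a single metric (assumed data, not taken from the study).
buggy = [30, 45, 60, 75, 90]
clean = [10, 15, 20, 25, 30]

ratio = median_ratio(buggy, clean)
u_stat, p_value = mann_whitney_u(buggy, clean)
print(f"median ratio = {ratio:.1f}, U = {u_stat}, p = {p_value:.4f}")
```

On real data, a library routine with exact or tie-corrected p-values would be preferable to this hand-rolled approximation.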

Keywords

Security bugs; vulnerability prediction; software metrics; machine learning; code complexity; software quality

Cite This Article

APA Style
Diouf, M., Toe, E., Grichi, M., Nakouri, H., Jaafar, F. (2026). Predicting Software Security Bugs Using Machine Learning and Quality Metrics: An Empirical Study. Computers, Materials & Continua, 87(3), 12. https://doi.org/10.32604/cmc.2026.077139
Vancouver Style
Diouf M, Toe E, Grichi M, Nakouri H, Jaafar F. Predicting Software Security Bugs Using Machine Learning and Quality Metrics: An Empirical Study. Comput Mater Contin. 2026;87(3):12. https://doi.org/10.32604/cmc.2026.077139
IEEE Style
M. Diouf, E. Toe, M. Grichi, H. Nakouri, and F. Jaafar, “Predicting Software Security Bugs Using Machine Learning and Quality Metrics: An Empirical Study,” Comput. Mater. Contin., vol. 87, no. 3, pp. 12, 2026. https://doi.org/10.32604/cmc.2026.077139



Copyright © 2026 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.