Predicting Software Security Bugs Using Machine Learning and Quality Metrics: An Empirical Study
Mohamed Diouf1, Elisée Toe1,*, Manel Grichi2, Haïfa Nakouri1,3, Fehmi Jaafar1
1 Department of Computer Science and Mathematics (DIM), University of Quebec at Chicoutimi (UQAC), Chicoutimi, QC, Canada
2 Research and Development Department, VibroSystM, Longueuil, QC, Canada
3 LARODEC, Institut Supérieur de Gestion (ISG), University of Tunis, Tunis, Tunisia
* Corresponding Author: Elisée Toe. Email:
Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.077139
Received 03 December 2025; Accepted 15 January 2026; Published online 14 February 2026
Abstract
Software security bugs pose significant risks to modern systems, leading to unauthorized access, data breaches, and severe operational and financial consequences. Early prediction of such vulnerabilities is therefore essential for strengthening software reliability and reducing remediation costs. This study investigates the extent to which static software quality metrics can identify vulnerable code and evaluates the effectiveness of machine learning models for large-scale security-bug prediction. We analyze a dataset of 338,442 source files, including 33,294 buggy files, collected from seven major open-source ecosystems: GitHub Security Advisories (GHSA), the Python Package Index (PyPI), OSS-Fuzz (Google’s open-source fuzzing service), the Node Package Manager (npm), Packagist (the PHP package repository), Apache Maven, and NuGet (the .NET package manager). Using the Open Source Vulnerabilities (OSV) platform, we identify 7,685 confirmed security bugs and extract 25 static software quality metrics per file with the Understand analysis tool. We apply five complementary feature-importance techniques and evaluate eleven machine learning classifiers under a time-series cross-validation protocol. Our analysis reveals three key findings. First, six core metrics consistently show strong associations with the presence of security bugs across all feature-selection methods. Second, buggy files exhibit substantially higher metric values, with medians approximately three times those of non-buggy files, a pattern we term the “3× rule”; Mann–Whitney U tests confirm that these differences are statistically significant. Third, the machine learning models achieve strong predictive performance, with XGBoost providing the best results (recall = 0.82, precision = 0.95, Receiver Operating Characteristic–Area Under the Curve (ROC-AUC) = 0.91). Based on these findings, we propose data-driven warning and critical thresholds for the most influential metrics to support proactive security assessment. Overall, this work provides large-scale empirical evidence that software quality metrics, combined with machine learning, offer actionable signals for detecting security bugs and for integrating automated vulnerability prediction into software development workflows.
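The time-series cross-validation protocol used in the evaluation can be sketched as follows. This is a minimal illustration only: the fold-sizing scheme shown here (equal-sized chronological folds, as in scikit-learn's `TimeSeriesSplit`) is an assumption, not the paper's exact configuration. The key property is that each test fold strictly follows its training data in time, so a model is never evaluated on files older than those it was trained on.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs for chronologically
    ordered samples. Each test fold comes strictly after its training
    data, preventing look-ahead leakage. Fold sizing is illustrative."""
    fold = n_samples // (n_splits + 1)  # size of each test fold
    for i in range(1, n_splits + 1):
        train = list(range(0, i * fold))                       # all past data
        test = list(range(i * fold, min((i + 1) * fold, n_samples)))  # next block
        yield train, test

# Example: 10 files ordered by commit date, 4 evaluation rounds.
for train, test in time_series_splits(10, 4):
    print(f"train on {train} -> test on {test}")
```

Evaluating classifiers this way, rather than with a random shuffle, matters for security-bug prediction because vulnerability labels arrive over time; a shuffled split would let a model train on files disclosed after the ones it is tested on.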
Keywords
Security bugs; vulnerability prediction; software metrics; machine learning; code complexity; software quality