Predicting Software Security Bugs Using Machine Learning and Quality Metrics: An Empirical Study
Mohamed Diouf1, Elisée Toe1,*, Manel Grichi2, Haïfa Nakouri1,3, Fehmi Jaafar1
1 Department of Computer Science and Mathematics (DIM), University of Quebec at Chicoutimi (UQAC), Chicoutimi, QC, Canada
2 Research and Development Department, VibroSystM, Longueuil, QC, Canada
3 LARODEC, Institut Supérieur de Gestion (ISG), University of Tunis, Tunis, Tunisia
* Corresponding Author: Elisée Toe. Email:
Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.077139
Received 03 December 2025; Accepted 15 January 2026; Published online 14 February 2026
Abstract
Software security bugs pose significant risks to modern systems, leading to unauthorized access, data breaches, and severe operational and financial consequences. Early prediction of such vulnerabilities is therefore essential for strengthening software reliability and reducing remediation costs. This study investigates the extent to which static software quality metrics can identify vulnerable code and evaluates the effectiveness of machine learning models for large-scale security-bug prediction. We analyze a dataset of 338,442 source files, including 33,294 buggy files, collected from seven major open-source ecosystems: GitHub Security Advisories (GHSA), the Python Package Index (PyPI), OSS-Fuzz (Google’s open-source fuzzing service), the Node Package Manager (npm), Packagist (the PHP package repository), Apache Maven, and NuGet (the .NET package manager). Using the Open Source Vulnerabilities (OSV) platform, we identify 7,685 confirmed security bugs and extract 25 static software quality metrics per file with the Understand analysis tool. We apply five complementary feature-importance techniques and evaluate eleven machine learning classifiers under a time-series cross-validation protocol. Our analysis reveals three key findings. First, six core metrics consistently show strong associations with the presence of security bugs across all feature-selection methods. Second, buggy files exhibit substantially higher metric values, with medians approximately three times those of non-buggy files, a pattern we term the “3× rule”; Mann–Whitney U tests confirm that these differences are statistically significant. Third, the machine learning models achieve strong predictive performance, with XGBoost providing the best results (recall = 0.82, precision = 0.95, Receiver Operating Characteristic–Area Under the Curve (ROC-AUC) = 0.91). Based on these findings, we propose data-driven warning and critical thresholds for the most influential metrics to support proactive security assessment. Overall, this work provides large-scale empirical evidence that software quality metrics, combined with machine learning, offer actionable signals for detecting security bugs and for integrating automated vulnerability prediction into software development workflows.
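The time-series cross-validation protocol used in the evaluation can be sketched as follows. This is a minimal illustration only: the fold-sizing scheme shown here (equal-sized chronological folds, as in scikit-learn's `TimeSeriesSplit`) is an assumption, not the paper's exact configuration. The key property is that each test fold strictly follows its training data in time, so a model is never evaluated on files older than those it was trained on.

```python
def time_series_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs for chronologically
    ordered samples. Each test fold comes strictly after its training
    data, preventing look-ahead leakage. Fold sizing is illustrative."""
    fold = n_samples // (n_splits + 1)  # size of each test fold
    for i in range(1, n_splits + 1):
        train = list(range(0, i * fold))                       # all past data
        test = list(range(i * fold, min((i + 1) * fold, n_samples)))  # next block
        yield train, test

# Example: 10 files ordered by commit date, 4 evaluation rounds.
for train, test in time_series_splits(10, 4):
    print(f"train on {train} -> test on {test}")
```

Evaluating classifiers this way, rather than with a random shuffle, matters for security-bug prediction because vulnerability labels arrive over time; a shuffled split would let a model train on files disclosed after the ones it is tested on.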
Keywords
Security bugs; vulnerability prediction; software metrics; machine learning; code complexity; software quality