Open Access

ARTICLE

Data-Driven Test Case Prioritization (DD-TCP): A Machine Learning Framework for Intelligent Software Quality Assurance

Hafiz Arslan Ramzan1,*, Kamrul Islam2, Md Ahbab Hussain3, Raiyan Muntasir Monim4, Sabit Md Asad4, Sadia Ramzan5
1 School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
2 Gabelli School of Business, Fordham University, New York, NY, USA
3 Ketner School of Business, Trine University, Angola, IN, USA
4 College of Graduate and Professional Studies, Trine University, Angola, IN, USA
5 Department of Computer Science, Emerson University, Multan, Pakistan
* Corresponding Author: Hafiz Arslan Ramzan. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.077782

Received 16 December 2025; Accepted 25 March 2026; Published online 17 April 2026

Abstract

Regression testing of large-scale, data-intensive software systems demands efficient test-case prioritization strategies that detect faults early while minimizing computational cost. Conventional prioritization methods, such as coverage-based and risk-based approaches, adapt poorly to evolving project dynamics and do not exploit the rich test-execution data accumulated over continuous integration cycles. This study presents a Data-Driven Test-Case Prioritization (DD-TCP) framework that uses statistical and machine-learning techniques to model the relationship between test-case features and historical fault-detection outcomes. The framework extracts multidimensional attributes, including code-change frequency, dependency metrics, execution duration, and past failure density; these are normalized and fed into a predictive ranking model based on gradient-boosted decision trees. Test cases are then dynamically reordered using a probabilistic gain function that maximizes the probability of early fault detection. Simulations on representative open-source project datasets and synthetically generated large-scale test suites show that DD-TCP consistently outperforms baseline methods, yielding a 32.4% improvement in Average Percentage of Faults Detected (APFD) and a 27.1% reduction in execution overhead. The results demonstrate the feasibility of data-centric intelligence for scalable regression testing and provide an analytical foundation for integrating machine learning into next-generation software quality assurance pipelines.

Keywords

Data-driven test-case prioritization; regression testing; software quality assurance; machine learning; continuous integration; fault detection efficiency; intelligent software systems
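
For readers unfamiliar with the APFD metric cited in the abstract, the following is a minimal sketch of its standard textbook definition, APFD = 1 - (TF_1 + ... + TF_m)/(n·m) + 1/(2n), where n is the number of tests, m the number of faults, and TF_i the 1-based position of the first test that reveals fault i. It is not taken from the paper: the function name `apfd`, the fault-to-tests mapping, and the toy data are all hypothetical, and the sketch assumes every fault is detected by at least one test in the ordering.

```python
from typing import Dict, List, Set


def apfd(order: List[str], faults: Dict[str, Set[str]]) -> float:
    """Average Percentage of Faults Detected for one test ordering.

    order:  test IDs in the order they are executed (hypothetical IDs).
    faults: maps each fault ID to the set of tests that detect it.
    """
    n = len(order)
    m = len(faults)
    if n == 0 or m == 0:
        raise ValueError("need at least one test and one fault")

    # 1-based position of each test in the prioritized order.
    position = {test: idx + 1 for idx, test in enumerate(order)}

    total_first_positions = 0
    for fault, detecting_tests in faults.items():
        ranks = [position[t] for t in detecting_tests if t in position]
        if not ranks:
            # The standard formula assumes every fault is eventually detected.
            raise ValueError(f"fault {fault!r} is not detected by any test")
        total_first_positions += min(ranks)

    return 1 - total_first_positions / (n * m) + 1 / (2 * n)


# Toy data (illustrative only): two faults, four tests.
toy_faults = {"f1": {"t1"}, "f2": {"t3"}}
baseline = apfd(["t1", "t2", "t3", "t4"], toy_faults)
reordered = apfd(["t1", "t3", "t2", "t4"], toy_faults)
print(baseline, reordered)
```

On this toy input, moving the fault-revealing test `t3` earlier raises APFD from 0.625 to 0.75, which is the kind of gain a prioritization method aims for.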