Integration of Large Language Models (LLMs) and Static Analysis for Improving the Efficacy of Security Vulnerability Detection in Source Code
José Armando Santas Ciavatta, Juan Ramón Bermejo Higuera*, Javier Bermejo Higuera, Juan Antonio Sicilia Montalvo, Tomás Sureda Riera, Jesús Pérez Melero
School of Engineering and Technology, International University of La Rioja, Avda. de La Paz, 137, Logroño, 26006, La Rioja, Spain
* Corresponding Author: Juan Ramón Bermejo Higuera. Email:
Computers, Materials & Continua https://doi.org/10.32604/cmc.2025.074566
Received 14 October 2025; Accepted 21 November 2025; Published online 09 December 2025
Abstract
Artificial Intelligence (AI) continues to expand rapidly, particularly with the emergence of Generative Pre-trained Transformers (GPT) built on the transformer architecture, which have revolutionized data processing and enabled significant improvements across a wide range of applications. This paper investigates the detection of security vulnerabilities in source code using a range of large language models (LLMs). Our primary objective is to evaluate their effectiveness against Static Application Security Testing (SAST), applying techniques such as prompt personas, structured outputs, and zero-shot prompting. The selected LLMs (CodeLlama 7B, DeepSeek Coder 7B, Gemini 1.5 Flash, Gemini 2.0 Flash, Mistral 7B Instruct, Phi 3 8B Mini 128K Instruct, Qwen 2.5 Coder, StarCoder2 7B) are compared with, and combined with, the SAST tool Find Security Bugs. The evaluation uses a selected dataset containing known vulnerabilities, and the results provide insights for different scenarios according to software criticality (business-critical, non-critical, minimum effort, best effort). In detail, the main objectives of this study are to investigate whether large language models match or exceed the capabilities of traditional static analysis tools, whether combining LLMs with SAST tools leads to an improvement, and whether local models running on an ordinary computer can produce reliable results. Summarizing the most important conclusions of the research: although results improve with the size of the LLM, for business-critical software the best results were obtained by SAST analysis alone. This differs in the "Non-Critical," "Best Effort," and "Minimum Effort" scenarios, where the combination of an LLM (Gemini) with SAST obtained better results.
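To make the prompting techniques named above concrete, the following minimal Python sketch shows how a zero-shot, persona-style prompt with a structured (JSON) output format might be built and its response parsed. The field names, the example snippet, and the simulated response are illustrative assumptions, not the exact configuration used in this study.

import json

PROMPT_TEMPLATE = (
    "You are an experienced application security auditor.\n"   # prompt persona
    "Analyze the following source code for security vulnerabilities.\n"
    "Respond only with a JSON array of findings; each finding must contain\n"
    'the fields "cwe" (string), "line" (integer) and "explanation" (string).\n'
    "Return an empty array if no vulnerability is found.\n\n"
    "Source code:\n{code}\n"
)

def build_prompt(source_code: str) -> str:
    """Zero-shot prompt: only the task description is given, no worked examples."""
    return PROMPT_TEMPLATE.format(code=source_code)

def parse_findings(model_response: str):
    """Parse the structured output; treat malformed output as no findings."""
    try:
        findings = json.loads(model_response)
    except json.JSONDecodeError:
        return []
    return findings if isinstance(findings, list) else []

if __name__ == "__main__":
    snippet = 'String q = "SELECT * FROM users WHERE id=" + request.getParameter("id");'
    prompt = build_prompt(snippet)
    # The prompt would be sent to each evaluated model (a local 7B model or a
    # hosted Gemini endpoint); here only the expected response shape is shown.
    simulated_response = ('[{"cwe": "CWE-89", "line": 1, '
                          '"explanation": "SQL injection via string concatenation"}]')
    print(parse_findings(simulated_response))

In the combined LLM + SAST scenarios described above, findings parsed in this way would then be merged with the warnings reported by Find Security Bugs before scoring.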
Keywords
AI + SAST; secure code; LLM; benchmarking LLM; vulnerability detection