Open Access iconOpen Access

REVIEW

crossmark

Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses

Eleena Sarah Mathew*

Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology, Allahabad, 211004, India

* Corresponding Author: Eleena Sarah Mathew. Email: email

Journal on Artificial Intelligence 2025, 7, 347-363. https://doi.org/10.32604/jai.2025.069841

Abstract

This review paper explores advanced methods to prompt Large Language Models (LLMs) into generating objectionable or unintended behaviors through adversarial prompt injection attacks. We examine a series of novel projects like HOUYI, Robustly Aligned LLM (RA-LLM), StruQ, and Virtual Prompt Injection that compel LLMs to produce affirmative responses to harmful queries. Several new benchmarks, such as PromptBench, AdvBench, AttackEval, INJECAGENT, and RobustnessSuite, have been created to evaluate the performance and resilience of LLMs against these adversarial attacks. Results show significant success rates in misleading models like Vicuna-7B, LLaMA-2-7B-Chat, GPT-3.5, and GPT-4. The review highlights limitations in existing defense mechanisms and proposes future directions for enhancing LLM alignment and safety protocols, including the concept of LLM SELF DEFENSE. Our study emphasizes the need for improved robustness in LLMs, which will potentially shape the future of Artificial Intelligence (AI) driven applications and security protocols. Understanding the vulnerabilities of LLMs is crucial for developing effective defenses against adversarial prompt injection attacks. This paper proposes a systemic classification framework that discusses various types of prompt injection attacks and defenses. We also go through a broad spectrum of state-of-the-art attack methods (such as HouYi and Virtual Prompt Injection) alongside advanced defense mechanisms (like RA-LLM, StruQ, and LLM Self-Defense), providing critical insights into vulnerabilities and robustness. We also integrate and compare results from multiple recent benchmarks, including PromptBench, INJECENT, and BIPIA.

Keywords

Natural language processing; prompt injection; ChatGPT; large language models (LLMs); adversarial exploitation; jailbreak; sensitive data leakage

Cite This Article

APA Style
Mathew, E.S. (2025). Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses. Journal on Artificial Intelligence, 7(1), 347–363. https://doi.org/10.32604/jai.2025.069841
Vancouver Style
Mathew ES. Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses. J Artif Intell. 2025;7(1):347–363. https://doi.org/10.32604/jai.2025.069841
IEEE Style
E. S. Mathew, “Enhancing Security in Large Language Models: A Comprehensive Review of Prompt Injection Attacks and Defenses,” J. Artif. Intell., vol. 7, no. 1, pp. 347–363, 2025. https://doi.org/10.32604/jai.2025.069841



cc Copyright © 2025 The Author(s). Published by Tech Science Press.
This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • 1223

    View

  • 618

    Download

  • 0

    Like

Share Link