Open Access
ARTICLE
Addressing Prompt Injection in Large Language Models via In-Context Learning
1 Department of Informatics, University of Electro-Communications, Tokyo, Japan
2 Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
* Corresponding Author: Yuichi Sei. Email:
(This article belongs to the Special Issue: Artificial Intelligence Methods and Techniques to Cybersecurity)
Computers, Materials & Continua 2026, 87(2), 99 https://doi.org/10.32604/cmc.2026.078188
Received 25 December 2025; Accepted 12 February 2026; Issue published 12 March 2026
Abstract
While Large Language Models (LLMs) possess the capability to perform a wide range of tasks, security attacks known as prompt injection and jailbreaking remain critical challenges. Existing defense approaches addressing this problem face challenges such as the over-refusal of prompts that contain harmful vocabulary but are semantically benign, and the limited accuracy improvement in machine learning-based approaches due to the ease of distinguishing benign prompts in existing datasets. Therefore, we propose a multi-LLM agent framework aimed at achieving both the accurate rejection of harmful prompts and appropriate responses to benign prompts. Distinct from prior studies, the proposed method adopts In-Context Learning (ICL) during the learning phase, presenting a novel approach that obviates the need for computationally expensive parameter updates required by conventional fine-tuning. To demonstrate the proposed method’s capability for rapid and easy deployment, this study targets LLMs with insufficient alignment. In the experiments, macro-averaged binary classification metrics were used to comprehensively evaluate harmfulness detection. Experimental results using three LLMs demonstrated that the proposed method achieved performance that surpassed four baselines across all evaluation metrics for the target LLMs, evidencing significant effectiveness with an average improvement of 16.6 points in F1-score compared to the vanilla models. The significance of this study lies in the proposal of a novel approach based on ICL that does not require parameter updates. This framework offers high sustainability in practical deployment, as it allows for the adaptive enhancement of detection performance against continuously evolving attack methods solely through the accumulation of logs, without the necessity of retraining the LLM itself. By mitigating the trade-off between safety and utility, this research contributes to the implementation of robust LLMs.Keywords
Cite This Article
Copyright © 2026 The Author(s). Published by Tech Science Press.This work is licensed under a Creative Commons Attribution 4.0 International License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Submit a Paper
Propose a Special lssue
View Full Text
Download PDF
Downloads
Citation Tools