TY - EJOU
AU - Tao, Jialing
AU - Huang, Song
AU - Zheng, Changyou
TI - Scaling the Strategy Wall: Efficient Jailbreaking of LLMs via Component-Based Multi-Objective Optimization
T2 - Computers, Materials & Continua
PY -
VL -
IS -
SN - 1546-2226
AB - Background: Jailbreak attacks, which use crafted prompts to bypass safety alignments of Large Language Models (LLMs) and generate harmful content, pose a significant security threat. Existing methods often optimize for a single objective (e.g., attack success rate), neglecting critical factors like query efficiency, which limits their practicality and generalization. Methods: We propose a Componentized Multi-Objective Optimization Framework (CMOOF), which introduces a paradigm shift: it searches for generalizable and query-efficient attack strategy templates within a structured, component-based strategy space. CMOOF leverages the NSGA-II algorithm to explicitly co-optimize two first-class objectives: Attack Success Rate (ASR) and Query Efficiency, thereby discovering their Pareto-optimal trade-off frontier. Results: Experiments on benchmark datasets show significant improvements, with the highest jailbreak success rate reaching 98.75% on models like Llama3, and query efficiency surpassing baselines. Conclusions: CMOOF redefines jailbreak optimization from instance-level prompt crafting to strategy-level template discovery. The work provides an efficient, scalable, and generalizable jailbreak solution, and the framework offers broader insights for automated red teaming and LLM security defense.
KW - Jailbreak attacks; LLMs; multi-objective optimization; NSGA-II
DO - 10.32604/cmc.2026.080119