LaRP-CLIP: Layer-Aware Refinement with Prototype Guidance for Zero-Shot Anomaly Detection

Xing Fang¹, Yuanfang Chen¹,²,*, Qiang Lin³, Kun Yang²,⁴, Gyu Myoung Lee⁵
1 School of Cyberspace, Hangzhou Dianzi University, Hangzhou, China
2 The State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China
3 School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
4 College of Computer Science and Technology, Zhejiang University, Hangzhou, China
5 School of Computer Science and Mathematics, Liverpool John Moores University, Liverpool, UK
* Corresponding Author: Yuanfang Chen. Email: email

Computers, Materials & Continua https://doi.org/10.32604/cmc.2026.084208

Received 18 April 2026; Accepted 22 May 2026; Published online 22 June 2026


Abstract

The deployment of supervised anomaly detection is typically limited by the high cost of annotation, privacy constraints, and the scarcity of anomalous samples. These constraints have motivated the use of pre-trained vision-language models for zero-shot anomaly detection. However, existing CLIP-based methods still have three limitations: a single set of prompts is shared across all feature layers, anomaly maps are fused by fixed strategies, and image-level anomaly scores depend solely on global image-text similarity. Together, these reduce the accuracy of pixel-level localization and weaken the reliability of image-level anomaly prediction. To address them, LaRP-CLIP is proposed. It introduces layer-aware prompt decoupling, which better matches prompts to feature layers with different semantic characteristics; adaptive fusion with error-prior-guided local refinement, which produces cleaner and more precise anomaly maps; and a prototype branch, which improves image-level scoring. Experiments on four industrial datasets and seven medical datasets show that LaRP-CLIP achieves strong performance in both image-level detection and pixel-level localization.

Keywords

Zero-shot anomaly detection; vision-language models; layer-aware prompts; local refinement; prototype branch
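The abstract contrasts image-level scoring that relies only on global image-text similarity with scoring that also draws on a prototype branch. A minimal NumPy sketch of that contrast follows. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not describe how the prototypes are built, how the two terms are combined, or what temperature is used. Accordingly, the prototype vectors are simply passed in as arrays, the convex mixing weight `alpha` and the temperature are assumptions, and the function names (`global_text_score`, `prototype_score`, `combined_score`) are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to (near) unit length so dot products act as cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def global_text_score(image_emb, normal_text_emb, abnormal_text_emb, temperature=0.07):
    # Baseline zero-shot score: softmax over the cosine similarities between the
    # global image embedding and "normal" / "abnormal" prompt embeddings.
    # Returns the probability assigned to "abnormal". Temperature is an assumption.
    img = l2_normalize(image_emb)
    texts = l2_normalize(np.stack([normal_text_emb, abnormal_text_emb]))
    logits = texts @ img / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[1])


def prototype_score(patch_embs, prototypes):
    # Hypothetical prototype term: for each patch, take its highest cosine similarity
    # to any prototype, convert it to a distance in [0, 1], then max-pool over patches
    # so that one strongly deviating patch raises the image-level score.
    p = l2_normalize(patch_embs)
    protos = l2_normalize(prototypes)
    sim = p @ protos.T                  # shape: (num_patches, num_prototypes)
    dist = (1.0 - sim.max(axis=1)) / 2.0
    return float(dist.max())


def combined_score(global_s, proto_s, alpha=0.5):
    # Hypothetical convex combination of the global and prototype terms
    return alpha * global_s + (1.0 - alpha) * proto_s


if __name__ == "__main__":
    image = np.array([0.0, 1.0])
    g = global_text_score(image, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
    p = prototype_score(np.array([[0.0, 1.0]]), np.array([[1.0, 0.0]]))
    print(g, p, combined_score(g, p))
```

In this sketch the global term alone cannot see a small local defect that leaves the global embedding unchanged. The patch-level prototype term can, which is one plausible reading of why a second scoring branch helps image-level prediction.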

