Anomaly Detection by Adapting a pre-trained Vision Language Model

AI-generated keywords: Anomaly Detection

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Large vision and language models have recently proven successful when adapted to many downstream tasks.
  • CLIP-ADA is a unified framework for Anomaly Detection by Adapting a pre-trained CLIP model. It makes two improvements, listed in the next two points.
  • First, a learnable prompt is associated with abnormal patterns through self-supervised learning, giving unified anomaly detection across industrial images of multiple categories.
  • Second, an anomaly region refinement strategy refines localization quality and more fully exploits CLIP's representation power.
  • During testing, anomalies are localized by directly calculating the similarity between the representation of the learnable prompt and the image.
  • Experiments report state-of-the-art results of 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization.
  • The method also achieves encouraging performance with marginal training data, which is a more challenging setting.
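
The test-time step described in the key points (scoring an image by the similarity between the prompt representation and the image representation) can be illustrated with a small sketch. This is not the authors' implementation: only the paper's metadata is available here, so the patch-level feature layout, the use of cosine similarity, the max pooling for an image-level score, and all function names below are assumptions made for illustration. Random vectors stand in for CLIP features.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def anomaly_map(patch_features, prompt_embedding, grid_size):
    """Cosine similarity between each patch feature and the prompt embedding.

    patch_features:   (H*W, D) array, one row per image patch (assumed layout).
    prompt_embedding: (D,) array standing in for the learned prompt representation.
    grid_size:        (H, W) shape of the patch grid.
    """
    patches = l2_normalize(patch_features)
    prompt = l2_normalize(prompt_embedding)
    similarity = patches @ prompt  # shape (H*W,), values in [-1, 1]
    return similarity.reshape(grid_size)


def image_score(score_map):
    """Image-level score as the maximum patch score (an assumed pooling choice)."""
    return float(score_map.max())


# Toy data: random vectors play the role of image patch features.
rng = np.random.default_rng(0)
dim, grid = 8, (4, 4)
prompt = rng.normal(size=dim)
patches = rng.normal(size=(grid[0] * grid[1], dim))
patches[5] = 3.0 * prompt  # make patch 5 point in the prompt's direction

amap = anomaly_map(patches, prompt, grid)
row, col = np.unravel_index(amap.argmax(), grid)
print("most prompt-like patch:", (int(row), int(col)))
print("image-level score:", image_score(amap))
```

In this toy setup, patch index 5 corresponds to row 1, column 1 of the 4x4 grid, so that cell receives the highest similarity. In practice the map would be upsampled to the image resolution before being compared with a pixel-level mask.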

Authors: Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, Xiang Bai

Abstract: Recently, large vision and language models have shown their success when adapting them to many downstream tasks. In this paper, we present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model. To this end, we make two important improvements: 1) To acquire unified anomaly detection across industrial images of multiple categories, we introduce the learnable prompt and propose to associate it with abnormal patterns through self-supervised learning. 2) To fully exploit the representation power of CLIP, we introduce an anomaly region refinement strategy to refine the localization quality. During testing, the anomalies are localized by directly calculating the similarity between the representation of the learnable prompt and the image. Comprehensive experiments demonstrate the superiority of our framework, e.g., we achieve the state-of-the-art 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization. In addition, the proposed method also achieves encouraging performance with marginal training data, which is more challenging.

Submitted to arXiv on 14 Mar. 2024


Results of the summarizing process for the arXiv paper: 2403.09493v1

The paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

Large vision and language models have recently been adapted successfully to many downstream tasks. This paper introduces CLIP-ADA, a unified framework for Anomaly Detection by Adapting a pre-trained CLIP model. The framework makes two improvements. First, to obtain unified anomaly detection across industrial images of multiple categories, a learnable prompt is introduced and associated with abnormal patterns through self-supervised learning. Second, to fully exploit the representation power of CLIP, an anomaly region refinement strategy is proposed to refine localization quality. During testing, anomalies are localized by directly calculating the similarity between the representation of the learnable prompt and the image. The reported experiments show state-of-the-art results of 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization. The method also achieves encouraging performance with marginal training data, which is a more challenging setting. The paper, titled "Anomaly Detection by Adapting a pre-trained Vision Language Model", is authored by Yuxuan Cai, Xinwei He, Dingkang Liang, Ao Tong, and Xiang Bai.
Created on 25 Jul. 2024

