Greetings,
Welcome to the 22nd edition of the State of AI. In this installment, we explore a wide array of cutting-edge innovations that are shaping the landscape of artificial intelligence. From AnomalyGPT's prowess in industrial anomaly detection to the game-changing capabilities of Lucene-based Vector Search, this edition is packed with compelling insights.
Dive into the groundbreaking RoboTAP technology, which enables unprecedented precision in few-shot visual imitation through tracking arbitrary points. Uncover the capabilities of FaceChain, a novel framework for generating identity-preserving portraits that holds vast potential for future applications. And journey through the virtual landscapes created by CityDreamer, our foray into the world of generative 3D city modeling.
Each of these topics not only represents the latest strides in AI technology but also serves as a testament to the endless possibilities this field holds. Your journey through this issue promises to be enlightening and engaging, offering a glimpse into the future of AI.
Enjoy the read!
Best regards,
Contents
AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
Vector Search with OpenAI Embeddings: Lucene Is All You Need
RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation
FaceChain: A Playground for Identity-Preserving Portrait Generation
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
Authors: Zhaopeng Gu, Bingke Zhu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang
Source & References: https://arxiv.org/abs/2308.15366
Introduction
AnomalyGPT is a novel Industrial Anomaly Detection (IAD) approach based on Large Vision-Language Models (LVLMs). The goal is to detect and localize anomalies in industrial product images without the need for manual threshold adjustments. Additionally, AnomalyGPT supports multi-turn dialogues and demonstrates impressive few-shot in-context learning capabilities.
Existing IAD methods typically require manual tuning of thresholds and only provide anomaly scores, which limits their practical application. General LVLMs, on the other hand, cannot effectively address the IAD task, since they lack domain-specific knowledge and have a weaker understanding of localized details within objects. AnomalyGPT bridges this gap by leveraging LVLMs to understand industrial images and generate corresponding textual descriptions.
Method Overview
AnomalyGPT includes an image encoder, a lightweight decoder, and a prompt learner. The image encoder is used to extract an image embedding from the input image, which is then fed into the LLM. The decoder generates pixel-level anomaly localization results based on patch-level features extracted from the intermediate layers of the image encoder. These localization results are transformed into prompt embeddings through the prompt learner, which are then used as inputs for the LLM. This enables the LLM to determine the presence and location of anomalies in images without the need for manually specified thresholds.
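To make the data flow concrete, here is a minimal PyTorch sketch of that wiring. The layer shapes and module names are illustrative assumptions on our part, not the authors' released implementation; the decoder and prompt learner are reduced to linear placeholders here and sketched in more detail in the next subsection.

```python
import torch
import torch.nn as nn

class AnomalyGPTSketch(nn.Module):
    """Illustrative sketch of the AnomalyGPT pipeline (assumed shapes/names)."""
    def __init__(self, patch_dim=1024, llm_dim=4096):
        super().__init__()
        # Stand-in for a frozen ViT-style image encoder (14x14 patches).
        self.image_encoder = nn.Conv2d(3, patch_dim, kernel_size=14, stride=14)
        self.proj = nn.Linear(patch_dim, llm_dim)    # image embedding -> LLM input space
        self.decoder = nn.Linear(patch_dim, 1)       # placeholder for the matching decoder
        self.prompt_learner = nn.Linear(1, llm_dim)  # placeholder for the prompt learner

    def forward(self, image):                                # image: (B, 3, H, W)
        patches = self.image_encoder(image)                  # patch features (B, C, h, w)
        img_embed = self.proj(patches.mean(dim=(2, 3)))      # global image embedding
        anomaly_map = self.decoder(patches.permute(0, 2, 3, 1)).sigmoid()  # (B, h, w, 1)
        prompts = self.prompt_learner(anomaly_map.mean(dim=(1, 2)))        # (B, llm_dim)
        # img_embed and prompts are handed to the LLM alongside the text tokens,
        # so no hand-tuned threshold is ever applied to anomaly_map.
        return anomaly_map, img_embed, prompts
```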
Decoder and Prompt Learner
The decoder computes the similarity between patch-level features from the image and text features representing normal and abnormal semantics. It supports both unsupervised IAD and few-shot IAD. In the unsupervised setting, patch-level features are compared with text features to produce localization results. In the few-shot setting, patch-level features from normal samples are stored in memory banks, and localization results are calculated based on the distance between query patches and their most similar counterparts in the memory bank.
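In code, the two branches might look like the following minimal sketch, assuming L2-normalized features and cosine similarity as the matching function (an assumption on our part; the paper's exact scoring details may differ).

```python
import torch.nn.functional as F

def unsupervised_map(patch_feats, text_feats):
    """Unsupervised branch: per-patch similarity to text features.
    patch_feats: (N, D); text_feats: (2, D) ordered [normal, abnormal]."""
    p = F.normalize(patch_feats, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    return (p @ t.T).softmax(dim=-1)[:, 1]   # per-patch anomaly score in [0, 1]

def few_shot_map(query_feats, memory_bank):
    """Few-shot branch: distance from each query patch to its nearest
    normal patch in the memory bank. query_feats: (N, D); memory_bank: (M, D)."""
    q = F.normalize(query_feats, dim=-1)
    m = F.normalize(memory_bank, dim=-1)
    nearest = (q @ m.T).max(dim=-1).values   # best match per query patch
    return 1.0 - nearest                     # far from every normal patch -> anomalous
```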
The prompt learner is responsible for transforming the localization result into prompt embeddings. It includes learnable base prompt embeddings, which are unrelated to the decoder outputs, and a convolutional neural network that converts the localization result into additional prompt embeddings. These embeddings are combined with the image embedding as input for the LLM.
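A hedged sketch of such a prompt learner follows, with assumed sizes for the base prompts, the convolutional network, and the LLM embedding width.

```python
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    """Sketch of the prompt learner: learnable base prompts plus a small CNN
    that turns the localization map into extra prompt embeddings. All sizes
    are illustrative assumptions."""
    def __init__(self, n_base=4, n_extra=4, llm_dim=4096, map_size=32):
        super().__init__()
        # Base prompt embeddings, independent of the decoder output.
        self.base_prompts = nn.Parameter(torch.randn(n_base, llm_dim) * 0.02)
        # CNN mapping the (1, map_size, map_size) anomaly map to n_extra prompts.
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (map_size // 4) ** 2, n_extra * llm_dim),
        )
        self.n_extra, self.llm_dim = n_extra, llm_dim

    def forward(self, anomaly_map):             # anomaly_map: (B, 1, H, W)
        B = anomaly_map.shape[0]
        extra = self.net(anomaly_map).view(B, self.n_extra, self.llm_dim)
        base = self.base_prompts.expand(B, -1, -1)
        # Concatenated with the image embedding as soft-prompt input to the LLM.
        return torch.cat([base, extra], dim=1)  # (B, n_base + n_extra, llm_dim)
```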
Data Generation and Alignment
Simulated anomaly data is generated using the NSA (Natural Synthetic Anomalies) technique, which builds upon the CutPaste method and incorporates Poisson image editing to alleviate the discontinuity introduced by pasting image segments. This technique enables the generation of realistic anomalous images for training the model.
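For intuition, OpenCV ships Poisson image editing as cv2.seamlessClone, so the core cut-paste-and-blend move can be sketched in a few lines. The patch sizing and placement here are illustrative assumptions; the real NSA pipeline additionally rescales and shifts patches and derives pixel-level labels.

```python
import cv2
import numpy as np

def simulate_anomaly(normal_img, src_img, rng=np.random.default_rng()):
    """Cut a random patch from src_img and Poisson-blend it into normal_img.
    Both images are expected to be 8-bit, 3-channel, and the same size."""
    h, w = normal_img.shape[:2]
    ph, pw = h // 4, w // 4                                   # patch size (assumed)
    y, x = rng.integers(0, h - ph), rng.integers(0, w - pw)   # random source location
    patch = src_img[y:y + ph, x:x + pw]
    mask = 255 * np.ones(patch.shape[:2], dtype=np.uint8)
    # Random destination center, kept away from the borders so the patch fits.
    cx = rng.integers(pw // 2, w - pw // 2)
    cy = rng.integers(ph // 2, h - ph // 2)
    # Poisson image editing smooths the pasted boundary, avoiding the
    # obvious seams of naive cut-paste.
    return cv2.seamlessClone(patch, normal_img, mask,
                             (int(cx), int(cy)), cv2.NORMAL_CLONE)
```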
Textual queries are created based on the simulated anomalous images. Each query consists of a description of the input image and a question about the presence of anomalies. The LLM responds to the queries by indicating the presence of anomalies and specifying their location.
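As a rough illustration, an alignment pair might be templated as follows; the wording is paraphrased from the paper's examples, and the helper function is hypothetical.

```python
# Hypothetical alignment-pair template, paraphrased from the paper's examples.
def make_pair(description, is_anomalous, position=None):
    query = f"{description} Is there any anomaly in the image?"
    if is_anomalous:
        answer = f"Yes, there is an anomaly in the image, at the {position} of the image."
    else:
        answer = "No, there is no anomaly in the image."
    return query, answer

print(make_pair("This is a photo of a metal nut for anomaly detection.",
                True, "bottom left"))
```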
Experiments and Results
Extensive experiments were conducted on the MVTec-AD and VisA datasets. In the unsupervised setting, AnomalyGPT achieved an accuracy of 93.3%, an image-level AUC of 97.4%, and a pixel-level AUC of 93.1% on MVTec-AD. Transferred one-shot to the VisA dataset, it achieved an accuracy of 77.4%, an image-level AUC of 87.4%, and a pixel-level AUC of 96.2%. In the reverse direction, after unsupervised training on VisA, one-shot transfer to MVTec-AD yielded an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3%.
AnomalyGPT demonstrates robust transferability and the ability to engage in in-context few-shot learning on new datasets, yielding outstanding performance.
Key Takeaways
AnomalyGPT is a pioneering approach towards utilizing LVLMs to address the IAD task. Key contributions of this research include:
The successful application of LVLMs to the domain of industrial anomaly detection, with the ability to detect and locate anomalies without manual threshold adjustments.
The introduction of a lightweight decoder based on visual-textual feature matching, addressing the LLM's weaker grasp of fine-grained semantics and the constraint of text-only outputs.
The use of prompt embeddings for fine-tuning, trained concurrently with the LVLM's pre-training data, which preserves the model's inherent capabilities and enables multi-turn dialogues.
Robust transferability and the capability for in-context few-shot learning on new datasets, achieving state-of-the-art performance.
Overall, AnomalyGPT offers a promising solution for the IAD problem by leveraging the capabilities of LVLMs while overcoming the limitations of existing IAD methods and general LVLMs.
Vector Search with OpenAI Embeddings: Lucene Is All You Need
Authors: Jimmy Lin, Ronak Pradeep, Tommaso Teofili, Jasper Xian
Source & References: https://arxiv.org/abs/2308.14963v1
Introduction
The paper "Vector Search with OpenAI Embeddings: Lucene Is All You Need" by Jimmy Lin and his team challenges the prevailing notion that you need a dedicated vector store to take advantage of advancements in deep neural networks for search applications. Instead, the authors argue that the core features of Lucene, a popular open-source search library, are enough to provide vector search capabilities within a typical bi-encoder architecture.
The primary argument is based on a cost-benefit analysis considering that many organizations have already made substantial investments in search capabilities, mainly within the Lucene ecosystem. As a result, introducing a dedicated vector store into an already complex enterprise "AI stack" appears unnecessary.
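For readers who want to see the shape of the argument in code, here is a minimal bi-encoder sketch using OpenAI's ada-002 embeddings (the model used in the paper), with a brute-force NumPy cosine search standing in for the HNSW index that Lucene would provide in production. The documents and query are invented for illustration.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    """Encode texts with the same embedding model the paper evaluates."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data], dtype=np.float32)

docs = ["Lucene supports HNSW vector indexes.",
        "Dedicated vector stores add operational cost."]
doc_vecs = embed(docs)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

query_vec = embed(["do I need a dedicated vector database?"])[0]
query_vec /= np.linalg.norm(query_vec)

scores = doc_vecs @ query_vec            # cosine similarity, brute force
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {docs[i]}")
```

Swapping the NumPy search for Lucene's built-in KNN vector indexing leaves the embedding side of the bi-encoder untouched, which is precisely the paper's point: the incremental cost of adding vector search to an existing Lucene deployment is small.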