Contents
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
Brain-like Functional Organization within Large Language Models
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Multi-view biomedical foundation models for molecule-target and property prediction
Diffusion Attribution Score: Evaluating Training Data Influence in Diffusion Model
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs
SPRIG: Improving Large Language Model Performance by System Prompt Optimization
Towards End-to-End Open Conversational Machine Reading
RepairAgent: An Autonomous, LLM-Based Agent for Program Repair
SAM 2: Segment Anything in Images and Videos
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
Authors: Isamu Isozaki, Manil Shrestha, Rick Console, Edward Kim
Source and references: https://arxiv.org/abs/2410.17141v2
Introduction
This paper introduces a novel open benchmark for evaluating the performance of large language models (LLMs) in the domain of automated penetration testing. The goal is to drive progress and assess the capabilities of these models in security contexts.
Key Points
The authors introduce a comprehensive, open-source benchmark for LLM-based automated penetration testing, addressing a critical gap in existing research.
They evaluate the performance of two popular LLMs, GPT-4o and Llama 3.1-405B, using the state-of-the-art PentestGPT tool.
The findings reveal that while Llama 3.1 demonstrates an edge over GPT-4o, both models currently fall short of performing fully automated, end-to-end penetration testing.
The authors present ablation studies that provide insights into improving the PentestGPT tool and illuminate the challenges LLMs face in each aspect of penetration testing.
Methodology
The researchers developed a novel benchmark using Vulnhub virtual machines and established clear rules to minimize human involvement in the evaluation process. They assessed the performance of PentestGPT with GPT-4o and Llama 3.1-405B, limiting each test to a maximum of five attempts, except for the initial enumeration task, which allowed ten.
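The attempt-capped evaluation protocol described above can be sketched as a simple loop. This is purely illustrative: `run_task` and the task dictionaries are hypothetical stand-ins, not the authors' actual harness.

```python
def evaluate(tasks, run_task):
    """Run each benchmark task, allowing up to 5 attempts
    (10 for the initial enumeration task, per the paper's protocol).

    tasks: list of dicts with "name" and "category" keys (illustrative schema).
    run_task: callable returning True on success (hypothetical interface).
    """
    results = {}
    for task in tasks:
        max_attempts = 10 if task["category"] == "initial enumeration" else 5
        success = False
        for _ in range(max_attempts):
            if run_task(task):
                success = True
                break  # stop retrying once the task succeeds
        results[task["name"]] = success
    return results
```

A stricter harness might also log per-attempt transcripts for the ablation analyses, but the retry cap is the key rule minimizing human involvement.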
Results and Findings
The results show that Llama 3.1-405B outperforms GPT-4o across various difficulty levels and task categories, particularly in reconnaissance, general techniques, and exploitation tasks. However, both models struggle with privilege escalation, and neither was able to gain root-level privileges on any of the machines.
Implications and Conclusions
This work contributes to the growing body of knowledge on AI-assisted cybersecurity and lays the foundation for future research in automated penetration testing using large language models. The benchmark and findings provide valuable insights into the current limitations of LLMs in security contexts, guiding future improvements in this field.
Brain-like Functional Organization within Large Language Models
Authors: H. Sun, L. Zhao, Z. Wu, X. Gao, Y. Hu, M. Zuo, W. Zhang, J. Han, T. Liu, X. Hu
Source and references: https://arxiv.org/abs/2410.19542v1
Introduction
This research paper explores the functional organization within large language models (LLMs), such as BERT and the Llama family, and investigates whether they exhibit brain-like organizational principles.
Key Points
The study aims to directly couple sub-groups of artificial neurons (ANs) in LLMs with functional brain networks (FBNs) observed in the human brain.
Representative temporal response patterns of ANs are extracted and used as regressors to construct voxel-wise encoding models that predict brain activity recorded via fMRI.
This framework enables the delineation of brain-like functional organization within LLMs by linking AN sub-groups to FBNs.
The analysis examines the evolution of brain-like functional organization across four LLMs (BERT and Llama 1-3).
The findings reveal that LLMs exhibit brain-like functional architecture, with AN sub-groups mirroring the organizational patterns of well-established FBNs.
Methodology
The researchers define ANs as the neurons in the second fully connected layer of the feed-forward network within each transformer block of the LLMs. They extract representative temporal response patterns of these ANs using a sparse representation scheme and use these patterns as regressors in voxel-wise encoding models to predict fMRI brain activity. The encoding coefficients associated with each pattern reveal how that pattern couples with the functional activity of the entire brain.
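The voxel-wise encoding step above amounts to fitting a linear model per voxel, with AN temporal response patterns as regressors. A minimal sketch using closed-form ridge regression (an illustrative choice; the paper's exact estimator and data are not reproduced here):

```python
import numpy as np

def fit_encoding_model(X, Y, alpha=1.0):
    """Fit voxel-wise linear encoding models with ridge regression.

    X: (timepoints, n_patterns) AN temporal response regressors.
    Y: (timepoints, n_voxels) fMRI activity.
    Returns (n_patterns, n_voxels) encoding coefficients, i.e. how
    each AN pattern couples to each voxel's activity.
    """
    n = X.shape[1]
    # Closed-form ridge solution: (X'X + alpha I)^-1 X'Y
    return np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ Y)

# Synthetic demonstration with made-up dimensions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))           # 8 AN response patterns
true_W = rng.standard_normal((8, 50))       # ground-truth coupling
Y = X @ true_W + 0.1 * rng.standard_normal((200, 50))
W = fit_encoding_model(X, Y, alpha=0.1)
```

Thresholding or mapping the columns of `W` back onto voxel coordinates is what yields the brain maps linking AN sub-groups to FBNs.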
Results and Findings
The results show that the temporal responses of ANs can be effectively represented by the learned dictionary, with higher R² values observed for BERT compared to the Llama family. The brain maps reveal intricate functional interactions and competitions among well-established FBNs, with consistent patterns of FBN engagement observed across the four LLMs.
The analysis of the evolution of brain-like functional organization within the LLMs suggests that more advanced models, such as Llama 3, promote more compact brain-like functional organizations. Specifically, Llama 3 exhibits a higher degree of temporal consistency and hierarchical organization of ANs compared to the other models, indicating an improved balance between the diversity of computational behaviors and the consistency of functional specializations.
Implications and Conclusions
This research represents the first exploration of brain-like functional organization within LLMs, offering novel insights that could inform the development of artificial general intelligence (AGI) systems inspired by the organizational principles of the human brain.
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion
Authors: Emiel Hoogeboom, Thomas Mensink, Jonathan Heek, Kay Lamerigts, Ruiqi Gao, Tim Salimans
Source and references: https://arxiv.org/abs/2410.19324v1
Introduction
This paper presents "Simpler Diffusion (SiD2)", an approach to scaling up end-to-end pixel-space diffusion models for high-resolution image synthesis, achieving 1.5 FID on ImageNet512.
Key Points
Analyze and tune the sigmoid loss weighting, showing it should be preferred over the EDM monotonic weighting.
Propose architectural improvements like "flop heavy scaling" and removing block skip-connections ("Residual U-ViTs") that lead to better performance.
Improve the state-of-the-art of pixel-space diffusion from 2.65 to 1.50 FID on ImageNet512.
Achieve overall state-of-the-art on ImageNet128 and ImageNet256 without heavy tuning.
Demonstrate that end-to-end pixel-space diffusion models can be competitive with latent approaches.
Methodology
The authors use a simple recipe for scaling end-to-end pixel-space diffusion models: 1) Use the sigmoid loss with prescribed hyperparameters, 2) Use a simplified memory-efficient architecture with fewer skip-connections, and 3) Scale the model to favor processing the image at high resolution with fewer parameters. They combine these steps with recently proposed techniques like guidance intervals.
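The sigmoid loss weighting in step 1 can be sketched as a function of the log-SNR λ. A minimal illustration, assuming the common form w(λ) = σ(b − λ) with a bias hyperparameter b (the value b = −1 below is purely illustrative; the paper prescribes its own hyperparameters):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_weight(log_snr, bias=-1.0):
    """Sigmoid loss weighting over log-SNR: w(lambda) = sigmoid(bias - lambda).

    Down-weights high-SNR (low-noise) timesteps and emphasizes mid/low-SNR
    steps, where global image structure is formed. bias = -1.0 is an
    illustrative default, not the paper's tuned value.
    """
    return sigmoid(bias - log_snr)
```

In training, this scalar multiplies the per-timestep denoising loss before averaging; the claimed advantage over the EDM monotonic weighting comes from how it reallocates capacity across noise levels.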
Results and Findings
The authors show that the sigmoidal weighting outperforms other loss weightings, including the EDM monotonic weighting, on ImageNet512. They also find that flop heavy scaling, where the model has a higher flop-to-parameter ratio, performs better than simply increasing the number of channels. Removing block skip-connections in the "Residual U-ViT" architecture does not degrade performance and may even improve it for larger models.
Implications and Conclusions
The authors demonstrate that end-to-end pixel-space diffusion models can be competitive with latent approaches, outperforming existing end-to-end models by a large margin. While latent diffusion still has slightly better scaling properties, the authors believe their work is an important contribution that allows for end-to-end modeling with a single unified approach.
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Authors: Zhengyao Lv, Chenyang Si, Junhao Song, Zhenyu Yang, Yu Qiao, Ziwei Liu, Kwan-Yee K. Wong
Source and references: https://arxiv.org/abs/2410.19355v1
Introduction
This paper presents FasterCache, a novel training-free strategy designed to significantly accelerate the inference of video diffusion models while preserving high-quality generation.