Structured Reasoning, LLM Agent Fuzzing, and Multimodal Segmentation
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to Today’s Edition of State of AI
👋 And a big welcome to our 70 new subscribers since last edition!
Today’s research covers everything from reasoning-based recommender systems to fuzzing LLM agents through indirect prompt injection. If you care about secure model deployment, vision-language grounding, compressed inference, or how to scale social simulations with LLMs, you’re in for a treat.
Here’s what stood out:
R2Rec introduces structured interaction-of-thought chains to boost recommendation accuracy and interpretability, outperforming classic LLM-based recommender systems by over 130%.
EnIGMA shows how interactive tools like debuggers can drastically improve an LM agent’s ability to solve CTF challenges, setting new records on three major security benchmarks.
AGENTFUZZER presents a black-box fuzzing approach that exposes how indirect prompt injection can quietly subvert LLM agents, with no access to model internals required.
Refer to Anything tackles the problem of segmenting images and videos based on multimodal prompts, bridging language and vision to allow free-form semantic querying.
DREAM dives into multimodal safety, using risk disentanglement to align large models without sacrificing task performance.
PAM builds on Segment Anything, letting models caption, explain, and recognize regions in images and videos with LLM-level precision at lightweight speed.
KV Cache Compression introduces inference-time hyper-scaling via an 8× cache compression technique that maintains reasoning quality on long-context tasks.
There’s also new work on fast Shapley-based data valuation, test-time training via MesaNet, politics-altering LLM bias, and what social simulations with LLMs can teach us about ourselves.
Let’s get into it 👇
Contents
Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation
EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities
AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents
Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Fast-DataShapley: Neural Modeling for Training Data Valuation
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
Towards Effective Multidisciplinary Health and HCI Teams based on AI Framework
Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation
Authors: Keyu Zhao, Fengli Xu, Yong Li
Source and references: https://arxiv.org/abs/2506.05069v1
Introduction
This paper explores the integration of Large Language Models (LLMs) into recommendation tasks, with a focus on enhancing their reasoning capabilities to improve performance and interpretability.