State of AI

State of AI

Share this post

State of AI
State of AI
Structured Reasoning, LLM Agent Fuzzing, and Multimodal Segmentation

Structured Reasoning, LLM Agent Fuzzing, and Multimodal Segmentation

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI's avatar
State of AI
Jun 07, 2025
∙ Paid
11

Share this post

State of AI
State of AI
Structured Reasoning, LLM Agent Fuzzing, and Multimodal Segmentation
4
Share

Welcome to Today’s Edition of State of AI
👋 And a big welcome to our 70new subscribers since last edition!

Today’s research covers everything from reasoning-based recommender systems to fuzzing LLM agents through indirect prompt injection. If you care about secure model deployment, vision-language grounding, compressed inference, or how to scale social simulations with LLMs you’re in for a treat.

Here’s what stood out:

  • R2Rec introduces structured interaction-of-thought chains to boost recommendation accuracy and interpretability outperforming classic LLM-based recsys by over 130%.

  • EnIGMA shows how interactive tools like debuggers can drastically improve an LM agent’s ability to solve CTF challenges, setting new records on three major security benchmarks.

  • AGENTFUZZER presents a black-box fuzzing approach that exposes how indirect prompt injection can quietly subvert LLM agents, no access to internals required.

  • Refer to Anything tackles the problem of segmenting images and videos based on multimodal prompts, bridging language and vision to allow free-form semantic querying.

  • DREAM dives into multimodal safety, using risk disentanglement to align large models without sacrificing task performance.

  • PAM builds on Segment Anything, letting models caption, explain, and recognize regions in video and images with LLM precision and lightweight speed.

  • KV Cache Compression introduces inference-time hyper-scaling using an 8× compression technique that maintains reasoning quality on long-context tasks.

There’s also new work on fast Shapley-based data valuation, test-time training via MesaNet, politics-altering LLM bias, and what social simulations with LLMs can teach us about ourselves.

Let’s get into it 👇

Contents

  1. Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation

  2. EnIGMA: Interactive Tools Substantially Assist LM Agents in Finding Security Vulnerabilities

  3. AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents

  4. Refer to Anything with Vision-Language Prompts

  5. Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos

  6. DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

  7. Exploring Diffusion Transformer Designs via Grafting

  8. Power Law Guided Dynamic Sifting for Efficient Attention

  9. Fast-DataShapley: Neural Modeling for Training Data Valuation

  10. Inference-Time Hyper-Scaling with KV Cache Compression

  11. The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

  12. MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

  13. Biased AI can Influence Political Decision-Making

  14. LLM Social Simulations Are a Promising Research Method

  15. Towards Effective Multidisciplinary Health and HCI Teams based on AI Framework

Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation

Authors: Keyu Zhao, Fengli Xu, Yong Li

Source and references: https://arxiv.org/abs/2506.05069v1


Introduction

This paper explores the integration of Large Language Models (LLMs) into recommendation tasks, with a focus on enhancing their reasoning capabilities to improve performance and interpretability.

Keep reading with a 7-day free trial

Subscribe to State of AI to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 StateOfAI
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share