Structured Knowledge Flows and Entropy-Constrained Activations
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today’s edition of State of AI 👋 And a warm welcome to our new subscribers since the last edition!
This edition explores breakthroughs in scaling up reinforcement learning for step-level theorem proving, innovative frameworks for structured knowledge management in deep research, and a novel paradigm for enhancing the performance of large language models, continuous control agents, and image classifiers through entropy-constrained activations.
Here’s what caught our attention:
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers: A novel approach to training and inference scaling for large language model-based theorem provers.
FlowSearch: Advancing Deep Research with Dynamic Structured Knowledge Flow: A multi-agent system that constructs and evolves a dynamic knowledge flow to drive efficient and systematic problem-solving.
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification: A new activation function that constrains the sampling entropy to improve performance across diverse domains.
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference: A heterogeneous system design that improves the efficiency of disaggregated large language model inference.
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding: A novel algorithm that accelerates diffusion language model inference using lossless speculative decoding.
Let’s get into it 👇
Contents
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving
FlowSearch: Advancing deep research with dynamic structured knowledge flow
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents
BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation
NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
Authors: Ran Xin, Zeyu Zheng, Yanchen Nie, Kun Yuan, Xia Xiao
Source and references: https://arxiv.org/abs/2509.06493v2
Introduction
This paper introduces BFS-Prover-V2, an open-source step-level theorem proving system designed to address the dual challenge of scaling both training-time reinforcement learning (RL) and inference-time compute when integrating Large Language Models (LLMs) with automated theorem proving.
Key Points
A novel multi-turn off-policy RL framework for continually improving the performance of the LLM step-prover at training time, featuring adaptive tactic-level data filtering and periodic retraining.
A planner-enhanced multi-agent system that scales reasoning capabilities at inference time, using a general reasoning model as a high-level planner to iteratively decompose complex theorems into a sequence of simpler subgoals.
State-of-the-art results on established formal mathematics benchmarks, with BFS-Prover-V2 achieving 95.08% on the miniF2F test set and 41.4% on the ProofNet test set.
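To make the inference-time idea concrete, here is a minimal sketch of a planner-enhanced best-first proof search: a high-level planner decomposes a theorem into subgoals, and a step-prover expands the highest-scoring open node until every subgoal is closed. This is an illustrative toy, not the paper's implementation; `plan_subgoals` and `propose_tactics` are hypothetical stand-ins for the planner model and the LLM step-prover.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_score: float                     # lower = more promising (heapq is a min-heap)
    goals: tuple = field(compare=False)  # open subgoals, leftmost expanded first
    proof: tuple = field(compare=False)  # tactics applied so far

def best_first_prove(theorem, plan_subgoals, propose_tactics, budget=100):
    """Sketch of planner-enhanced best-first search: the planner splits the
    theorem into subgoals, then the step-prover expands nodes in score order
    until all subgoals are closed or the node budget is exhausted."""
    frontier = [Node(0.0, tuple(plan_subgoals(theorem)), ())]
    for _ in range(budget):
        if not frontier:
            break
        node = heapq.heappop(frontier)
        if not node.goals:               # every subgoal closed: proof found
            return list(node.proof)
        goal, rest = node.goals[0], node.goals[1:]
        for tactic, new_goals, logp in propose_tactics(goal):
            # Score a child by accumulated negative log-probability of its tactics.
            heapq.heappush(frontier, Node(node.neg_score - logp,
                                          tuple(new_goals) + rest,
                                          node.proof + (tactic,)))
    return None

# Toy stand-ins (assumptions for illustration, not the paper's models):
split_on_and = lambda thm: thm.split(" AND ")          # "planner" decomposition
def toy_tactics(goal):
    return [(f"close_{goal}", [], -0.1)]               # one closing tactic per goal

print(best_first_prove("a AND b", split_on_and, toy_tactics))
# → ['close_a', 'close_b']
```

In the real system the frontier scores come from the step-prover's tactic log-probabilities and the planner is itself a general reasoning model, but the control flow follows this expand-best-node loop.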
Methodology