State of AI

Structured Knowledge Flows and Entropy-Constrained Activations

Latest research summaries in ML, Robotics, CV, NLP and AI

Oct 11, 2025

Welcome to today’s edition of State of AI 👋 And a warm welcome to our new subscribers since last edition!

This edition explores breakthroughs in scaling reinforcement learning for step-level theorem proving, a multi-agent framework for structured knowledge management in deep research, and an entropy-constrained activation that boosts large language models, continuous-control agents, and image classifiers alike.

Here’s what caught our attention:

  • Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers: A novel approach to training and inference scaling for large language model-based theorem provers.

  • FlowSearch: Advancing Deep Research with Dynamic Structured Knowledge Flow: A multi-agent system that constructs and evolves a dynamic knowledge flow to drive efficient and systematic problem-solving.

  • Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification: A new activation function that constrains the sampling entropy to improve performance across diverse domains.

  • SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference: A heterogeneous system design that improves the efficiency of disaggregated large language model inference.

  • Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding: A novel algorithm that accelerates diffusion language model inference using lossless speculative decoding.
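Before diving in, the core idea behind the third highlight — constraining the entropy of the sampling distribution — can be sketched with a generic stand-in: searching for a softmax temperature that enforces an entropy floor. This is our own illustration, not the paper's actual activation; every function name below is ours.

```python
import math

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def softmax(logits, temp=1.0):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp((x - m) / temp) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def entropy_floor_softmax(logits, min_entropy, iters=50):
    """Binary-search a temperature so the softmax distribution has
    entropy >= min_entropy. Illustrative only; assumes min_entropy is
    below log(len(logits)), the maximum achievable entropy."""
    if entropy(softmax(logits)) >= min_entropy:
        return softmax(logits)
    lo, hi = 1.0, 2.0
    while entropy(softmax(logits, hi)) < min_entropy:
        lo, hi = hi, hi * 2.0  # a hotter softmax has higher entropy
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if entropy(softmax(logits, mid)) < min_entropy:
            lo = mid
        else:
            hi = mid
    return softmax(logits, hi)
```

The binary search mimics, in spirit, what an entropy-constrained activation bakes into the forward pass: the output distribution is never allowed to collapse below a chosen entropy level.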

Let’s get into it 👇

Contents

  1. Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

  2. BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

  3. FlowSearch: Advancing Deep Research with Dynamic Structured Knowledge Flow

  4. How to Teach Large Multimodal Models New Skills

  5. NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

  6. SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

  7. Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization

  8. Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints

  9. SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference

  10. Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding

  11. CaRT: Teaching LLM Agents to Know When They Know Enough

  12. AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents

  13. BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation

  14. NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos

  15. DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model

Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

Authors: Ran Xin, Zeyu Zheng, Yanchen Nie, Kun Yuan, Xia Xiao

Source and references: https://arxiv.org/abs/2509.06493v2


Introduction

This paper introduces BFS-Prover-V2, an open-source step-level theorem-proving system that addresses two scaling challenges in combining Large Language Models (LLMs) with automated theorem proving: training-time reinforcement learning (RL) and inference-time compute.

Key Points

  • A novel multi-turn off-policy RL framework for continually improving the performance of the LLM step-prover at training time, featuring adaptive tactic-level data filtering and periodic retraining.

  • A planner-enhanced multi-agent system that scales reasoning capabilities at inference time, using a general reasoning model as a high-level planner to iteratively decompose complex theorems into a sequence of simpler subgoals.

  • State-of-the-art results on established formal mathematics benchmarks, with BFS-Prover-V2 achieving 95.08% on the miniF2F test set and 41.4% on the ProofNet test set.
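The inference-time idea — best-first search over proof states under a learned priority — can be sketched roughly as follows. The toy `expand` and `score` functions below stand in for tactic generation and LLM scoring; they are our illustration, not the paper's implementation.

```python
import heapq

def best_first_proof_search(root, expand, score, max_nodes=1000):
    """Best-first search over proof states.

    expand(state) -> successor states (candidate tactic applications)
    score(state)  -> priority, lower = more promising (a stand-in for
                     an LLM step-prover's scoring of partial proofs)
    Returns a state with no open goals, or None if the budget runs out.
    """
    counter = 0  # tie-breaker so heapq never compares state dicts
    frontier = [(score(root), counter, root)]
    expanded = 0
    while frontier and expanded < max_nodes:
        _, _, state = heapq.heappop(frontier)
        if not state["goals"]:  # every subgoal closed: proof complete
            return state
        expanded += 1
        for child in expand(state):
            counter += 1
            heapq.heappush(frontier, (score(child), counter, child))
    return None

# Toy domain: a goal of "size" n is closed directly when n == 1,
# otherwise the only available tactic splits it into two subgoals —
# loosely mirroring the planner decomposing a theorem into simpler goals.
def expand(state):
    g, rest = state["goals"][0], state["goals"][1:]
    if g == 1:
        return [{"goals": rest}]                      # close a unit goal
    return [{"goals": [g // 2, g - g // 2] + rest}]   # split a goal

proof = best_first_proof_search({"goals": [4]}, expand,
                                score=lambda s: len(s["goals"]))
```

In the real system the frontier is scored by the step-prover itself and the decomposition comes from a high-level planner model, but the control flow has this same shape.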
