Structured Knowledge Flows and Entropy-Constrained Activations
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today’s edition of State of AI 👋 And a warm welcome to our new subscribers since the last edition!
This edition explores breakthroughs in scaling up reinforcement learning for step-level theorem proving, innovative frameworks for structured knowledge management in deep research, and a novel paradigm for enhancing the performance of large language models, continuous control agents, and image classifiers through entropy-constrained activations.
Here’s what caught our attention:
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers: A novel approach to training and inference scaling for large language model-based theorem provers.
FlowSearch: Advancing Deep Research with Dynamic Structured Knowledge Flow: A multi-agent system that constructs and evolves a dynamic knowledge flow to drive efficient and systematic problem-solving.
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification: A new activation function that constrains the sampling entropy to improve performance across diverse domains.
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference: A heterogeneous system design that improves the efficiency of disaggregated large language model inference.
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding: A novel algorithm that accelerates diffusion language model inference using lossless speculative decoding.
Let’s get into it 👇
Contents
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving
FlowSearch: Advancing deep research with dynamic structured knowledge flow
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
Improving Reasoning for Diffusion Language Models via Group Diffusion Policy Optimization
SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
AutoMLGen: Navigating Fine-Grained Optimization for Coding Agents
BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation
NovaFlow: Zero-Shot Manipulation via Actionable Flow from Generated Videos
DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
Authors: Ran Xin, Zeyu Zheng, Yanchen Nie, Kun Yuan, Xia Xiao
Source and references: https://arxiv.org/abs/2509.06493v2
Introduction
This paper introduces BFS-Prover-V2, an open-source step-level theorem proving system designed to address the dual challenge of scaling both training-time reinforcement learning (RL) and inference-time compute when integrating Large Language Models (LLMs) with automated theorem proving.
Key Points
A novel multi-turn off-policy RL framework for continually improving the performance of the LLM step-prover at training time, featuring adaptive tactic-level data filtering and periodic retraining.
A planner-enhanced multi-agent system that scales reasoning capabilities at inference time, using a general reasoning model as a high-level planner to iteratively decompose complex theorems into a sequence of simpler subgoals.
State-of-the-art results on established formal mathematics benchmarks, with BFS-Prover-V2 achieving 95.08% on the miniF2F test set and 41.4% on the ProofNet test set.
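To make the inference-time idea concrete, here is a minimal sketch of a planner-enhanced best-first proof search: a high-level planner decomposes a theorem into subgoals, and a step-prover expands the highest-scoring open node until every subgoal is closed. This is an illustrative toy, not the paper's implementation; `plan_subgoals` and `propose_tactics` are hypothetical stand-ins for the planner model and the LLM step-prover.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Node:
    neg_score: float                     # lower = more promising (heapq is a min-heap)
    goals: tuple = field(compare=False)  # open subgoals, leftmost expanded first
    proof: tuple = field(compare=False)  # tactics applied so far

def best_first_prove(theorem, plan_subgoals, propose_tactics, budget=100):
    """Sketch of planner-enhanced best-first search: the planner splits the
    theorem into subgoals, then the step-prover expands nodes in score order
    until all subgoals are closed or the node budget is exhausted."""
    frontier = [Node(0.0, tuple(plan_subgoals(theorem)), ())]
    for _ in range(budget):
        if not frontier:
            break
        node = heapq.heappop(frontier)
        if not node.goals:               # every subgoal closed: proof found
            return list(node.proof)
        goal, rest = node.goals[0], node.goals[1:]
        for tactic, new_goals, logp in propose_tactics(goal):
            # Score a child by accumulated negative log-probability of its tactics.
            heapq.heappush(frontier, Node(node.neg_score - logp,
                                          tuple(new_goals) + rest,
                                          node.proof + (tactic,)))
    return None

# Toy stand-ins (assumptions for illustration, not the paper's models):
split_on_and = lambda thm: thm.split(" AND ")          # "planner" decomposition
def toy_tactics(goal):
    return [(f"close_{goal}", [], -0.1)]               # one closing tactic per goal

print(best_first_prove("a AND b", split_on_and, toy_tactics))
# → ['close_a', 'close_b']
```

In the real system the frontier scores come from the step-prover's tactic log-probabilities and the planner is itself a general reasoning model, but the control flow follows this expand-best-node loop.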
Methodology