Automated Research, and Reasoning Breakthroughs

Sep 10, 2025

Welcome to today's edition of State of AI 🚀

This issue highlights the latest advancements in unifying different modalities within large language models, automating the creation of research software, and pushing the boundaries of complex reasoning capabilities. We'll explore how these technical innovations are shaping the future of AI-powered scientific discovery and practical applications.

Here's what caught our attention:

Unified Multimodal LLMs with Discrete Representations: The AnyGPT model demonstrates how discrete representations can effectively unify the processing of speech, text, images, and music within a single language model.
Automating Empirical Software Creation: An AI system that systematically explores a large solution space and integrates research ideas to generate expert-level empirical software across diverse scientific domains.
Evaluating LLM Reasoning with SearchBench: This new benchmark tests the ability of LLMs to reason about complex combinatorial search problems, revealing opportunities to improve their generalized reasoning capabilities.
Self-Improving Multimodal Models with Dual Rewards: The SUDER framework leverages the inherent duality between understanding and generation tasks to provide self-supervised optimization signals, enhancing the performance of unified LLMs.
Adaptive Multi-Turn RL for LLM Step-Provers: The BFS-Prover-V2 system scales up both training-time RL and inference-time compute to advance the integration of LLMs into automated theorem proving.

Let's get into it 👇

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP and AI

An AI system to help scientists write expert-level empirical software

