Efficient RL, Robust Evaluation, and Expressive Diffusion Models

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI ∙ Aug 09, 2025
Welcome to today's edition of State of AI 👋 And a warm welcome to our 114 new subscribers since the last edition!

This edition explores novel reinforcement learning techniques for large language models, innovative approaches to agent evaluation, and the advantages of diffusion models in data-constrained settings. We'll dive into how researchers are enhancing reasoning, prediction, and generation capabilities through principled modeling and training frameworks.

Here's what caught our attention:

  • Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle - A novel RL framework that dynamically prioritizes informative samples and reshapes the training data distribution to boost the reasoning capabilities of multimodal large language models (a generic sketch of the idea follows this list).

  • Auto-Eval Judge: Towards a General Agentic Framework for Task Completion Evaluation - A modular, scalable evaluation framework that can assess agent performance across diverse domains by decomposing tasks and validating each step using available information (a schematic sketch also follows the list).

  • Diffusion Beats Autoregressive in Data-Constrained Settings - A systematic study showing that diffusion models significantly outperform autoregressive models when compute is abundant but data is scarce, offering a promising alternative for efficient use of finite data.
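
To ground the Shuffle-R1 idea, here is a minimal sketch of data-centric dynamic shuffling under our own assumptions: a batch is re-sampled so that examples with larger absolute advantage (i.e., more informative gradient signal) are drawn more often. The function name, the |advantage|-based weighting, and the temperature knob are all illustrative, not the paper's actual algorithm.

```python
import numpy as np

def prioritized_shuffle(samples, advantages, temperature=1.0, rng=None):
    """Re-sample a training batch with probability proportional to a
    softmax over |advantage| / temperature, so high-signal examples
    dominate the effective training distribution. Generic sketch only."""
    rng = rng or np.random.default_rng()
    scores = np.abs(np.asarray(advantages, dtype=float)) / temperature
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    idx = rng.choice(len(samples), size=len(samples), replace=True, p=probs)
    return [samples[i] for i in idx]
```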

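The Auto-Eval Judge description suggests a decompose-then-validate loop. Below is a schematic sketch of that shape; `decompose` and `validate` are hypothetical callables (in practice these would be LLM calls), and none of the names come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class StepVerdict:
    subgoal: str    # one checkable sub-goal of the task
    passed: bool    # did the agent's trace satisfy it?
    rationale: str  # judge's explanation

def judge_task(task: str, trace: str,
               decompose: Callable[[str], List[str]],
               validate: Callable[[str, str], StepVerdict]) -> bool:
    """Decompose the task into sub-goals, validate each against the
    agent's trace, and pass only if every sub-goal checks out."""
    verdicts = [validate(sub, trace) for sub in decompose(task)]
    return all(v.passed for v in verdicts)
```
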
Let's get into it 👇

Contents

  1. InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

  2. GRAIL: Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning

  3. Auto-Eval Judge: Towards a General Agentic Framework for Task Completion Evaluation

  4. Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

  5. LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model

  6. WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction

  7. Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle

  8. CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge

  9. Eliciting Latent Predictions from Transformers with the Tuned Lens

  10. TreeDiff: AST-Guided Code Generation with Diffusion LLMs

  11. Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models

  12. OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks

  13. Diffusion Beats Autoregressive in Data-Constrained Settings

  14. TrajEvo: Trajectory Prediction Heuristics Design via LLM-driven Evolution

  15. Motion Planning Diffusion: Learning and Adapting Robot Motion Planning with Diffusion Models

InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities

Authors: Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang

Source and references: https://arxiv.org/abs/2508.05496v1


Introduction

This paper introduces InfiAlign, a scalable and sample-efficient framework for aligning large language models (LLMs) to enhance their reasoning capabilities.

Key Points

  • InfiAlign integrates supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) to align LLMs for enhanced reasoning; a minimal sketch of the standard DPO objective follows.
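
The sketch below shows the textbook DPO loss that this pipeline builds on, assuming per-response summed log-probabilities have already been computed for the policy and a frozen reference model. The function and argument names, and the β value, are illustrative, not InfiAlign's actual code.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization: reward the policy for widening the
    log-prob margin between chosen and rejected responses, measured
    relative to a frozen reference model. Inputs are 1-D tensors of
    summed per-response log-probabilities."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps        # log pi/pi_ref (chosen)
    rejected_ratio = policy_rejected_logps - ref_rejected_logps  # log pi/pi_ref (rejected)
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()
```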
