Efficient RL, Robust Evaluation, and Expressive Diffusion Models
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 👋 And a warm welcome to our 114 new subscribers since our last edition!
This edition explores novel reinforcement learning techniques for large language models, innovative approaches to agent evaluation, and the advantages of diffusion models in data-constrained settings. We'll dive into how researchers are enhancing reasoning, prediction, and generation capabilities through principled modeling and training frameworks.
Here's what caught our attention:
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle - A novel RL framework that dynamically prioritizes informative samples and reshapes the training data distribution to boost the reasoning capabilities of multimodal large language models.
Auto-Eval Judge: Towards a General Agentic Framework for Task Completion Evaluation - A modular, scalable evaluation framework that can assess agent performance across diverse domains by decomposing tasks and validating each step using available information.
Diffusion Beats Autoregressive in Data-Constrained Settings - A systematic study showing that diffusion models significantly outperform autoregressive models when compute is abundant but data is scarce, offering a promising alternative for efficient use of finite data.
Let's get into it 👇
Contents
GRAIL: Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning
Auto-Eval Judge: Towards a General Agentic Framework for Task Completion Evaluation
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
LLaVA-RE: Binary Image-Text Relevancy Evaluation with Multimodal Large Language Model
WeTok: Powerful Discrete Tokenization for High-Fidelity Visual Reconstruction
CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge
Eliciting Latent Predictions from Transformers with the Tuned Lens
Cooper: Co-Optimizing Policy and Reward Models in Reinforcement Learning for Large Language Models
TrajEvo: Trajectory Prediction Heuristics Design via LLM-driven Evolution
Motion Planning Diffusion: Learning and Adapting Robot Motion Planning with Diffusion Models
InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
Authors: Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang
Source and references: https://arxiv.org/abs/2508.05496v1
Introduction
This paper introduces InfiAlign, a scalable and sample-efficient framework for aligning large language models (LLMs) to enhance their reasoning capabilities.
Key Points
InfiAlign integrates supervised fine-tuning (SFT) with Direct Preference Optimization (DPO) to align LLMs for enhanced reasoning.
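If DPO is new to you, the core idea is simple: push the policy to assign a higher (reference-adjusted) log-probability to the preferred response than to the rejected one in each preference pair. Below is a minimal, illustrative PyTorch sketch of the standard DPO objective, not InfiAlign's actual implementation; the function and argument names are ours, and the log-probabilities are assumed to be summed over response tokens.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of per-example log-probabilities
    (summed over response tokens) under the trainable policy or the
    frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the
    # reference model on each response.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Maximize the margin between preferred and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```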