GenAI, Robot Navigation, and Multiagent Systems
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 👋 And a warm welcome to our 32 new subscribers since last edition!
Let's get into it 👇
Contents
Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization
Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Studying and Improving Graph Neural Network-based Motif Estimation
Large Language Model Confidence Estimation via Black-Box Access
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
Benchmarking the Pedagogical Knowledge of Large Language Models
ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation
Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control
A Dataset for Enhancing MLLMs in Visualization Understanding and Reconstruction
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
Authors: Guiyao Tie, Xueyang Zhou, Tianhe Gu, Ruihang Zhang, Chaoran Hu, Sizhe Zhang, Mengqu Sun, Yan Zhang, Pan Zhou, Lichao Sun
Source and references: https://arxiv.org/abs/2505.16459v3
Introduction
This paper introduces MMLU-Reason, a new benchmark designed to rigorously evaluate the multi-modal reasoning capabilities of multi-modal large language models (MLLMs), particularly those with intermediate thinking mechanisms (MLLMs-T).