GenAI Robot Navigation, and Multiagent Systems

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI

Jul 02, 2025

Welcome to today's edition of State of AI 👋 And a warm welcome to our 32 new subscribers since the last edition!

Let's get into it 👇

MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment
Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization
Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Discrete Diffusion in Large Language and Multimodal Models: A Survey
Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought
Studying and Improving Graph Neural Network-based Motif Estimation
Large Language Model Confidence Estimation via Black-Box Access
MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research
Benchmarking the Pedagogical Knowledge of Large Language Models
ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation
Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control
A Dataset for Enhancing MLLMs in Visualization Understanding and Reconstruction

MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks

Authors: Guiyao Tie, Xueyang Zhou, Tianhe Gu, Ruihang Zhang, Chaoran Hu, Sizhe Zhang, Mengqu Sun, Yan Zhang, Pan Zhou, Lichao Sun

Source and references: https://arxiv.org/abs/2505.16459v3

Introduction

This paper introduces MMMR, a new benchmark designed to rigorously evaluate the multi-modal reasoning capabilities of multi-modal large language models (MLLMs), particularly those with intermediate thinking mechanisms (MLLMs-T).
