State of AI

GenAI Robot Navigation, and Multiagent Systems

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Jul 02, 2025

Welcome to today's edition of State of AI 👋 And a warm welcome to our 32 new subscribers since last edition!

Let's get into it 👇

Contents

  1. MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks

  2. Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment

  3. Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases

  4. Bridging SFT and DPO for Diffusion Model Alignment with Self-Sampling Preference Optimization

  5. Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation

  6. Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors

  7. Discrete Diffusion in Large Language and Multimodal Models: A Survey

  8. Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

  9. Studying and Improving Graph Neural Network-based Motif Estimation

  10. Large Language Model Confidence Estimation via Black-Box Access

  11. MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

  12. Benchmarking the Pedagogical Knowledge of Large Language Models

  13. ParticleFormer: A 3D Point Cloud World Model for Multi-Object, Multi-Material Robotic Manipulation

  14. Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control

  15. A Dataset for Enhancing MLLMs in Visualization Understanding and Reconstruction

MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks

Authors: Guiyao Tie, Xueyang Zhou, Tianhe Gu, Ruihang Zhang, Chaoran Hu, Sizhe Zhang, Mengqu Sun, Yan Zhang, Pan Zhou, Lichao Sun

Source and references: https://arxiv.org/abs/2505.16459v3


Introduction

This paper introduces MMLU-Reason, a new benchmark designed to rigorously evaluate the reasoning capabilities of multi-modal large language models (MLLMs), particularly those with intermediate thinking mechanisms (MLLMs-T).

© 2025 StateOfAI