Visual Tokens, Strategic Reasoning, Protein Design, and Cloud Debugging with LLMs
Latest research summaries in ML, Robotics, CV, NLP and AI
👋 Welcome to this week’s edition of State of AI, and a big hello to our 647 new subscribers since the last edition!
This edition is stacked with research that rethinks the boundaries of AI systems, from how models reason about chemical reactions and strategy games to how they navigate human-filled environments and control physical robots. We’re seeing work that pushes into long-standing frontiers: hallucination in vision-language models, real-time protein design, memory systems in AI agents, and hardware-efficient decoding at scale. It’s a good week to be curious.
Some highlights:
ChemCoTBench raises the bar for LLMs in chemistry, trading shallow QA for modular reasoning grounded in real-world tasks.
TMGBench tests whether LLMs can actually think strategically across 2x2 game spaces—and where they still fall short.
Selftok proposes a unification of autoregressive and diffusion models through discrete visual tokens, with stunning performance boosts in vision tasks.
Dash, a low-code framework for cloud debugging with multi-modal RAG, hints at a future where LLMs help debug production systems in real time.
Hume and EquAct bring System-2 reasoning and SE(3)-equivariance to robotics, enabling smarter manipulation and planning.
And don’t miss the papers on attention acceleration, protein generation, hallucination mitigation, and the evolving taxonomy of AI memory systems.
Let’s dig in 👇
Contents
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs
Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration
Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions
Designing Cyclic Peptides via Harmonic SDE with Atom-Bond Modeling
Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion
Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions
EquAct: An SE(3)-Equivariant Multi-Task Transformer for Open-Loop Robotic Manipulation
Hume: Introducing System-2 Thinking in Visual-Language-Action Model
Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
Authors: Yifan Wang, Kenneth P. Birman
Source and references: https://arxiv.org/abs/2505.21419v1
Introduction
This paper introduces Dash, a low-code development platform for creating AI applications in industrial settings, such as product inspection on factory shop floors.
Key Points
Dash aims to address three main challenges in low-code AI development: a lack of composable tools for AI experts, the difficulty of customizing AI models for deployment environments, and the need for better runtime support and troubleshooting.
Dash is designed to work with a distributed edge computing infrastructure, using Cascade as the underlying data and compute hosting framework.
Dash provides model recommendation and type checking features to assist AI experts in selecting and composing the right AI components (a rough sketch of what such a composition flow might look like follows after this list).
Dash seeks to simplify the deployment process for non-expert deployment specialists by automating many customization tasks.
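To make the "composable components plus type checking" idea concrete, here is a minimal, hypothetical sketch of what a recommendation-and-composition flow could look like. None of these names (`Component`, `recommend`, `type_check`, the toy catalog) come from the paper; Dash's actual API and its Cascade integration will differ.

```python
# Hypothetical sketch only: the names below are illustrative, not Dash's real API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    input_type: str   # e.g. "image", "bounding_boxes", "text"
    output_type: str

# A toy "model catalog" the platform could recommend components from.
CATALOG = [
    Component("defect_detector", input_type="image", output_type="bounding_boxes"),
    Component("ocr_reader", input_type="image", output_type="text"),
    Component("report_writer", input_type="bounding_boxes", output_type="text"),
]

def recommend(required_output: str) -> list[Component]:
    """Suggest catalog components whose output type matches what the user asked for."""
    return [c for c in CATALOG if c.output_type == required_output]

def type_check(pipeline: list[Component]) -> None:
    """Reject pipelines where one stage's output type doesn't match the next stage's input."""
    for upstream, downstream in zip(pipeline, pipeline[1:]):
        if upstream.output_type != downstream.input_type:
            raise TypeError(
                f"{upstream.name} produces '{upstream.output_type}' but "
                f"{downstream.name} expects '{downstream.input_type}'"
            )

if __name__ == "__main__":
    # Compose an inspection pipeline: detect defects on an image, then write a text report.
    pipeline = [CATALOG[0], CATALOG[2]]
    type_check(pipeline)                        # passes: bounding_boxes feeds bounding_boxes
    print([c.name for c in recommend("text")])  # ['ocr_reader', 'report_writer']
```

The point of the sketch is the workflow, not the specifics: the platform suggests candidate components, and a static type check catches incompatible compositions before anything is deployed to the edge infrastructure.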