State of AI

Visual Tokens, Strategic Reasoning, Protein Design, and Cloud Debugging with LLMs

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
May 28, 2025

👋 Welcome to this week’s edition of State of AI, and a big hello to our 647 new subscribers since the last edition!

This edition is stacked with research that rethinks the boundaries of AI systems, from how models reason about chemical reactions and strategy games to how they navigate human-filled environments and control physical robots. We’re seeing work that pushes into long-standing frontiers: hallucination in vision-language models, real-time protein design, memory systems in AI agents, and hardware-efficient decoding at scale. It’s a good week to be curious.

Some highlights:

  • ChemCoTBench raises the bar for LLMs in chemistry, trading shallow QA for modular reasoning grounded in real-world tasks.

  • TMGBench tests whether LLMs can actually think strategically across 2x2 game spaces, and shows where they still fall short.

  • Selftok proposes a unification of autoregressive and diffusion models through discrete visual tokens, with stunning performance boosts in vision tasks.

  • Dash, a low-code framework for cloud debugging with multi-modal RAG, hints at a future where LLMs help debug production systems in real time.

  • Hume and EquAct bring System-2 reasoning and SE(3)-equivariance to robotics, enabling smarter manipulation and planning.

And don’t miss the papers on attention acceleration, protein generation, hallucination mitigation, and the evolving taxonomy of AI memory systems.

Let’s dig in 👇

Contents

  1. Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs

  2. Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations

  3. TMGBench: A Systematic Game Benchmark for Evaluating Strategic Reasoning Abilities of LLMs

  4. Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

  5. Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration

  6. ID-Align: RoPE-Conscious Position Remapping for Dynamic High-Resolution Adaptation in Vision-Language Models

  7. Guide your favorite protein sequence generative model

  8. Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions

  9. Designing Cyclic Peptides via Harmonic SDE with Atom-Bond Modeling

  10. Hardware-Efficient Attention for Fast Decoding

  11. Accelerating Diffusion Language Model Inference via Efficient KV Caching and Guided Diffusion

  12. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions

  13. EquAct: An SE(3)-Equivariant Multi-Task Transformer for Open-Loop Robotic Manipulation

  14. Hume: Introducing System-2 Thinking in Visual-Language-Action Model

  15. HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs

Authors: Yifan Wang, Kenneth P. Birman

Source and references: https://arxiv.org/abs/2505.21419v1


Introduction

This paper introduces Dash, a low-code development platform for creating AI applications in industrial settings, such as product inspection on factory shop floors.

Key Points

  • Dash aims to address three main challenges in low-code AI development: lack of composable tools for AI experts, difficulty in customizing AI models for deployment environments, and the need for better runtime support and troubleshooting.

  • Dash is designed to work with a distributed edge computing infrastructure, using Cascade as the underlying data and compute hosting framework.

  • Dash provides model recommendation and type checking features to assist AI experts in selecting and composing the right AI components.

  • Dash seeks to simplify the deployment process for non-expert deployment specialists by automating many customization tasks.
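
To make the type-checking idea in the key points a little more concrete, here is a minimal sketch of how a composition check between AI components might look. It is not Dash's actual API: the `Component` class, the `check_pipeline` function, and the type names such as `image` and `bounding_boxes` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical pipeline stage with declared input and output types."""
    name: str
    input_type: str
    output_type: str


def check_pipeline(stages):
    """Return mismatch messages; an empty list means the chain type-checks."""
    errors = []
    # Compare each stage's output type with the next stage's input type.
    for upstream, downstream in zip(stages, stages[1:]):
        if upstream.output_type != downstream.input_type:
            errors.append(
                f"{upstream.name} outputs {upstream.output_type}, "
                f"but {downstream.name} expects {downstream.input_type}"
            )
    return errors


# Illustrative components for a shop-floor inspection pipeline (assumed names).
camera = Component("camera", "none", "image")
detector = Component("defect_detector", "image", "bounding_boxes")
classifier = Component("defect_classifier", "image_crops", "labels")

good = check_pipeline([camera, detector])
bad = check_pipeline([camera, detector, classifier])

print(good)
print(bad)
```

In this sketch the two-stage chain passes, while adding the classifier is flagged because it expects cropped images rather than bounding boxes. That is the kind of mistake a composition-time check could catch before deployment.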

© 2025 StateOfAI