Generative Models, Optimization, and Multimodal Reasoning
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 👋 And a warm welcome to our 27 new subscribers since the last edition!
This edition covers cutting-edge progress in generative models for text-to-image and autoregressive image generation, novel optimization techniques for training large language models, and advances in multimodal reasoning and understanding. We'll also explore the latest developments in Earth observation data processing and database security.
Here's what caught our attention:
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale - This 14-billion-parameter autoregressive model achieves state-of-the-art performance in text-to-image generation, demonstrating exceptional compositional and linguistic understanding.
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training - This optimization framework shrinks the optimizer state, cutting its memory overhead and enabling more efficient training of large language models (a toy sketch of the general idea follows these highlights).
GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning - These vision-language models achieve state-of-the-art performance on 42 public benchmarks, highlighting the potential of reasoning-centric training for advancing general-purpose multimodal intelligence.
MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data - This novel adaptation of the Masked Autoencoder framework effectively handles the heterogeneity of Earth observation data, setting new state-of-the-art results on several benchmarks.
Leveraging large language models for SQL behavior-based database intrusion detection - This two-tier anomaly detection framework leverages large language models to identify both external and internal attacks on database systems, enhancing overall security.
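FRUGAL's exact algorithm is in the paper linked above; as a rough illustration of the state-reduction idea its title names, here is a toy PyTorch sketch that keeps Adam moments only for a low-dimensional projection of the gradient and applies a state-free signSGD update to the residual. The function name, the projection scheme, and the way the two updates are combined are all assumptions for illustration, not FRUGAL's implementation.

```python
import torch

def frugal_style_step(param, grad, exp_avg, exp_avg_sq, proj,
                      lr=1e-3, betas=(0.9, 0.999), eps=1e-8, step=1):
    """Toy state-reduced update: Adam moments live only in an r-dim
    subspace (2*r floats of state instead of 2*d); the rest is signSGD."""
    g_low = proj.T @ grad            # (r,) component that gets Adam state
    g_res = grad - proj @ g_low      # (d,) residual, updated state-free

    # Adam update on the low-dimensional component
    exp_avg.mul_(betas[0]).add_(g_low, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(g_low, g_low, value=1 - betas[1])
    m_hat = exp_avg / (1 - betas[0] ** step)
    v_hat = exp_avg_sq / (1 - betas[1] ** step)
    update_low = proj @ (m_hat / (v_hat.sqrt() + eps))

    # signSGD on the residual: no per-parameter state required
    param.add_(update_low + g_res.sign(), alpha=-lr)

# Usage with a random orthonormal basis for the subspace
d, r = 1024, 16
proj, _ = torch.linalg.qr(torch.randn(d, r))     # proj: (d, r)
param, grad = torch.randn(d), torch.randn(d)
exp_avg, exp_avg_sq = torch.zeros(r), torch.zeros(r)
frugal_style_step(param, grad, exp_avg, exp_avg_sq, proj)
```

The memory saving is the point: full Adam stores two d-dimensional moment buffers, while a sketch like this stores two r-dimensional ones.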
Let's get into it 👇
Contents
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale
MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
Leveraging large language models for SQL behavior-based database intrusion detection
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Conic Formulations of Transport Metrics for Unbalanced Measure Networks and Hypernetworks
FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference
BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning
FROGENT: An End-to-End Full-process Drug Design Agent
Authors: Qihua Pan, Dong Xu, Jenna Xinyi Yao, Lijia Ma, Zexuan Zhu, Junkai Ji
Source and references: https://arxiv.org/abs/2508.10760v1
Introduction
This paper introduces FROGENT, an end-to-end full-process drug design agent that uses a Large Language Model and the Model Context Protocol to integrate multiple dynamic biochemical databases, extensible tool libraries, and task-specific AI models.
Key Points
FROGENT is the first drug design agent for small molecules that integrates diverse drug discovery tools into a coherent, fully automated workflow.
FROGENT supports end-to-end execution of the entire drug discovery process, from target identification to retrosynthetic planning.
FROGENT accommodates continuously updated databases and tool libraries, enabling dynamic task composition and cross-disciplinary collaboration.
FROGENT delivers highly competitive performance on diverse benchmark tasks, lowering the barrier to entry and improving efficiency in drug research and development. A minimal sketch of the tool-orchestration loop this implies appears below.
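To make the Model Context Protocol-style orchestration concrete, here is a minimal Python sketch of an agent loop in which an LLM plans over a registry of drug-discovery tools. Everything here is an illustrative assumption rather than FROGENT's actual interface: the tool names, the JSON action format, and the `run_agent` helper are hypothetical.

```python
import json

# Hypothetical tool registry standing in for MCP-exposed drug-design tools.
# The stubs return canned results so the loop is runnable end to end.
TOOLS = {
    "search_target_db": lambda query: {"targets": ["EGFR"]},
    "dock_ligand": lambda smiles, target: {"score": -8.2},
    "plan_retrosynthesis": lambda smiles: {"routes": 3},
}

def run_agent(llm, task: str, max_steps: int = 10) -> str:
    """Minimal agent loop: at each step the LLM (a prompt -> str callable)
    either emits a JSON tool call or a final answer."""
    transcript = [f"Task: {task}", f"Available tools: {list(TOOLS)}"]
    for _ in range(max_steps):
        action = json.loads(llm("\n".join(transcript)))
        if action["type"] == "final":
            return action["answer"]
        # Dispatch the requested tool and feed the observation back
        result = TOOLS[action["tool"]](**action.get("args", {}))
        transcript.append(
            f"Observation from {action['tool']}: {json.dumps(result)}"
        )
    return "Step budget exhausted."
```

The design point is that new tools can be registered without retraining the model: the LLM sees the registry, emits structured actions, and the loop feeds observations back until it produces a final answer.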