State of AI

Generative Models, Optimization, and Multimodal Reasoning

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Aug 15, 2025

Welcome to today's edition of State of AI 👋 And a warm welcome to our 27 new subscribers since last edition!

This edition covers a range of topics, including cutting-edge progress in generative models for text-to-image and autoregressive image generation, novel optimization techniques for training large language models, and advancements in multimodal reasoning and understanding. We'll also explore the latest developments in Earth observation data processing and database security.

Here's what caught our attention:

  • NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale - This 14-billion-parameter autoregressive model achieves state-of-the-art text-to-image performance among autoregressive models, with strong compositional and linguistic understanding.

  • FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training - This optimization framework shrinks the memory footprint of the optimizer state, making it cheaper to train large language models.

  • GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning - These vision-language models achieve state-of-the-art performance on 42 public benchmarks, highlighting the potential of reasoning-centric training for advancing general-purpose multimodal intelligence.

  • MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data - This novel adaptation of the Masked Autoencoder framework effectively handles the heterogeneity of Earth observation data, setting new state-of-the-art results on several benchmarks.

  • Leveraging large language models for SQL behavior-based database intrusion detection - This two-tier anomaly detection framework leverages large language models to identify both external and internal attacks on database systems, enhancing overall security.

Let's get into it 👇

Contents

  1. FROGENT: An End-to-End Full-process Drug Design Agent

  2. Modeling Human Responses to Multimodal AI Content

  3. MSRS: Adaptive Multi-Subspace Representation Steering for Attribute Alignment in Large Language Models

  4. GLM-4.1V-Thinking and GLM-4.5V: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

  5. NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale

  6. MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

  7. Leveraging large language models for SQL behavior-based database intrusion detection

  8. FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training

  9. Conic Formulations of Transport Metrics for Unbalanced Measure Networks and Hypernetworks

  10. FreeKV: Boosting KV Cache Retrieval for Efficient LLM Inference

  11. BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache

  12. A Survey on Diffusion Language Models

  13. Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction

  14. UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving

  15. Scaling Up without Fading Out: Goal-Aware Sparse GNN for RL-based Generalized Planning

FROGENT: An End-to-End Full-process Drug Design Agent

Authors: Qihua Pan, Dong Xu, Jenna Xinyi Yao, Lijia Ma, Zexuan Zhu, Junkai Ji

Source and references: https://arxiv.org/abs/2508.10760v1


Introduction

This paper introduces Frogent, an end-to-end, full-process drug design agent that uses a Large Language Model and the Model Context Protocol to integrate multiple dynamic biochemical databases, extensible tool libraries, and task-specific AI models.

Key Points

  • Frogent is the first drug design agent for small molecules that integrates diverse drug discovery tools into a coherent and fully automated workflow.

  • Frogent supports the end-to-end execution of the entire drug discovery process, ranging from target identification to retrosynthetic planning.

  • Frogent accommodates continuously updated databases and tool libraries, so tasks can be composed dynamically and teams can work across disciplines.

  • Frogent performs competitively on diverse benchmark tasks, which could lower the difficulty and raise the efficiency of drug research and development.
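
To make the "extensible tool library" idea above concrete, here is a minimal sketch of how an agent might chain registered tools from target identification through to retrosynthetic planning. It is an illustration under stated assumptions, not Frogent's implementation: the `ToolRegistry` class, the three stub tools, and the intermediate "generate_molecule" stage are all hypothetical names, and a real system would call models and databases (for example via the Model Context Protocol) rather than return placeholder strings.

```python
from typing import Callable, Dict, List

# A tool reads the shared context and returns new fields to merge into it.
Tool = Callable[[dict], dict]


class ToolRegistry:
    """Hypothetical registry: tools can be added or swapped at any time."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, tool: Tool) -> None:
        # Registering an existing name replaces the old tool,
        # mimicking a tool library that is updated over time.
        self._tools[name] = tool

    def run(self, plan: List[str], context: dict) -> dict:
        # Execute the planned stages in order, merging each tool's output
        # into the context so later stages can use earlier results.
        for name in plan:
            if name not in self._tools:
                raise KeyError(f"no tool registered for stage '{name}'")
            context = {**context, **self._tools[name](context)}
        return context


# Stub tools (assumptions for illustration only; they return placeholders).
def identify_target(ctx: dict) -> dict:
    return {"target": f"target-for-{ctx['disease']}"}


def generate_molecule(ctx: dict) -> dict:
    return {"molecule": f"candidate-for-{ctx['target']}"}


def plan_retrosynthesis(ctx: dict) -> dict:
    return {"route": [f"precursor-of-{ctx['molecule']}", ctx["molecule"]]}


registry = ToolRegistry()
registry.register("identify_target", identify_target)
registry.register("generate_molecule", generate_molecule)
registry.register("plan_retrosynthesis", plan_retrosynthesis)

# In an LLM-driven agent the plan would be produced by the model;
# here it is hard-coded to keep the sketch self-contained.
plan = ["identify_target", "generate_molecule", "plan_retrosynthesis"]
result = registry.run(plan, {"disease": "example-disease"})
print(result["route"])
```

The point of the design is that the plan is just a list of stage names, so adding a new database lookup or model means registering one more tool rather than rewriting the workflow.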

Methodology
