Scaling Diverse Generation, Efficient LLM Training, and Simulated Robots for Task Planning
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 🚀
👋 And a warm welcome to our 117 new subscribers since the last edition!
This edition covers a range of exciting advances: a technique for generating diverse, high-quality samples from diffusion models; a communication-efficient training algorithm for large language models; and a comprehensive benchmark for evaluating robotic task planning and control in a simulated kitchen environment.
Here's what caught our attention:
Scaling Group Inference for Diverse and High-Quality Generation: A scalable method for optimizing sample quality and diversity in diffusion models, outperforming independent sampling and recent single-sample inference algorithms.
Communication Efficient LLM Pre-training with SparseLoCo: A communication-efficient training algorithm for large language models that achieves extreme compression ratios while maintaining or improving performance.
Mind and Motion Aligned: A Joint Evaluation IsaacSim Benchmark for Task Planning and Low-Level Policies in Mobile Manipulation: A unified benchmark for evaluating high-level task planning and low-level robot control in a realistic simulated kitchen environment.
Let's get into it 👇
Contents
Measuring the environmental impact of delivering AI at Google Scale
VerilogLAVD: LLM-Aided Rule Generation for Vulnerability Detection in Verilog
Scaling Group Inference for Diverse and High-Quality Generation
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data
Measuring the environmental impact of delivering AI at Google Scale
Authors: Cooper Elsworth, Keguo Huang, David Patterson, Ian Schneider, Robert Sedivy, Savannah Goodman, Ben Townsend, Parthasarathy Ranganathan, Jeff Dean, Amin Vahdat, Ben Gomes, James Manyika
Source and references: https://arxiv.org/abs/2508.15734v1
Introduction
This paper addresses a critical gap in our understanding of the environmental impact of delivering AI at scale. It proposes and executes a comprehensive methodology for measuring the energy usage, carbon emissions, and water consumption of AI inference workloads in a large-scale production environment.
Key Points
Propose a full-stack measurement approach that accounts for all material energy sources, including active AI accelerator power, host system energy, idle machine capacity, and data center energy overhead (see the sketch after this list).
Apply this methodology to Google's Gemini Apps product to provide the first analysis of three AI serving environmental metrics: energy/prompt, emissions/prompt, and water consumption/prompt.
Demonstrate that existing measurement approaches are missing material energy consumption activities for AI serving.
Illustrate the compounding efficiency gains across the AI serving stack over a year of development, resulting in a 44x reduction in total emissions for the median Gemini Apps prompt.
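To make the full-stack accounting concrete, here is a minimal sketch of how such per-prompt metrics could compose. All constants and telemetry values are hypothetical placeholders, and the breakdown (accelerator, host, idle capacity, data center overhead via PUE) follows the key points above rather than the paper's exact formulas; GRID_INTENSITY and WUE are assumed illustrative conversion factors.

```python
# Illustrative sketch of full-stack per-prompt accounting.
# All constants are hypothetical placeholders, not figures from the paper.

PUE = 1.10              # data center energy overhead (power usage effectiveness), assumed
GRID_INTENSITY = 400.0  # grid carbon intensity in gCO2e per kWh, assumed
WUE = 1.0               # water usage effectiveness in liters per kWh, assumed

def per_prompt_footprint(accelerator_wh, host_wh, idle_capacity_wh):
    """Compose the energy sources named in the key points: active AI
    accelerator power, host system energy, amortized idle machine
    capacity, and data center energy overhead (via PUE)."""
    it_energy_wh = accelerator_wh + host_wh + idle_capacity_wh
    total_energy_wh = it_energy_wh * PUE  # add facility-level overhead
    kwh = total_energy_wh / 1000.0
    return {
        "energy_wh_per_prompt": total_energy_wh,
        "emissions_g_per_prompt": kwh * GRID_INTENSITY,
        "water_l_per_prompt": kwh * WUE,
    }

# Hypothetical per-prompt telemetry (Wh), for illustration only.
print(per_prompt_footprint(accelerator_wh=0.5, host_wh=0.1, idle_capacity_wh=0.05))
```

The sketch also illustrates the third key point: a narrow, accelerator-only measurement would report just the first term, missing the host, idle-capacity, and overhead contributions that the paper argues are material.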
Methodology