Scaling Diverse Generation, Efficient LLM Training, and Simulated Robots for Task Planning
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 🚀
👋 And a warm welcome to our 117 new subscribers since the last edition!
This edition covers a range of exciting advances: a technique for generating diverse, high-quality samples from diffusion models; a communication-efficient training algorithm for large language models; and a comprehensive benchmark for evaluating robotic task planning and control in a simulated kitchen environment.
Here's what caught our attention:
Scaling Group Inference for Diverse and High-Quality Generation: A scalable method for optimizing sample quality and diversity in diffusion models, outperforming independent sampling and recent single-sample inference algorithms.
Communication Efficient LLM Pre-training with SparseLoCo: A communication-efficient training algorithm for large language models that achieves extreme compression ratios while maintaining or improving performance.
Mind and Motion Aligned: A Joint Evaluation IsaacSim Benchmark for Task Planning and Low-Level Policies in Mobile Manipulation: A unified benchmark for evaluating high-level task planning and low-level robot control in a realistic simulated kitchen environment.
Let's get into it 👇
Contents
Measuring the environmental impact of delivering AI at Google Scale
VerilogLAVD: LLM-Aided Rule Generation for Vulnerability Detection in Verilog
Scaling Group Inference for Diverse and High-Quality Generation
CineScale: Free Lunch in High-Resolution Cinematic Visual Generation
OPDR: Order-Preserving Dimension Reduction for Semantic Embedding of Multimodal Scientific Data
Measuring the environmental impact of delivering AI at Google Scale
Authors: Cooper Elsworth, Keguo Huang, David Patterson, Ian Schneider, Robert Sedivy, Savannah Goodman, Ben Townsend, Parthasarathy Ranganathan, Jeff Dean, Amin Vahdat, Ben Gomes, James Manyika
Source and references: https://arxiv.org/abs/2508.15734v1
Introduction
This paper addresses a critical gap in our understanding of the environmental impact of delivering AI at scale. It proposes and executes a comprehensive methodology for measuring the energy usage, carbon emissions, and water consumption of AI inference workloads in a large-scale production environment.
Key Points
Propose a full-stack measurement approach that accounts for all material energy sources, including active AI accelerator power, host system energy, idle machine capacity, and data center energy overhead (see the sketch after this list).
Apply this methodology to Google's Gemini Apps product to provide the first analysis of three AI serving environmental metrics: energy/prompt, emissions/prompt, and water consumption/prompt.
Demonstrate that existing measurement approaches are missing material energy consumption activities for AI serving.
Illustrate the compounding efficiency gains across the AI serving stack over a year of development, resulting in a 44x reduction in total emissions for the median Gemini Apps prompt.
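To make the full-stack accounting concrete, here is a minimal sketch of how such per-prompt metrics could compose. All constants and telemetry values are hypothetical placeholders, and the breakdown (accelerator, host, idle capacity, data center overhead via PUE) follows the key points above rather than the paper's exact formulas; GRID_INTENSITY and WUE are assumed illustrative conversion factors.

```python
# Illustrative sketch of full-stack per-prompt accounting.
# All constants are hypothetical placeholders, not figures from the paper.

PUE = 1.10              # data center energy overhead (power usage effectiveness), assumed
GRID_INTENSITY = 400.0  # grid carbon intensity in gCO2e per kWh, assumed
WUE = 1.0               # water usage effectiveness in liters per kWh, assumed

def per_prompt_footprint(accelerator_wh, host_wh, idle_capacity_wh):
    """Compose the energy sources named in the key points: active AI
    accelerator power, host system energy, amortized idle machine
    capacity, and data center energy overhead (via PUE)."""
    it_energy_wh = accelerator_wh + host_wh + idle_capacity_wh
    total_energy_wh = it_energy_wh * PUE  # add facility-level overhead
    kwh = total_energy_wh / 1000.0
    return {
        "energy_wh_per_prompt": total_energy_wh,
        "emissions_g_per_prompt": kwh * GRID_INTENSITY,
        "water_l_per_prompt": kwh * WUE,
    }

# Hypothetical per-prompt telemetry (Wh), for illustration only.
print(per_prompt_footprint(accelerator_wh=0.5, host_wh=0.1, idle_capacity_wh=0.05))
```

The sketch also illustrates the third key point: a narrow, accelerator-only measurement would report just the first term, missing the host, idle-capacity, and overhead contributions that the paper argues are material.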
Methodology