Large Language 3D Modeling, Hardware Acceleration, and Code Verification
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 👋 And a warm welcome to our 86 new subscribers since the last edition!
Unlock Your Deepest Focus: 50% Off for 2 Days
I’ve written before about how Forget Work is a quiet game changer for deep work, and I stand by it. If you’ve ever lost an afternoon to tab chaos, dopamine-click traps, or “just one more Slack check,” this is your reset button. Forget Work strips away the noise, locks you into one task, and lets you finally finish what you start.
For the next 2 days only, you can grab it for 50% off. Don’t wait; your future, focused self will thank you.
This issue covers recent research on LLM-powered 3D asset generation, a hardware architecture for computing attention efficiently, and the use of language models to enable formal verification of Python code. Together, these papers show large language models, and the attention mechanism behind them, reaching into graphics, hardware design, and software verification.
Here's what caught our attention:
LL3M: Large Language 3D Modelers - A multi-agent system that leverages pre-trained language models to generate 3D assets by writing interpretable Blender scripts, enabling iterative, user-guided refinement of the generated content.
SystolicAttention: Fusing FlashAttention within a Single Systolic Array - A novel hardware architecture that enables the full execution of the FlashAttention algorithm within a single systolic array, significantly improving hardware utilization.
PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C - A framework that combines LLM-based code transpilation, bounded model checking, and fault localization to enable formal verification and bug diagnosis for Python programs.
TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning - A unified multimodal model that deeply integrates a pre-trained diffusion model with a language model to achieve high-quality and versatile text-to-image generation.
SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models - A data synthesis and curation method that generates high-quality, precisely aligned image-caption pairs to train advanced vision-language models.
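FlashAttention, which SystolicAttention runs entirely inside one systolic array, avoids materializing the full attention-score matrix by streaming over key/value blocks with an "online" softmax: it keeps only a running maximum, a running denominator, and an unnormalized output. Below is a minimal plain-Python sketch of that recurrence for a single query vector. The function names and the list-based inputs are illustrative assumptions. The sketch says nothing about how the paper maps the computation onto hardware.

```python
import math


def naive_attention(q, keys, values):
    """Reference: softmax over all q·k scores, then a weighted sum of values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


def online_softmax_attention(q, keys, values, block_size=2):
    """Hypothetical helper: same result, but keys/values are visited one block
    at a time and only running statistics are kept between blocks."""
    dim = len(values[0])
    running_max = -math.inf   # largest score seen so far
    running_sum = 0.0         # softmax denominator relative to running_max
    acc = [0.0] * dim         # unnormalized weighted sum of values
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new maximum
        # (exp(-inf) == 0.0 on the first block, so nothing carries over).
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(dim):
                acc[j] += w * v[j]
        running_max = new_max
    # Normalize once at the end.
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[0.2, 1.0], [1.5, -0.3], [0.0, 0.0], [-1.0, 2.0], [0.7, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
print(naive_attention(q, keys, values))
print(online_softmax_attention(q, keys, values, block_size=2))
```

Each new block only rescales the running sum and accumulator by exp(old_max - new_max). As a result, the output matches the naive computation up to floating-point rounding for any block size, which is what lets a fused kernel work on small tiles without storing the full score matrix.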
Let's get into it 👇
Contents
SystolicAttention: Fusing FlashAttention within a Single Systolic Array
PyVeritas: On Verifying Python via LLM-Based Transpilation and Bounded Model Checking for C
TBAC-UniImage: Unified Understanding and Generation by Ladder-Side Diffusion Tuning
Spotter+GPT: Turning Sign Spottings into Sentences with LLMs
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
Runtime Monitoring and Enforcement of Conditional Fairness in Generative AIs
MLOps with Microservices: A Case Study on the Maritime Domain
Efficient Speculative Decoding for Llama at Scale: Challenges and Solutions
Bringing Everyone to the Table: An Experimental Study of LLM-Facilitated Group Decision Making
Graffiti: Enabling an Ecosystem of Personalized and Interoperable Social Applications
LL3M: Large Language 3D Modelers
Authors: Sining Lu, Guan Chen, Nam Anh Dinh, Itai Lang, Ari Holtzman, Rana Hanocka
Source and references: https://arxiv.org/abs/2508.08228v1
Introduction
This paper presents LL3M, a multi-agent system that leverages pre-trained large language models (LLMs) to generate 3D assets by writing interpretable Python code in Blender.