Panoramic Diffusion, Multimodal Anomaly Detection, and Efficient Robotic Policies
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 🤖
👋 And a warm welcome to our 66 new subscribers since last edition!
This edition covers some fascinating research on generating coherent panoramic images using large language models, novel techniques for detecting anomalies in tabular data, and methods for deploying efficient robotic control policies on mobile devices. We'll also delve into advancements in multimodal models for Chinese hate speech detection and efficient document-level relation extraction.
Here's what caught our attention:
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs - A novel framework that redefines panoramic image generation as a next-token prediction task, enabling endless and coherent panorama generation.
Diffusion-Scheduled Denoising Autoencoders for Anomaly Detection in Tabular Data - A framework that integrates diffusion-based noise scheduling and contrastive learning to enhance tabular data representation and anomaly detection performance (a rough illustrative sketch follows this list).
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation - A method to accelerate diffusion-based robotic control policies for real-time deployment on mobile devices, achieving significant latency improvements without compromising performance.
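To make the anomaly-detection highlight a little more concrete, here is a minimal, self-contained sketch of the general idea: a denoising autoencoder for tabular rows whose corruption strength is sampled from a diffusion-style noise schedule, with reconstruction error used as the anomaly score. This illustrates the concept only, not the paper's implementation; the architecture, schedule, and names (`DenoisingAE`, `anomaly_score`) are assumptions, and the contrastive component is omitted.

```python
# Illustrative only: a tiny denoising autoencoder for tabular data whose
# corruption strength is drawn from a diffusion-style noise schedule.
# Anomalies are scored by reconstruction error. All names and
# hyperparameters are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, dim, hidden=64, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                     nn.Linear(hidden, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, x_clean, optimizer, sigmas):
    # Sample a per-row noise level from the schedule, corrupt the row,
    # and train the model to recover the clean row.
    t = torch.randint(0, len(sigmas), (x_clean.size(0),))
    sigma = sigmas[t].unsqueeze(1)
    x_noisy = x_clean + sigma * torch.randn_like(x_clean)
    loss = ((model(x_noisy) - x_clean) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def anomaly_score(model, x):
    # Higher reconstruction error -> more anomalous.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)

sigmas = torch.linspace(0.01, 0.5, steps=50)   # toy linear noise schedule
model = DenoisingAE(dim=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_train = torch.randn(128, 20)                  # stand-in for "normal" rows
for _ in range(100):
    train_step(model, x_train, opt, sigmas)
scores = anomaly_score(model, torch.randn(8, 20))
```

Rows that the model reconstructs poorly relative to the training distribution receive high scores and are flagged as anomalous.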
Let's get into it 👇
Contents
Federated Cross-Training Learners for Robust Generalization under Data Heterogeneity
Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Adacc: Adaptive Compression and Activation Checkpointing for LLM Memory Management
Towards Fair In-Context Learning with Tabular Foundation Models
Diffusion-Scheduled Denoising Autoencoders for Anomaly Detection in Tabular Data
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
GLiDRE: Generalist Lightweight model for Document-level Relation Extraction
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Unraveling Hidden Representations: A Multi-Modal Layer Analysis for Better Synthetic Content Forensics
Authors: Tom Or, Omri Azencot
Source and references: https://arxiv.org/abs/2508.00784v1
Introduction
This paper proposes a novel approach for detecting synthetic content across multiple modalities, including images and audio, by leveraging the latent representations of large pre-trained multi-modal models.
Key Points
The paper extends the recent paradigm of using CLIP-ViT features for deepfake detection to the multi-modal setting, providing an in-depth analysis of such models.
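As a rough illustration of this feature-probing paradigm (not the authors' pipeline), the sketch below extracts an intermediate-layer CLS activation from a pre-trained CLIP-ViT image encoder and fits a simple linear probe to separate real from synthetic images. The checkpoint name, layer index, and file paths are placeholder assumptions.

```python
# Minimal sketch (not the paper's pipeline): probe an intermediate layer of a
# pre-trained CLIP-ViT image encoder with a linear classifier to separate
# real from synthetic images. Checkpoint, layer index, and paths are
# illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModel
from sklearn.linear_model import LogisticRegression

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
encoder = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32").eval()

def layer_features(paths, layer=8):
    """Return the CLS-token activation of a chosen hidden layer for each image."""
    feats = []
    for p in paths:
        inputs = processor(images=Image.open(p).convert("RGB"), return_tensors="pt")
        with torch.no_grad():
            out = encoder(**inputs, output_hidden_states=True)
        feats.append(out.hidden_states[layer][0, 0])  # CLS token at that layer
    return torch.stack(feats).numpy()

# Placeholder file lists standing in for labeled real/synthetic images.
real_paths, fake_paths = ["real_0.png"], ["fake_0.png"]
X = layer_features(real_paths + fake_paths)
y = [0] * len(real_paths) + [1] * len(fake_paths)
probe = LogisticRegression(max_iter=1000).fit(X, y)   # simple linear probe
```

In the spirit of the paper's multi-modal extension, the same recipe carries over to other modalities such as audio by swapping in the corresponding pre-trained encoder.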