Watermarking, Anomaly Detection, and RF Fingerprinting in Machine Learning Models
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 👋 And a warm welcome to our 48 new subscribers since the last edition!
This edition ranges from secure and trustworthy machine learning models to advanced techniques for video generation and language model adaptation. We'll explore novel approaches to enhancing the reliability and safety of machine learning systems, as well as innovative methods for improving the reasoning and planning capabilities of AI agents.
Boost Your Productivity!
🔎 Research takes deep focus, but distractions always creep in. Forget is the productivity tool trusted by 20,000+ professionals to tackle ADHD challenges and time blindness by keeping one task always in sight. Whether you’re analyzing papers, coding, or writing, Forget helps you single-task your way to real progress.
Here's what caught our attention:
- Watermarking and Anomaly Detection in Machine Learning Models for LoRa RF Fingerprinting: A defense-in-depth approach that integrates deep learning, watermarking, and anomaly detection to achieve high classification accuracy while defending against model theft, weight tampering, and input-space evasion. 
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation: A systematic investigation into applying the next-token prediction paradigm to the visual domain, proposing techniques to enhance the image understanding capabilities of autoregressive models. 
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance: A training-free framework that leverages the prior world knowledge of video diffusion models to enable precise trajectory control and high-quality synthesis in both static 3D scenes and dynamic 4D scenes. 
- Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering: A training-free approach for debiasing large multi-modal models by identifying and ablating linear directions in the model's activation space that correspond to its propensity to mention protected attributes (a minimal sketch of direction ablation follows this list). 
- Self-Adapting Language Models: A framework that enables large language models to self-adapt by generating their own finetuning data and update directives, directly using the model's generation to parameterize and control its own adaptation process. 
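The debiasing highlight above hinges on a simple linear-algebra operation: projecting a bias direction out of a model's hidden activations. Here is a minimal NumPy sketch of that operation; the function name, the toy data, and the mean-difference recipe mentioned for estimating the direction are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ablate_direction(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each hidden state along `direction`.

    hidden:    (num_tokens, d_model) activations from some layer.
    direction: (d_model,) vector, e.g. the mean difference between
               activations on prompts that do vs. don't mention a protected
               attribute (one common way to estimate such a direction).
    """
    direction = direction / np.linalg.norm(direction)
    # Orthogonal projection: h <- h - (h . v) v
    return hidden - np.outer(hidden @ direction, direction)

# Toy demonstration with random activations and a random "attribute" direction.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
v = rng.normal(size=8)
h_debiased = ablate_direction(h, v)
# After ablation, activations carry no component along v.
print(np.allclose(h_debiased @ (v / np.linalg.norm(v)), 0.0))  # True
```

Because the projection is applied at inference time to activations rather than weights, no retraining is needed, which is what makes this family of methods "training-free."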
Let's get into it 👇
Contents
- Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs 
- Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning 
- Watermarking and Anomaly Detection in Machine Learning Models for LoRa RF Fingerprinting 
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation 
- WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance 
- Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering 
- Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models 
- Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting 
- Fair-GPTQ: Bias-Aware Quantization for Large Language Models 
- Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation 
- ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning 
- GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation 
Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs
Authors: Yutong Liu, Ziyue Zhang, Yongbin Yu, Xiangxiang Wang, Yuqing Cai, Nyima Tashi
Source and references: https://arxiv.org/abs/2509.15095v1
Introduction
This paper proposes LIR-ASR, a heuristically optimized, iterative correction framework that uses large language models (LLMs) to improve the accuracy of automatic speech recognition (ASR) transcripts.
Key Points
- LIR-ASR applies a "Listening-Imagining-Refining" strategy, where uncertain words are replaced with phonetically similar alternatives and then refined within the broader context. 
- A heuristic optimization scheme built around a finite state machine (FSM) is introduced to keep the correction process from getting trapped in local optima (a toy sketch of this loop follows the list). 
- Rule-based constraints are designed to guide the correction process and reduce the risk of linguistically plausible but semantically inconsistent substitutions introduced by LLMs. 
- Experiments are conducted on both English and Chinese ASR outputs using Whisper-medium and Whisper-large-v3 models, as well as Qwen3-235B and DeepSeek-V3.1 LLMs. 
- LIR-ASR achieves average reductions in CER/WER of up to 1.5 percentage points compared to baselines, demonstrating consistent accuracy gains in transcription. 
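To make the "Listening-Imagining-Refining" loop and its FSM heuristic concrete, here is a toy Python sketch of the control flow. The candidate table, the `imagine` and `score` helpers, and the two-state EXPLOIT/EXPLORE machine are all hypothetical stand-ins: a real system would generate phonetic neighbors for uncertain words and score candidates in context with an LLM. This illustrates the idea, not the paper's implementation.

```python
# Toy stand-ins: a phonetic candidate generator ("imagining") and a
# contextual scorer ("refining"). Both are placeholders so the loop runs.
PHONETIC_ALTERNATIVES = {"flour": ["flower"], "see": ["sea"]}

def imagine(word):
    """Return phonetically similar alternatives for a word."""
    return PHONETIC_ALTERNATIVES.get(word, [])

def score(words):
    """Placeholder for LLM-judged contextual fit; here we just prefer 'flower'."""
    return sum(1.0 for w in words if w == "flower")

def lir_correct(transcript, max_rounds=5):
    """Greedy substitution loop with a two-state heuristic: EXPLOIT takes any
    improving substitution; when none improves (a local optimum), switch to
    EXPLORE and accept one sideways move before returning to EXPLOIT --
    a crude echo of the paper's FSM idea."""
    words = transcript.split()
    state = "EXPLOIT"
    for _ in range(max_rounds):
        improved = False
        for i, word in enumerate(words):
            for cand in imagine(word):
                trial = words[:i] + [cand] + words[i + 1:]
                if score(trial) > score(words) or state == "EXPLORE":
                    words, improved, state = trial, True, "EXPLOIT"
                    break
            if improved:
                break
        if not improved:
            if state == "EXPLORE":
                break          # already explored once and still stuck: stop
            state = "EXPLORE"  # allow one sideways move next round
    return " ".join(words)

print(lir_correct("the flour smells sweet"))  # -> "the flower smells sweet"
```

The paper's rule-based constraints would act here as filters on which candidate substitutions are even considered, reducing the risk of fluent-but-wrong rewrites.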