State of AI

Watermarking, Anomaly Detection, and RF Fingerprinting in Machine Learning Models

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Sep 19, 2025

Welcome to today's edition of State of AI 👋 And a warm welcome to our 48 new subscribers since the last edition!

This edition ranges from secure and trustworthy machine learning models to advanced techniques for video generation and language model adaptation. We'll explore novel approaches to enhancing the reliability and safety of machine learning systems, as well as innovative methods for improving the reasoning and planning capabilities of AI agents.


Boost Your Productivity!

🔎 Research takes deep focus, but distractions always creep in. Forget is the productivity tool trusted by 20,000+ professionals to tackle ADHD challenges and time blindness by keeping one task always in sight. Whether you’re analyzing papers, coding, or writing, Forget helps you single-task your way to real progress.

Start Focusing!


Here's what caught our attention:

  • Watermarking and Anomaly Detection in Machine Learning Models for LoRa RF Fingerprinting: A defense-in-depth approach that integrates deep learning, watermarking, and anomaly detection to achieve high classification accuracy while defending against model theft, weight tampering, and input-space evasion.

  • Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation: A systematic investigation into applying the next-token prediction paradigm to the visual domain, proposing techniques to enhance the image understanding capabilities of autoregressive models.

  • WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance: A training-free framework that leverages the prior world knowledge of video diffusion models to enable precise trajectory control and high-quality synthesis in both static 3D scenes and dynamic 4D scenes.

  • Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering: A training-free approach for debiasing large multi-modal models by identifying and ablating linear directions in the model's activation space that correspond to its propensity to mention protected attributes.

  • Self-Adapting Language Models: A framework that enables large language models to self-adapt by generating their own finetuning data and update directives, directly using the model's generation to parameterize and control its own adaptation process.
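Among these, the test-time debiasing approach has a particularly compact core: find a linear direction in the model's activation space associated with the unwanted attribute, then project it out. Below is a minimal NumPy sketch of that projection step on toy data; `ablate_direction`, the example activations, and the bias direction are all illustrative, not the paper's actual implementation:

```python
import numpy as np

def ablate_direction(activations: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each activation vector that lies along
    `direction`, leaving the orthogonal complement untouched."""
    d = direction / np.linalg.norm(direction)          # unit vector for the bias direction
    return activations - np.outer(activations @ d, d)  # subtract each row's projection onto d

# Toy example: pretend the "bias" direction is the first axis of a 4-d activation space.
acts = np.array([[2.0, 1.0, 0.0, 3.0],
                 [5.0, -1.0, 2.0, 0.0]])
bias_dir = np.array([1.0, 0.0, 0.0, 0.0])

cleaned = ablate_direction(acts, bias_dir)
# The component along bias_dir is now zero; all other coordinates are unchanged.
```

Ablating is idempotent: applying the projection twice changes nothing, which is one reason linear edits of this kind are attractive as a training-free intervention.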

Let's get into it 👇

Contents

  1. Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

  2. Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning

  3. Watermarking and Anomaly Detection in Machine Learning Models for LoRa RF Fingerprinting

  4. Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation

  5. WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

  6. Debias your Large Multi-Modal Model at Test-Time via Non-Contrastive Visual Attribute Steering

  7. Modular Machine Learning: An Indispensable Path towards New-Generation Large Language Models

  8. Self-Adapting Language Models

  9. Super-Linear: A Lightweight Pretrained Mixture of Linear Experts for Time Series Forecasting

  10. FlowRL: Matching Reward Distributions for LLM Reasoning

  11. Fair-GPTQ: Bias-Aware Quantization for Large Language Models

  12. Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

  13. Self-Improving Embodied Foundation Models

  14. ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

  15. GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation

Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Authors: Yutong Liu, Ziyue Zhang, Yongbin Yu, Xiangxiang Wang, Yuqing Cai, Nyima Tashi

Source and references: https://arxiv.org/abs/2509.15095v1


Introduction

This paper proposes LIR-ASR, a heuristic optimized iterative correction framework that utilizes large language models (LLMs) to improve the accuracy of automatic speech recognition (ASR) transcripts.

Key Points

  • LIR-ASR applies a "Listening-Imagining-Refining" strategy, where uncertain words are replaced with phonetically similar alternatives and then refined within the broader context.

  • A heuristic optimization with a finite state machine (FSM) is introduced to prevent the correction process from being trapped in local optima.

  • Rule-based constraints are designed to guide the correction process and reduce the risk of linguistically plausible but semantically inconsistent substitutions introduced by LLMs.

  • Experiments are conducted on both English and Chinese ASR outputs using Whisper-medium and Whisper-large-v3 models, as well as Qwen3-235B and DeepSeek-V3.1 LLMs.

  • LIR-ASR achieves average CER/WER reductions of up to 1.5 percentage points over baselines, demonstrating consistent transcription accuracy gains.
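As a rough illustration of the Listening-Imagining-Refining loop, the sketch below swaps uncertain words for phonetically similar neighbours ("imagining"), scores candidates with a stand-in for the LLM ("refining"), and uses a two-state machine as a toy analogue of the paper's FSM escape mechanism. The confusion table, scoring function, and all names are invented for the example and differ from the actual method:

```python
# Toy "phonetic confusion" table standing in for the Imagining step
# (the real system derives candidates from ASR confidence and phonetic similarity).
CONFUSIONS = {"there": ["their"], "two": ["too", "to"], "see": ["sea"]}

def candidates(words):
    """Yield transcripts with one uncertain word swapped for a phonetic neighbour."""
    for i, w in enumerate(words):
        for alt in CONFUSIONS.get(w, []):
            yield words[:i] + [alt] + words[i + 1:]

def score(words, lm):
    """Stand-in for an LLM fluency score: count of bigrams the toy 'LM' accepts."""
    return sum((a, b) in lm for a, b in zip(words, words[1:]))

def refine(transcript, lm, max_iters=10):
    """Greedy Refining loop; a two-state machine (EXPLOIT/EXPLORE) mimics the
    FSM that keeps the search from stalling in a local optimum."""
    words, state = transcript.split(), "EXPLOIT"
    for _ in range(max_iters):
        best = max(candidates(words), key=lambda c: score(c, lm), default=None)
        if best and score(best, lm) > score(words, lm):
            words, state = best, "EXPLOIT"   # improvement found: keep exploiting
        elif state == "EXPLOIT":
            state = "EXPLORE"                # stuck once: allow a sideways move
            sideways = [c for c in candidates(words) if score(c, lm) == score(words, lm)]
            if sideways:
                words = sideways[0]
        else:
            break                            # stuck twice: stop
    return " ".join(words)

lm = {("their", "house"), ("went", "too")}   # bigrams the toy LM "likes"
print(refine("there house", lm))             # -> "their house"
```

The EXPLOIT/EXPLORE toggle is the smallest possible state machine; the paper's FSM is richer, but the idea is the same: a pure greedy loop stops at the first plateau, while a stateful controller can take a sideways step before giving up.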

Methodology

Keep reading with a 7-day free trial

© 2025 StateOfAI