State of AI

Routing, Adaptation, and Compression in Large AI Models

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI ∙ Oct 29, 2025 ∙ Paid

Welcome to today’s edition of State of AI 👋 And a warm welcome to our 21 new subscribers since the last edition!

This edition covers a range of fascinating research, from scalable mixture-of-experts models for visual synthesis to data-aware initialization techniques for efficient fine-tuning, and even a new framework for transparently documenting machine learning sensors. We’ll also dive into innovations in long-context multi-modal AI and agentic parallel thinking.

Here’s what caught our attention:

  • Routing Matters in MoE: A Mixture-of-Experts framework with explicit routing guidance that outperforms state-of-the-art models in visual synthesis tasks while using fewer parameters (a minimal routing sketch follows this list).

  • LoRA-DA: Data-Aware Initialization: A theoretically grounded initialization method that boosts the performance and stability of low-rank adaptation when fine-tuning large language models (see the LoRA sketch after this list).

  • Datasheets for ML Sensors: A comprehensive framework to provide transparency and auditability for a new generation of intelligent sensing devices powered by machine learning.

  • Long-VITA: Scaling Large Multi-modal Models: A fully open-source multi-modal model that can process up to 1 million tokens, setting new state-of-the-art results on long-form video understanding.

  • ParallelMuse: Agentic Parallel Thinking: An innovative two-stage approach that improves both the performance and efficiency of deep information-seeking agents through parallel reasoning and aggregation.
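
Before the full write-ups, here’s a minimal sketch of the top-k expert routing that sits at the heart of any Mixture-of-Experts layer. This is the generic mechanism only; the paper’s “explicit routing guidance” builds on top of it, and the class name, dimensions, and expert count below are our own illustrative choices, not the authors’ code:

```python
# Minimal, illustrative top-k MoE routing layer (not the paper's implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # produces per-token routing logits
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        logits = self.router(x)                           # (tokens, num_experts)
        weights, idx = logits.topk(self.k, dim=-1)        # keep top-k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                        # dispatch each token to its chosen experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

x = torch.randn(4, 16)
print(TopKMoE(16)(x).shape)  # torch.Size([4, 16])
```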

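For context on the LoRA-DA item, here is a minimal sketch of a standard LoRA adapter. Vanilla LoRA initializes A with small Gaussian noise and B with zeros; LoRA-DA’s contribution is replacing that generic initialization with one derived from the data via asymptotic analysis, which we don’t reproduce here. All names and hyperparameters below are illustrative:

```python
# Minimal, illustrative LoRA adapter around a frozen linear layer.
# LoRA-DA would swap the generic init of A and B for a data-aware one.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                  # freeze the pretrained weights
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen base output plus the low-rank update, scaled by alpha / rank.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```
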
Let’s get into it 👇

Contents

  1. Memory Mosaics at scale

  2. Bridging Tool Dependencies and Domain Knowledge: A Graph-Based Framework for In-Context Planning

  3. OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs

  4. Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance

  5. ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources

  6. Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

  7. Pearl: A Foundation Model for Placing Every Atom in the Right Location

  8. Greedy Sampling Is Provably Efficient for RLHF

  9. LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis

  10. Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

  11. ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking

  12. AgentFold: Long-Horizon Web Agents with Proactive Context Management

  13. Datasheets for Machine Learning Sensors

  14. Hybrid Deep Learning Model to Estimate Cognitive Effort from fNIRS Signals

  15. GroundLoc: Efficient Large-Scale Outdoor LiDAR-Only Localization

Memory Mosaics at scale
