Routing, Adaptation, and Compression in Large AI Models
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today’s edition of State of AI 👋 And a warm welcome to our 21 new subscribers since the last edition!
This edition covers a range of fascinating research, from scalable mixture-of-experts models for visual synthesis to data-aware initialization for efficient fine-tuning, and even a new framework for transparently documenting machine learning sensors. We’ll also dive into innovations in long-context multi-modal AI and agentic parallel thinking.
Here’s what caught our attention:
Routing Matters in MoE: A Mixture-of-Experts framework for diffusion transformers whose explicit routing guidance outperforms state-of-the-art models on visual synthesis tasks while using fewer parameters.
LoRA-DA: Data-Aware Initialization: A theoretically grounded initialization method that improves the performance and stability of low-rank adaptation for fine-tuning large language models (a minimal LoRA sketch follows this list).
Datasheets for ML Sensors: A comprehensive framework to provide transparency and auditability for a new generation of intelligent sensing devices powered by machine learning.
Long-VITA: Scaling Large Multi-modal Models: A fully open-source multi-modal model that can process up to 1 million tokens, setting new state-of-the-art results on long-form video understanding.
ParallelMuse: Agentic Parallel Thinking: A two-stage approach that improves both the performance and efficiency of deep information-seeking agents through parallel reasoning.
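As a quick refresher before the full summaries: LoRA fine-tunes a frozen weight matrix W by learning a low-rank update BA, and the values A and B start from turn out to matter. Below is a minimal PyTorch sketch of a standard LoRA layer, not LoRA-DA itself; the class name and the `r` and `alpha` defaults are illustrative, and the data-aware initialization the paper contributes is only marked with a comment, not implemented.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen
        # Conventional LoRA init: A small random, B zero, so training starts
        # from the pretrained model exactly. LoRA-DA's idea (per the paper
        # title) is to derive this initialization from the fine-tuning data
        # via asymptotic analysis rather than using a data-agnostic default.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

# Usage: wrap one projection of a transformer block; only A and B train.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))  # shape: (2, 768)
```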
Let’s get into it 👇
Contents
Bridging Tool Dependencies and Domain Knowledge: A Graph-Based Framework for In-Context Planning
OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Pearl: A Foundation Model for Placing Every Atom in the Right Location
LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Hybrid Deep Learning Model to Estimate Cognitive Effort from fNIRS Signals
GroundLoc: Efficient Large-Scale Outdoor LiDAR-Only Localization
Memory Mosaics at scale