Routing, Adaptation, and Compression in Large AI Models
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today’s edition of State of AI 👋 And a warm welcome to our 21 new subscribers since the last edition!
This edition covers a range of fascinating research, from scalable mixture-of-experts models for visual synthesis to data-aware initialization for efficient fine-tuning, and even a new framework for transparently documenting machine learning sensors. We’ll also dive into innovations in long-context multi-modal AI and agentic parallel thinking.
Here’s what caught our attention:
Routing Matters in MoE: A Mixture-of-Experts framework for diffusion transformers whose explicit routing guidance outperforms state-of-the-art models on visual synthesis tasks while using fewer parameters.
LoRA-DA: Data-Aware Initialization: A theoretically grounded initialization method that improves the performance and stability of low-rank adaptation for fine-tuning large language models (a minimal LoRA sketch follows this list).
Datasheets for ML Sensors: A comprehensive framework to provide transparency and auditability for a new generation of intelligent sensing devices powered by machine learning.
Long-VITA: Scaling Large Multi-modal Models: A fully open-source multi-modal model that can process up to 1 million tokens, setting new state-of-the-art results on long-form video understanding.
ParallelMuse: Agentic Parallel Thinking: A two-stage approach that improves both the performance and efficiency of deep information-seeking agents through parallel reasoning.
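As a quick refresher before the full summaries: LoRA fine-tunes a frozen weight matrix W by learning a low-rank update BA, and the values A and B start from turn out to matter. Below is a minimal PyTorch sketch of a standard LoRA layer, not LoRA-DA itself; the class name and the `r` and `alpha` defaults are illustrative, and the data-aware initialization the paper contributes is only marked with a comment, not implemented.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen
        # Conventional LoRA init: A small random, B zero, so training starts
        # from the pretrained model exactly. LoRA-DA's idea (per the paper
        # title) is to derive this initialization from the fine-tuning data
        # via asymptotic analysis rather than using a data-agnostic default.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

# Usage: wrap one projection of a transformer block; only A and B train.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))  # shape: (2, 768)
```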
Let’s get into it 👇
Contents
Bridging Tool Dependencies and Domain Knowledge: A Graph-Based Framework for In-Context Planning
OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs
Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance
ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
Pearl: A Foundation Model for Placing Every Atom in the Right Location
LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking
AgentFold: Long-Horizon Web Agents with Proactive Context Management
Hybrid Deep Learning Model to Estimate Cognitive Effort from fNIRS Signals
GroundLoc: Efficient Large-Scale Outdoor LiDAR-Only Localization
Memory Mosaics at scale