State of AI

LLM agents for neuroimaging, Minecraft self-evolution, 26× model compression, MoE expert pooling, and 1M-token attention

May 08, 2026

Welcome to today’s edition of State of AI 👋 And a warm welcome to our new subscribers since the last edition!

This edition brings a fascinating convergence around three major themes in AI systems. First, we’re seeing LLM agents move beyond simple task completion into sophisticated domain-specific automation—from neuroimaging pipelines to vulnerability reconstruction and long-horizon embodied reasoning in Minecraft. Second, a wave of work on model efficiency and architecture design challenges conventional wisdom: shared expert pools for MoE models, structured knowledge distillation achieving 26× compression, and architectures and training methods that question whether parameter counts must grow linearly with depth. Finally, we have tangible breakthroughs in continuous generation (diffusion-based language models), physics-aware robotics (motion retargeting via bilevel optimization), and long-context training (efficient attention mechanisms that scale to 1M tokens).

Here’s what caught our attention:

  • NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research — A hierarchical multi-agent system that automates the full neuroimaging pipeline from raw DICOM through preprocessing to Alzheimer’s classification, achieving 0.9518 ROC-AUC on 1,470 subjects while introducing a generate-execute-validate feedback loop that handles errors autonomously.

  • MineEvolve: Self-Evolution with Accumulated Knowledge for Long-Horizon Embodied Minecraft Agents — Demonstrates how converting fine-grained execution signals into typed feedback, structured skills, and remedies enables embodied agents to improve on complex multi-step tasks, outperforming static retrieval and generic reflection across multiple language model backends.

  • DARK: Diagonal-Anchored Repulsive Knowledge Distillation for Vision-Language Models under Extreme Compression — Achieves 26× parameter reduction in vision encoders through an asymmetric decomposition of the contrastive loss, where the diagonal terms (matched pairs) remain fixed while the off-diagonal terms transition from positive to negative weighting, enabling MobileFetalCLIP to match or exceed its 427M-parameter teacher.

  • Continuous Latent Diffusion Language Model (ColaDLM) — Rethinks text generation as hierarchical latent diffusion with global semantic organization in continuous space followed by conditional decoding, establishing a theoretical Markov-path framework that unifies autoregressive, discrete diffusion, and continuous approaches while demonstrating competitive scaling behavior.

  • UniPool: A Globally Shared Expert Pool for Mixture-of-Experts — Replaces per-layer expert ownership with a globally shared pool and pool-level auxiliary loss, converting expert-parameter growth from linear to sublinear with depth while maintaining or exceeding standard MoE performance at 41.6%–66.7% of expert parameters. A minimal sketch of the shared-pool idea follows this list.

  • Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models — Formulates reasoning strategy selection as a contextual multi-armed bandit problem, achieving a 28–35% reduction in inference time relative to comparable methods while improving accuracy by 4–12 percentage points on mathematical and puzzle-solving tasks.

  • Lighthouse Attention: Efficient Long-Context Pre-Training with Symmetric Hierarchical Pooling — Introduces parameter-free L2-norm-based selection with symmetric hierarchical pooling that maintains (Q, K, V) coherence across pyramid levels, enabling 1.69× wall-clock speedup at 98K context and seamless recovery to dense SDPA attention at inference.

  • ReActor: Physics-Aware Motion Retargeting via Reinforcement Learning — Jointly optimizes retargeting parameters and tracking policies through bilevel optimization with derived gradient approximations, eliminating foot sliding and self-penetration artifacts while enabling successful hardware deployment on physical robots.
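To make the UniPool idea concrete, here is a minimal PyTorch sketch of a globally shared expert pool: one set of expert MLPs is constructed once and referenced by every layer, each of which keeps only its own lightweight router. All class and function names here are our own illustration, not the paper’s implementation, and the load-balancing term is a simple stand-in for the pool-level auxiliary loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalExpertPool(nn.Module):
    """A single pool of expert MLPs shared by every MoE layer in the network."""
    def __init__(self, num_experts: int, d_model: int, d_ff: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

class PooledMoELayer(nn.Module):
    """An MoE layer that routes into the shared pool instead of owning experts."""
    def __init__(self, pool: GlobalExpertPool, d_model: int, top_k: int = 2):
        super().__init__()
        self.pool = pool  # shared reference: no per-layer expert parameters
        self.router = nn.Linear(d_model, len(pool.experts))  # per-layer router only
        self.top_k = top_k

    def forward(self, x: torch.Tensor):  # x: (num_tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)          # (tokens, num_experts)
        weights, indices = probs.topk(self.top_k, dim=-1)  # route each token to top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.pool.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        # Stand-in pool-level auxiliary loss: penalize uneven expert usage so the
        # shared pool does not collapse onto a few experts (minimized at uniform usage).
        usage = probs.mean(dim=0)
        aux_loss = len(self.pool.experts) * (usage ** 2).sum()
        return out, aux_loss
```

Because the pool is built once, stacking more `PooledMoELayer`s adds only the small per-layer routers rather than new experts—which is where the sublinear growth in expert parameters comes from.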

Let’s get into it 👇

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP and AI

Contents

  1. NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

  2. MineEvolve: Self-Evolution with Accumulated Knowledge for Long-Horizon Embodied Minecraft Agents

  3. Patch2Vuln: Agentic Reconstruction of Vulnerabilities from Linux Distribution Binary Patches

  4. Continuous Latent Diffusion Language Model

  5. ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation

  6. DARK: Diagonal-Anchored Repulsive Knowledge Distillation for Vision-Language Models under Extreme Compression

  7. UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

  8. Meta-Reasoner: Dynamic Guidance for Optimized Inference-time Reasoning in Large Language Models

  9. Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

  10. EMO: Pretraining Mixture of Experts for Emergent Modularity

  11. Efficient Pre-Training with Token Superposition

  12. Long Context Pre-Training with Lighthouse Attention

  13. Cross-Modal Navigation with Multi-Agent Reinforcement Learning

  14. ReActor: Reinforcement Learning for Physics-Aware Motion Retargeting

  15. Multi-Robot Coordination in V2X Environments

NeuroAgent: LLM Agents for Multimodal Neuroimaging Analysis and Research

Authors: Lujia Zhong, Yihao Xia, Jianwei Zhang, Shuo Huang, Jiaxin Yue, Mingyang Xia, Yonggang Shi

Source and references: https://arxiv.org/abs/2605.06584v1

Introduction

NeuroAgent is an LLM-driven autonomous framework that automates multimodal neuroimaging analysis, from raw preprocessing to downstream statistical analysis and disease classification. The system addresses a critical bottleneck in neuroimaging research: the substantial expert labor required to preprocess heterogeneous data across multiple imaging modalities (structural MRI, functional MRI, diffusion MRI, and PET) and coordinate complex, modality-specific toolchains.

Key Points

  • Hierarchical multi-agent architecture: The system employs a Central Orchestrator that decomposes natural-language research goals into structured workflows, dispatching tasks to specialized agents for each imaging modality (sMRI, fMRI, dMRI, PET) with domain-specific knowledge and toolchains.

  • Generate-Execute-Validate feedback loop: Rather than static script execution, NeuroAgent iteratively generates preprocessing code, executes it in a sandboxed environment, validates output integrity against structural schemas (e.g., BIDS compliance), and automatically recovers from runtime errors by analyzing logs and re-issuing corrected tool calls. A minimal sketch of this loop appears after this list.

  • End-to-end automation from raw data to results: The framework spans the complete neuroimaging lifecycle—converting raw DICOM acquisitions through standardized preprocessing, assembling multimodal datasets, performing group-level statistics via natural-language queries, and training disease classifiers—all with minimal manual intervention through a Human-in-the-Loop interface.

  • Robust performance across LLM backends: Ablation studies show that capable models reach 100% intent-parsing accuracy and up to 84.8% end-to-end preprocessing step correctness (Qwen3.5-27B), with smaller models (4B parameters) matching larger ones on structured tasks when properly prompted.

  • Strong downstream classification performance: On 1,470 ADNI subjects (1,000 cognitively normal, 470 Alzheimer’s disease), the agent ensemble achieves ROC-AUC 0.9518 for AD diagnosis using four modalities, outperforming all single-modality baselines and demonstrating that automated preprocessing preserves diagnostic signal.
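The generate-execute-validate loop described above maps naturally onto a short control loop. Below is an illustrative Python sketch of that pattern under our own assumptions: `llm` and `validate` are hypothetical callables standing in for the model backend and the BIDS-style integrity check, not NeuroAgent’s actual interfaces.

```python
import subprocess

MAX_ATTEMPTS = 3

def generate_execute_validate(task: str, llm, validate):
    """Illustrative sketch of the generate-execute-validate loop described above.
    `llm(prompt) -> str` returns a preprocessing script; `validate() -> (bool, str)`
    checks output integrity (e.g., BIDS layout). Both are hypothetical stand-ins."""
    feedback = "none"
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # 1. Generate: ask the LLM for code, feeding back prior error logs as context.
        script = llm(f"Task: {task}\nErrors from previous attempt: {feedback}")

        # 2. Execute: run in a separate process (a stand-in for a real sandbox).
        try:
            result = subprocess.run(["python", "-c", script],
                                    capture_output=True, text=True, timeout=3600)
        except subprocess.TimeoutExpired:
            feedback = "execution timed out"
            continue
        if result.returncode != 0:
            feedback = result.stderr[-2000:]  # tail of the error log goes back to the LLM
            continue

        # 3. Validate: check the outputs against the expected structural schema.
        ok, report = validate()
        if ok:
            return {"status": "success", "attempts": attempt}
        feedback = report  # validation failures also loop back as corrective context

    return {"status": "failed", "attempts": MAX_ATTEMPTS, "last_error": feedback}
```

The key design point is that both runtime errors and validation failures re-enter the prompt, so the agent repairs its own pipeline instead of halting on the first failure.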

Methodology
