Diffusion Models, LLM Benchmarks, and Visuo-Haptic Perception
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 👋 And a warm welcome to our new subscribers since the last edition!
This edition covers a range of topics: how locality in diffusion models arises from the training data, a new benchmark for evaluating long-context large language models on complex software engineering tasks, and visuo-haptic perception for robust robotic manipulation.
Here's what caught our attention:
Locality in Image Diffusion Models Emerges from Data Statistics: An insightful exploration of how the locality patterns in trained diffusion models arise from the statistics of the training data, rather than architectural inductive biases.
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering: A benchmark that provides an evaluation framework for assessing the long-context understanding of LLMs in complex software development scenarios.
V-HOP: Visuo-Haptic 6D Object Pose Tracking: A novel approach that fuses egocentric visual and haptic sensing to achieve accurate real-time in-hand object tracking, showcasing the advantages of combining visual and haptic perception for robust robotic manipulation.
Let's get into it 👇
Contents
Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented Generation
KROMA: Ontology Matching with Knowledge Retrieval and Large Language Models
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Locality in Image Diffusion Models Emerges from Data Statistics
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
Functional Groups are All you Need for Chemically Interpretable Molecular Property Prediction
ButterflyQuant: Ultra-low-bit LLM Quantization through Learnable Orthogonal Butterfly Transforms
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
LLMs for sensory-motor control: Combining in-context and iterative learning