Scaling Long Videos, Unifying Multi-Modal AI, and Securing Large Language Models
Latest research summaries in ML, Robotics, CV, NLP and AI
Welcome to today's edition of State of AI 👋 And a warm welcome to our 65 new subscribers since the last edition!
This edition covers a range of cutting-edge AI research, from techniques for enhancing legal dispute analysis and multi-modal generative models, to methods for scaling up long video reasoning and improving the security of large language models. We also see exciting developments in areas like biodiversity analysis and efficient deployment of neural networks on microcontrollers.
Here's what caught our attention:
An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis: This research proposes an enhanced framework that combines prompt engineering and a multi-layered knowledge graph architecture to boost the performance of large language models in legal reasoning tasks.
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling: The authors introduce a novel method that aligns the internal representations of video diffusion models with 3D geometric features, leading to more coherent and realistic long-term video generation. For diffusion models, the paper reviews the development of text-to-image and text-to-video generation, including the shift from pixel-based to latent-based approaches and the introduction of Transformer-based diffusion models.
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs): This paper presents a comprehensive taxonomy of potential attacks on large language models and provides a framework for conducting effective red-teaming exercises to improve the security and robustness of LLM-based systems.
Let's get into it 👇
Contents
Establishing Best Practices for Building Rigorous Agentic Benchmarks
Multi-modal Generative AI: Multi-modal LLMs, Diffusions and the Unification
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
Nexus: Taming Throughput-Latency Tradeoff in LLM Serving via Efficient GPU Sharing
UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Automating Expert-Level Medical Reasoning Evaluation of Large Language Models
ROS Help Desk: GenAI Powered, User-Centric Framework for ROS Error Diagnosis and Debugging
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
Meek Models Shall Inherit the Earth