State of AI

Scaling Long Videos, Unifying Multi-Modal AI, and Securing Large Language Models

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Jul 13, 2025

Welcome to today's edition of State of AI 👋 And a warm welcome to our 65 new subscribers since last edition!

This edition covers AI research ranging from legal dispute analysis and multi-modal generative models to long video reasoning and the security of large language models. It also includes work on biodiversity analysis and the efficient deployment of neural networks on microcontrollers.

Here's what caught our attention:

  • An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis: This research proposes an enhanced framework that combines prompt engineering and a multi-layered knowledge graph architecture to boost the performance of large language models in legal reasoning tasks.

  • Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling: The authors introduce a novel method that aligns the internal representations of video diffusion models with 3D geometric features, leading to more coherent and realistic long-term video generation. For diffusion models, the paper reviews the development of text-to-image and text-to-video generation, including the shift from pixel-based to latent-based approaches and the introduction of Transformer-based diffusion models.

  • Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs): This paper presents a comprehensive taxonomy of potential attacks on large language models and provides a framework for conducting effective red-teaming exercises to improve the security and robustness of LLM-based systems.

Let's get into it 👇

Contents

  1. Meek Models Shall Inherit the Earth

  2. Establishing Best Practices for Building Rigorous Agentic Benchmarks

  3. An Integrated Framework of Prompt Engineering and Multidimensional Knowledge Graphs for Legal Dispute Analysis

  4. Multi-modal Generative AI: Multi-modal LLMs, Diffusions and the Unification

  5. Scaling RL to Long Videos

  6. Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling

  7. BarcodeBERT: Transformers for Biodiversity Analysis

  8. Nexus: Taming Throughput-Latency Tradeoff in LLM Serving via Efficient GPU Sharing

  9. UnIT: Scalable Unstructured Inference-Time Pruning for MAC-efficient Neural Inference on MCUs

  10. Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)

  11. Automating Expert-Level Medical Reasoning Evaluation of Large Language Models

  12. Performance and Practical Considerations of Large and Small Language Models in Clinical Decision Support in Rheumatology

  13. Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations

  14. ROS Help Desk: GenAI Powered, User-Centric Framework for ROS Error Diagnosis and Debugging

  15. VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting

Meek Models Shall Inherit the Earth
