State of AI

Vectorizing Figures, Optimizing Workflows, and Enhancing Multilingual Watermarking in AI

Mar 28, 2026

Welcome to today’s edition of State of AI 🚀

👋 And a warm welcome to our new subscribers since the last edition!

This edition features a diverse range of AI research topics: leveraging vision-language models to vectorize complex figures, using large language models to optimize software development workflows, and enhancing the robustness of multilingual watermarking techniques. We’ll also explore breakthroughs in diffusion model scaling, surgical procedure understanding, and more.

Here’s what caught our attention:

  • VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models - A novel approach to converting raster figures into high-fidelity vector graphics using a coarse-to-fine vision-language model training strategy.

  • LLM-Powered Workflow Optimization for Multidisciplinary Software Development - A practical case study demonstrating how large language models can automate translation and coordination tasks to significantly accelerate software development workflows.

  • Is Multilingual LLM Watermarking Truly Multilingual? - An in-depth examination of the limitations of current multilingual watermarking techniques and the introduction of a robust, scalable defense called STEAM.

Let’s get into it 👇

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP and AI

Contents

  1. Relationship-Aware Safety Unlearning for Multimodal LLMs

  2. LLM-Powered Workflow Optimization for Multidisciplinary Software Development: An Automotive Industry Case Study

  3. DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation

  4. VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models

  5. Uni-DAD: Unified Distillation and Adaptation of Diffusion Models for Few-step Few-shot Image Generation

  6. CliPPER: Contextual Video-Language Pretraining on Long-form Intraoperative Surgical Procedures for Event Recognition

  7. Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

  8. Algorithms with Calibrated Machine Learning Predictions

  9. Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction

  10. MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

  11. Is Multilingual LLM Watermarking Truly Multilingual? Scaling Robustness to 100+ Languages via Back-Translation

  12. Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling

  13. Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini

  14. Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries

  15. Negotiating Digital Identities with AI Companions: Motivations, Strategies, and Emotional Outcomes

Relationship-Aware Safety Unlearning for Multimodal LLMs

Authors: Vishnu Narayanan Anilkumar, Abhijith Sreesylesh Babu, Trieu Hai Vo, Mohankrishna Kolla, Alexander Cuneo

Source and references: https://arxiv.org/abs/2603.14185v3


Introduction

This research paper proposes a framework for “relationship-aware safety unlearning,” which removes knowledge of unsafe relationships from multimodal large language models (MLLMs) while preserving the model’s utility.

Key Points

  • Formalizes a schema for unsafe relational tuples with context (actors, actions, objects, attributes, spatial/temporal cues); see the sketch after this list.

  • Develops fine-tuning and parameter-editing procedures (e.g., contrastive unlearning, LoRA masking, causal trace edits) targeted at object-relation-object (O-R-O) tuples.

  • Introduces counterfactual preservation losses and safe exemplars to retain utility on permitted and safe-context examples.

  • Evaluates resistance to prompt fuzzing, synonym swaps, and compositional adversaries.

  • Provides testable acceptance criteria and unit tests for safety regression.
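
As a concrete illustration of the first point, a schema for an unsafe relational tuple might be represented as below. This is a hypothetical sketch: the field names and the caption-style rendering are assumptions, not the paper’s actual data model.

```python
# Hypothetical schema for an unsafe O-R-O tuple with context.
# Field names are illustrative assumptions, not the paper's data model.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class UnsafeRelationTuple:
    subject: str                        # actor, e.g. "person"
    relation: str                       # action, e.g. "holding"
    obj: str                            # object, e.g. "weapon"
    attributes: List[str] = field(default_factory=list)  # modifiers
    spatial_cue: Optional[str] = None   # e.g. "in a classroom"
    temporal_cue: Optional[str] = None  # e.g. "at night"

    def as_text(self) -> str:
        """Render the tuple as a caption-style string for text encoding."""
        parts = [self.subject, self.relation, self.obj, *self.attributes]
        if self.spatial_cue:
            parts.append(self.spatial_cue)
        if self.temporal_cue:
            parts.append(self.temporal_cue)
        return " ".join(parts)
```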

Methodology

The method introduces a systematic framework with two components: 1) construction of a relational graph that explicitly represents the unsafe object-relation structures, and 2) a targeted parameter-editing procedure using low-rank adapters (LoRA) to selectively weaken the model’s representations of the unsafe relations while preserving the model’s performance on other concepts.
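
To make the second component concrete, a minimal PyTorch sketch of such a balanced objective is shown below: a forget term pushes unsafe image-text pairs apart through the LoRA-adapted model, while a preservation term anchors safe pairs to the frozen base model. The encoder interface (`encode_image`, `encode_text`), batch layout, and loss weights are assumptions, not the paper’s actual code.

```python
# Sketch of a balanced forget/preserve objective for targeted LoRA editing.
# `edited_model` = base weights + trainable LoRA adapters; `frozen_model`
# = the original weights, used only as a preservation anchor.
import torch
import torch.nn.functional as F

def unlearning_loss(edited_model, frozen_model, unsafe_batch, safe_batch,
                    lam_forget=1.0, lam_preserve=1.0):
    # Forget: drive image-text cosine similarity of unsafe pairs toward zero.
    img_u = F.normalize(edited_model.encode_image(unsafe_batch["images"]), dim=-1)
    txt_u = F.normalize(edited_model.encode_text(unsafe_batch["texts"]), dim=-1)
    forget = (img_u * txt_u).sum(dim=-1).clamp(min=0).mean()

    # Preserve: keep safe-pair similarities close to the frozen model's.
    img_s = F.normalize(edited_model.encode_image(safe_batch["images"]), dim=-1)
    txt_s = F.normalize(edited_model.encode_text(safe_batch["texts"]), dim=-1)
    with torch.no_grad():
        img_s0 = F.normalize(frozen_model.encode_image(safe_batch["images"]), dim=-1)
        txt_s0 = F.normalize(frozen_model.encode_text(safe_batch["texts"]), dim=-1)
    preserve = ((img_s * txt_s).sum(-1) - (img_s0 * txt_s0).sum(-1)).abs().mean()

    return lam_forget * forget + lam_preserve * preserve
```

In practice only the LoRA adapter parameters would be passed to the optimizer, so the edit stays localized while the base weights remain untouched.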

Results and Findings

Experiments on the CLIP model showed that the proposed “FULL OPTIMAL” method significantly reduced cosine similarity for unsafe relationships across paraphrase, contextual, and out-of-distribution image attacks, with Δcos values of 0.6878, 0.4881, and 0.7012, respectively. At the same time, the absolute drift in cosine similarity on safe and neutral knowledge-preservation cases remained minimal, ranging from 0.0115 to 0.0608. Ablation studies confirmed that the balanced multi-objective loss function is key to achieving both accurate forgetting and retention of essential knowledge.
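
Read in context, Δcos is the drop in image-text cosine similarity from the original model to the edited one, evaluated on a probe set. A minimal sketch of that reading, assuming precomputed embedding batches (the function and its inputs are assumptions, not the paper’s evaluation code):

```python
import torch.nn.functional as F

def delta_cos(img_before, txt_before, img_after, txt_after):
    """Mean cosine similarity under the original model minus under the
    edited model. Large values on unsafe pairs indicate forgetting;
    near-zero values on safe pairs indicate preservation."""
    before = F.cosine_similarity(img_before, txt_before, dim=-1).mean()
    after = F.cosine_similarity(img_after, txt_after, dim=-1).mean()
    return (before - after).item()
```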

Implications and Conclusions

The research establishes a novel framework for relation-aware unlearning in multimodal LLMs, opening up several future directions, such as extending the unlearning mechanism, improving robustness and adversarial mitigation, scaling to larger generative models, and formalizing auditable safety evaluation criteria.


LLM-Powered Workflow Optimization for Multidisciplinary Software Development: An Automotive Industry Case Study
