Vectorizing Figures, Optimizing Workflows, and Enhancing Multilingual Watermarking in AI
Welcome to today’s edition of State of AI 🚀
👋 And a warm welcome to our new subscribers since last edition!
This edition features a diverse range of AI research topics: leveraging vision-language models to vectorize complex figures, using large language models to optimize software development workflows, and probing the robustness of multilingual watermarking techniques. We’ll also explore breakthroughs in diffusion model scaling, surgical procedure understanding, and more.
Here’s what caught our attention:
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models - A novel approach to converting raster figures into high-fidelity vector graphics using a coarse-to-fine vision-language model training strategy.
LLM-Powered Workflow Optimization for Multidisciplinary Software Development - A practical case study demonstrating how large language models can automate translation and coordination tasks to significantly accelerate software development workflows.
Is Multilingual LLM Watermarking Truly Multilingual? - An in-depth examination of the limitations of current multilingual watermarking techniques and the introduction of a robust, scalable defense called STEAM.
Let’s get into it 👇
Bi-Weekly AI Research Roundup
Latest research summaries in ML, Robotics, CV, NLP and AI
Contents
DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method
Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries
Negotiating Digital Identities with AI Companions: Motivations, Strategies, and Emotional Outcomes
Relationship-Aware Safety Unlearning for Multimodal LLMs
Authors: Vishnu Narayanan Anilkumar, Abhijith Sreesylesh Babu, Trieu Hai Vo, Mohankrishna Kolla, Alexander Cuneo
Source and references: https://arxiv.org/abs/2603.14185v3
Introduction
This research paper proposes a framework for “relationship-aware safety unlearning” that removes unsafe relational knowledge from multimodal large language models (LLMs) while preserving model utility.
Key Points
Formalize a schema for unsafe relational tuples with context (actors, actions, objects, attributes, spatial/temporal cues).
Develop fine-tuning or parameter-editing procedures (e.g., contrastive unlearning, LoRA masking, causal trace edits) targeted at object-relation-object (O-R-O) tuples.
Introduce counterfactual preservation losses and safe exemplars to retain utility on allowed and safe context examples.
Evaluate resistance to prompt fuzzing, synonym swaps, and compositional adversaries.
Provide testable acceptance criteria and unit tests for safety regression.
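The paper's concrete acceptance criteria are not spelled out in this summary, but a threshold check of the kind it calls for might look like the following sketch. The function name and thresholds are illustrative assumptions, not the authors' actual test suite:

```python
def passes_safety_regression(unsafe_drops, safe_drifts,
                             min_drop=0.4, max_drift=0.1):
    """Illustrative acceptance criterion for an unlearning edit.

    unsafe_drops: per-case drop in cosine similarity for unsafe
        relational tuples (bigger = more forgotten).
    safe_drifts: per-case absolute drift in cosine similarity for
        safe/neutral knowledge (smaller = better preserved).
    The edit passes only if every unsafe relation was weakened by at
    least min_drop AND no safe case drifted by more than max_drift.
    """
    return (min(unsafe_drops) >= min_drop and
            max(abs(d) for d in safe_drifts) <= max_drift)
```

Such a check can run as a unit test after every parameter edit, catching regressions where forgetting is too weak or collateral damage to safe knowledge is too large.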
Methodology
The method introduces a systematic framework with two components: 1) construction of a relational graph that explicitly represents the unsafe object-relation structures, and 2) a targeted parameter-editing procedure using low-rank adapters (LoRA) to selectively weaken the model’s representations of those relations while preserving performance on other concepts.
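As a rough illustration of the balanced multi-objective loss the summary describes, one can combine a forget term (driving the cosine similarity of unsafe text-image embedding pairs toward zero) with a counterfactual-preservation term (penalizing drift of safe embeddings from their pre-edit values). This is a minimal numpy sketch under those assumptions; the function names, weighting, and exact terms are illustrative, not the paper's implementation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def unlearning_loss(txt_unsafe, img_unsafe, txt_safe, txt_safe_orig, lam=1.0):
    # Forget term: unsafe text/image embeddings should stop aligning,
    # so their cosine similarity is penalized directly.
    forget = cosine(txt_unsafe, img_unsafe)
    # Preservation term: safe embeddings should stay close to their
    # pre-edit values (a counterfactual-preservation-style penalty).
    preserve = 1.0 - cosine(txt_safe, txt_safe_orig)
    # lam balances forgetting against retention of safe knowledge.
    return forget + lam * preserve
```

In a real setting this scalar would be minimized with respect to the LoRA adapter parameters only, leaving the base model weights frozen.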
Results and Findings
Experiments on the CLIP model showed that the proposed “FULL OPTIMAL” method significantly reduced cosine similarity for unsafe relationships across paraphrase, contextual, and out-of-distribution image attacks, with ∆cos values of 0.6878, 0.4881, and 0.7012 respectively. Simultaneously, the absolute drift in cosine similarity for safe and neutral knowledge preservation cases remained minimal, ranging from 0.0115 to 0.0608. Ablation studies confirmed the importance of the balanced multi-objective loss function in achieving both accurate forgetting and essential knowledge retention.
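The ∆cos metric reported above is simply the drop in text-image cosine similarity between pre- and post-edit embeddings: large positive values on unsafe pairs indicate effective forgetting, while near-zero values on safe pairs indicate low drift. A minimal sketch of the metric, assuming the embedding pairs are already computed:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_delta_cos(pairs_before, pairs_after):
    """Mean drop in cosine similarity across (text, image) embedding
    pairs, measured before vs. after the unlearning edit."""
    drops = [cosine(tb, ib) - cosine(ta, ia)
             for (tb, ib), (ta, ia) in zip(pairs_before, pairs_after)]
    return float(np.mean(drops))
```

Running this over unsafe pairs and (separately) over safe/neutral pairs yields the two sides of the trade-off the ablations examine.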
Implications and Conclusions
The research establishes a novel framework for relation-aware unlearning in multimodal LLMs, opening up several future directions, such as extending the unlearning mechanism, improving robustness and adversarial mitigation, scaling to larger generative models, and formalizing auditable safety evaluation criteria.



