Vectorizing Figures, Optimizing Workflows, and Enhancing Multilingual Watermarking in AI
Welcome to today’s edition of State of AI 🚀
👋 And a warm welcome to our new subscribers since last edition!
This edition features a diverse range of AI research topics: leveraging vision-language models to vectorize complex figures, using large language models to optimize software development workflows, and probing the robustness of multilingual watermarking techniques. We’ll also explore breakthroughs in diffusion model scaling, surgical procedure understanding, and more.
Here’s what caught our attention:
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models - A novel approach to converting raster figures into high-fidelity vector graphics using a coarse-to-fine vision-language model training strategy.
LLM-Powered Workflow Optimization for Multidisciplinary Software Development - A practical case study demonstrating how large language models can automate translation and coordination tasks to significantly accelerate software development workflows.
Is Multilingual LLM Watermarking Truly Multilingual? - An in-depth examination of the limitations of current multilingual watermarking techniques and the introduction of a robust, scalable defense called STEAM.
Let’s get into it 👇
Bi-Weekly AI Research Roundup
Latest research summaries in ML, Robotics, CV, NLP and AI
Contents
DomAgent: Leveraging Knowledge Graphs and Case-Based Reasoning for Domain-Specific Code Generation
VFIG: Vectorizing Complex Figures in SVG with Vision-Language Models
Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method
Scaling Recurrence-aware Foundation Models for Clinical Records via Next-Visit Prediction
MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination
Team of Thoughts: Efficient Test-time Scaling of Agentic Systems through Orchestrated Tool Calling
Vibe Coding XR: Accelerating AI + XR Prototyping with XR Blocks and Gemini
Linguistic Comparison of AI- and Human-Written Responses to Online Mental Health Queries
Negotiating Digital Identities with AI Companions: Motivations, Strategies, and Emotional Outcomes
Relationship-Aware Safety Unlearning for Multimodal LLMs
Authors: Vishnu Narayanan Anilkumar, Abhijith Sreesylesh Babu, Trieu Hai Vo, Mohankrishna Kolla, Alexander Cuneo
Source and references: https://arxiv.org/abs/2603.14185v3
Introduction
This research paper proposes a framework for “relationship-aware safety unlearning” that removes unsafe relational knowledge from multimodal large language models (LLMs) while preserving model utility.
Key Points
Formalize a schema for unsafe relational tuples with context (actors, actions, objects, attributes, spatial/temporal cues).
Develop fine-tuning or parameter-editing procedures (e.g., contrastive unlearning, LoRA masking, causal trace edits) targeted at object-relation-object (O-R-O) tuples.
Introduce counterfactual preservation losses and safe exemplars to retain utility on allowed and safe context examples.
Evaluate resistance to prompt fuzzing, synonym swaps, and compositional adversaries.
Provide testable acceptance criteria and unit tests for safety regression.
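The paper's concrete acceptance criteria are not spelled out in this summary, but a threshold check of the kind it calls for might look like the following sketch. The function name and thresholds are illustrative assumptions, not the authors' actual test suite:

```python
def passes_safety_regression(unsafe_drops, safe_drifts,
                             min_drop=0.4, max_drift=0.1):
    """Illustrative acceptance criterion for an unlearning edit.

    unsafe_drops: per-case drop in cosine similarity for unsafe
        relational tuples (bigger = more forgotten).
    safe_drifts: per-case absolute drift in cosine similarity for
        safe/neutral knowledge (smaller = better preserved).
    The edit passes only if every unsafe relation was weakened by at
    least min_drop AND no safe case drifted by more than max_drift.
    """
    return (min(unsafe_drops) >= min_drop and
            max(abs(d) for d in safe_drifts) <= max_drift)
```

Such a check can run as a unit test after every parameter edit, catching regressions where forgetting is too weak or collateral damage to safe knowledge is too large.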
Methodology
The method introduces a systematic framework with two components: 1) construction of a relational graph that explicitly represents the unsafe object-relation structures, and 2) a targeted parameter-editing procedure using low-rank adapters (LoRA) to selectively weaken the model’s representations of those relations while preserving performance on other concepts.
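As a rough illustration of the balanced multi-objective loss the summary describes, one can combine a forget term (driving the cosine similarity of unsafe text-image embedding pairs toward zero) with a counterfactual-preservation term (penalizing drift of safe embeddings from their pre-edit values). This is a minimal numpy sketch under those assumptions; the function names, weighting, and exact terms are illustrative, not the paper's implementation:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def unlearning_loss(txt_unsafe, img_unsafe, txt_safe, txt_safe_orig, lam=1.0):
    # Forget term: unsafe text/image embeddings should stop aligning,
    # so their cosine similarity is penalized directly.
    forget = cosine(txt_unsafe, img_unsafe)
    # Preservation term: safe embeddings should stay close to their
    # pre-edit values (a counterfactual-preservation-style penalty).
    preserve = 1.0 - cosine(txt_safe, txt_safe_orig)
    # lam balances forgetting against retention of safe knowledge.
    return forget + lam * preserve
```

In a real setting this scalar would be minimized with respect to the LoRA adapter parameters only, leaving the base model weights frozen.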
Results and Findings
Experiments on the CLIP model showed that the proposed “FULL OPTIMAL” method significantly reduced cosine similarity for unsafe relationships across paraphrase, contextual, and out-of-distribution image attacks, with ∆cos values of 0.6878, 0.4881, and 0.7012 respectively. Simultaneously, the absolute drift in cosine similarity for safe and neutral knowledge preservation cases remained minimal, ranging from 0.0115 to 0.0608. Ablation studies confirmed the importance of the balanced multi-objective loss function in achieving both accurate forgetting and essential knowledge retention.
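The ∆cos metric reported above is simply the drop in text-image cosine similarity between pre- and post-edit embeddings: large positive values on unsafe pairs indicate effective forgetting, while near-zero values on safe pairs indicate low drift. A minimal sketch of the metric, assuming the embedding pairs are already computed:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_delta_cos(pairs_before, pairs_after):
    """Mean drop in cosine similarity across (text, image) embedding
    pairs, measured before vs. after the unlearning edit."""
    drops = [cosine(tb, ib) - cosine(ta, ia)
             for (tb, ib), (ta, ia) in zip(pairs_before, pairs_after)]
    return float(np.mean(drops))
```

Running this over unsafe pairs and (separately) over safe/neutral pairs yields the two sides of the trade-off the ablations examine.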
Implications and Conclusions
The research establishes a novel framework for relation-aware unlearning in multimodal LLMs, opening up several future directions, such as extending the unlearning mechanism, improving robustness and adversarial mitigation, scaling to larger generative models, and formalizing auditable safety evaluation criteria.



