Contents
Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation
Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models
Geometric Representation Condition Improves Equivariant Molecule Generation
Enhance Reasoning by Learning from Mistakes: Peer-Review Knowledge Distillation from Multiple Large Language Models
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs
Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents
AlphaRouter: Quantum Circuit Routing with Reinforcement Learning and Tree Search
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
Stateful Large Language Model Serving with Pensieve
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles
Creative Beam Search: LLM-as-a-Judge For Improving Response Generation
When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models
Buckle Up: Robustifying LLMs at Every Customization Stage via Data Curation
Authors: Xiaoqun Liu, Jiacheng Liang, Luoxi Tang, Chenyu You, Muchao Ye, Zhaohan Xi
Source and references: https://arxiv.org/abs/2410.02220v2
Introduction
This paper proposes a data curation framework called CTRL (CusTomization through CuRated Data over LLMs) to mitigate jailbreaking attacks on large language models (LLMs) at every stage of the customization process.
Key Points
CTRL leverages data curation to revise commonsense texts and strengthen their safety implications from the perspective of LLMs.
The curated texts can mitigate jailbreaking attacks before, during, or after the customization process.
CTRL does not introduce additional modules during LLM inference, preserving the original customization process.
Experimental results demonstrate a substantial reduction in jailbreaking effects, with a success rate of up to 100% in generating responsible responses.
Methodology
CTRL employs output sampling techniques, such as temperature and nucleus sampling, to curate commonsense texts. It uses a beam search process to filter and retain the most promising curated texts based on perplexity (to increase unfamiliarity) and helpfulness scores (to maintain informative value). The curated texts are then used to fine-tune LLMs at various stages of the customization workflow.
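To make the curation loop concrete, below is a minimal Python sketch of a beam-search curation step under the assumptions described here. The helper callables (sample_rewrites, perplexity, helpfulness) and the linear scoring weights are illustrative placeholders, not the authors' released implementation.

```python
# Hypothetical sketch of CTRL-style data curation (names and scoring are
# assumptions, not the paper's code): each round samples candidate rewrites of
# a commonsense text via temperature/nucleus sampling, scores them by
# perplexity (higher = less familiar to the LLM) and helpfulness (to keep
# informative value), and keeps only the top-scoring beams.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    text: str
    score: float


def curate_with_beam_search(
    seed_text: str,
    sample_rewrites: Callable[[str, int], List[str]],  # LLM rewrites via temperature / nucleus sampling
    perplexity: Callable[[str], float],                 # perplexity of the text under the target LLM
    helpfulness: Callable[[str], float],                # judge score for informative value
    beam_width: int = 4,
    num_rounds: int = 3,
    samples_per_beam: int = 8,
    alpha: float = 1.0,                                 # weight on unfamiliarity (perplexity)
    beta: float = 1.0,                                  # weight on helpfulness
) -> List[Candidate]:
    """Return the most promising curated rewrites of `seed_text`."""
    beams = [Candidate(seed_text, 0.0)]
    for _ in range(num_rounds):
        pool: List[Candidate] = []
        for beam in beams:
            for rewrite in sample_rewrites(beam.text, samples_per_beam):
                # Reward texts that are unfamiliar to the LLM yet still informative.
                score = alpha * perplexity(rewrite) + beta * helpfulness(rewrite)
                pool.append(Candidate(rewrite, score))
        # Retain only the top-scoring candidates as the next round's beams.
        beams = sorted(pool, key=lambda c: c.score, reverse=True)[:beam_width]
    return beams
```

The retained texts would then serve as fine-tuning data at the chosen customization stage (before, during, or after the attack surface appears).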
Results and Findings
The all-stage defense approach using CTRL consistently outperforms other baselines, achieving over 96% safety rate and, in some cases, fully preventing jailbreaking (100% safety rate) across different LLMs (Llama-3-8B, Llama-2-13B, Vicuna-13B, and Mistral-7B). The post-attack defense is found to be the most effective among the single-stage defenses, highlighting the importance of the most recent customization in influencing LLM behavior.
Implications and Conclusions
This work represents a significant advancement in mitigating jailbreaking risks and ensuring the secure customization of LLMs. The data-driven and all-stage-oriented nature of CTRL provides a cost-efficient and scalable solution that can be seamlessly integrated into the standard customization pipeline without additional modules.
Unraveling Cross-Modality Knowledge Conflict in Large Vision-Language Models
Authors: Tinghui Zhu, Qin Liu, Fei Wang, Zhengzhong Tu, Muhao Chen
Source and references: https://arxiv.org/abs/2410.03659v1
Introduction
This paper focuses on the problem of cross-modality parametric knowledge conflicts in large vision-language models (LVLMs).