Contents
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation
Teaching Transformers Causal Reasoning through Axiomatic Training
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces
Robotic Control via Embodied Chain-of-Thought Reasoning
ProtoSAM - One Shot Medical Image Segmentation With Foundational Models
DoRA: Weight-Decomposed Low-Rank Adaptation
Does CLIP Know My Face?
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps
AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning
Distilling System 2 into System 1
Not All Layers of LLMs Are Necessary During Inference
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation
Authors: Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu
Source and references: https://arxiv.org/abs/2407.06188v1
Introduction
This paper presents CrowdMoGen, a zero-shot text-driven framework for generating realistic and flexible crowd motions that can be tailored to user-specific requirements.
Key Points
CrowdMoGen is a pioneering work that defines the task of Crowd Motion Generation as the creation of crowd motions in response to textual scene descriptions.
The framework consists of two key components: a Crowd Scene Planner that coordinates motions and dynamics according to scene contexts, and a Collective Motion Generator that synthesizes the required collective motions based on the holistic plans.
CrowdMoGen leverages the power of Large Language Models (LLMs) to incorporate collective intelligence as guidance, enabling generalizable planning and generation of crowd motions without paired training data.
The proposed approach addresses two key challenges: the scarcity of paired crowd-motion datasets and the difficulty of modeling complex interactions among numerous individuals.
Methodology
CrowdMoGen is a two-stage framework that separates motion decision-making from motion generation. The Crowd Scene Planner uses an LLM to interpret and arrange crowd motions based on textual requirements, providing detailed semantic and spatial attributes for each individual. The Collective Motion Generator then generates individual motions that strictly adhere to the control signals from the planner.
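To make the two-stage design concrete, here is a minimal sketch of how such a planner-then-generator pipeline could be wired together. All function names, the prompt format, and the per-agent attribute schema are hypothetical illustrations, not the authors' actual interfaces:

```python
import json
from dataclasses import dataclass

@dataclass
class AgentPlan:
    agent_id: int
    action: str        # semantic attribute, e.g. "wave while walking"
    waypoints: list    # spatial attribute: [[x, y], ...] trajectory control points

def crowd_scene_planner(scene_text: str, num_agents: int, llm_call) -> list[AgentPlan]:
    """Stage 1: ask an LLM to turn a scene description into per-agent plans."""
    prompt = (
        f"Scene: {scene_text}\n"
        f"Assign each of the {num_agents} agents an action and a 2D waypoint list. "
        "Answer as a JSON list of objects with keys agent_id, action, waypoints."
    )
    plans = json.loads(llm_call(prompt))
    return [AgentPlan(**p) for p in plans]

def collective_motion_generator(plans: list[AgentPlan], motion_model) -> dict:
    """Stage 2: synthesize each agent's motion conditioned on its plan."""
    return {p.agent_id: motion_model(text=p.action, trajectory=p.waypoints) for p in plans}
```

The point preserved here is the separation of decision-making (the planner) from motion synthesis (the generator), which is what allows the framework to operate without paired crowd-motion training data.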
Results and Findings
Extensive quantitative and qualitative experiments demonstrate the effectiveness of CrowdMoGen. It outperforms existing methods in terms of motion quality, text-motion consistency, and spatial control accuracy. The generated crowd motions exhibit high levels of realism and flexibility, addressing the critical gap in crowd motion generation research.
Implications and Conclusions
CrowdMoGen represents a significant advancement in the field of Crowd Motion Generation, providing a scalable and generalizable solution that can be tailored to diverse user requirements across various applications, such as entertainment, urban simulation, and planning.
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
Authors: Yinquan Lu, Wenhao Zhu, Lei Li, Yu Qiao, Fei Yuan
Source and references: https://arxiv.org/abs/2407.05975v1
Introduction
This paper presents LLaMAX, a large language model (LLM) that has been continually pre-trained to enhance its translation capabilities across more than 100 languages.
Key Points
LLaMAX is the result of extensive multilingual continual pre-training on the LLaMA series models, enabling translation support across more than 100 languages.
The researchers conducted a comprehensive analysis of training strategies, such as vocabulary expansion and data augmentation, to develop LLaMAX.
LLaMAX achieves significantly higher translation performance compared to existing open-source LLMs and performs on par with specialized translation models on the Flores-101 benchmark.
Enhancing translation capabilities also establishes LLaMAX as a robust multilingual foundation model, with improvements on various general task benchmarks.
Methodology
The researchers performed large-scale, multilingual continual pre-training on LLaMA series models, leveraging both parallel and monolingual data. They also explored techniques like vocabulary expansion and data augmentation to improve the model's translation capabilities.
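As one concrete illustration of the vocabulary-expansion strategy the authors analyze, below is a minimal sketch using the Hugging Face transformers API. The checkpoint name and token list are placeholders, and the authors' actual expansion procedure and token selection may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical new subword tokens covering scripts that the base vocabulary
# segments poorly (e.g. low-resource languages).
new_tokens = ["▁ሰላም", "▁नमस्ते", "▁ᱥᱟᱱᱛᱟᱲ"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding and output matrices so the new ids have trainable rows;
# continual pre-training on parallel + monolingual data then updates them.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens, new vocab size: {len(tokenizer)}")
```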
Results and Findings
LLaMAX2, trained for 60 days on 24 A100 GPUs, significantly enhances translation capabilities across more than 100 languages, achieving an average improvement of more than 10 spBLEU points over baseline models in low-resource-centric translation. LLaMAX also matches the performance of the specialized translation model M2M-100-12B on the Flores-101 benchmark, without compromising its general task performance.
Implications and Conclusions
The research demonstrates the potential of continual pre-training techniques to extend the language support of LLMs, closing the gap between open-source LLM translators and specialized encoder-decoder translation systems. The publicly available LLaMAX models can serve as a robust multilingual foundation for various downstream applications.
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
Authors: Liqun Ma, Mingjie Sun, Zhiqiang Shen
Source and references: https://arxiv.org/abs/2407.07093v1
Introduction
This paper presents FBI-LLM, a Fully BInarized Large Language Model, and demonstrates for the first time how to train a large-scale binary language model from scratch to match the performance of its full-precision counterparts.
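To make the idea concrete, below is a minimal sketch of weight binarization with a straight-through estimator plus a distillation loss against a full-precision teacher's next-token distribution. The layer structure, scaling scheme, and loss form are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryLinear(nn.Module):
    """Linear layer whose weights are binarized to {-1, +1} with a learned scale."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.scale = nn.Parameter(torch.ones(out_features, 1))

    def forward(self, x):
        w = self.weight
        # Straight-through estimator: the forward pass uses sign(w) * scale,
        # the backward pass routes gradients to the latent full-precision weights.
        w_bin = w + (torch.sign(w) * self.scale - w).detach()
        return F.linear(x, w_bin)

def distill_step(student_logits, teacher_logits, temperature=1.0):
    """Autoregressive distillation: match the teacher's next-token distribution."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
```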