State of AI: Bi-Weekly AI Research Roundup

Week 2, July 2024

State of AI ∙ Jul 12, 2024

Contents

  1. CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

  2. LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

  3. FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

  4. CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation

  5. Teaching Transformers Causal Reasoning through Axiomatic Training

  6. Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

  7. MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces

  8. Robotic Control via Embodied Chain-of-Thought Reasoning

  9. ProtoSAM - One Shot Medical Image Segmentation With Foundational Models

  10. DoRA: Weight-Decomposed Low-Rank Adaptation

  11. Does CLIP Know My Face?

  12. Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

  13. AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning

  14. Distilling System 2 into System 1

  15. Not All Layers of LLMs Are Necessary During Inference


CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Authors: Xinying Guo, Mingyuan Zhang, Haozhe Xie, Chenyang Gu, Ziwei Liu

Source and references: https://arxiv.org/abs/2407.06188v1


Introduction

This paper presents CrowdMoGen, a zero-shot text-driven framework for generating realistic and flexible crowd motions that can be tailored to user-specific requirements.

Key Points

  • CrowdMoGen is a pioneering work that defines the task of Crowd Motion Generation as the creation of crowd motions in response to textual scene descriptions.

  • The framework consists of two key components: a Crowd Scene Planner that coordinates motions and dynamics according to scene contexts, and a Collective Motion Generator that synthesizes the required collective motions based on the holistic plans.

  • CrowdMoGen leverages Large Language Models (LLMs) to incorporate collective intelligence as guidance, enabling generalizable planning and generation of crowd motions without paired training data.

  • The proposed approach addresses the challenges of limited crowd datasets and effectively modeling complex interactions among numerous individuals.

Methodology

CrowdMoGen is a two-stage framework that separates motion decision-making from motion generation. The Crowd Scene Planner uses an LLM to interpret and arrange crowd motions based on textual requirements, providing detailed semantic and spatial attributes for each individual. The Collective Motion Generator then generates individual motions that strictly adhere to the control signals from the planner.
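
To make the two-stage split concrete, here is a minimal, hypothetical Python sketch of the planner-then-generator interface. It is not the authors' code: the class names, plan fields, and stubbed stage bodies are illustrative assumptions about how per-agent plans could flow from an LLM-based planner into a conditional motion generator.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class AgentPlan:
    """Per-individual control signal emitted by the Crowd Scene Planner."""
    agent_id: int
    action: str                            # semantic attribute, e.g. "walk"
    waypoints: List[Tuple[float, float]]   # spatial attribute: (x, y) targets

def crowd_scene_planner(scene_text: str, num_agents: int) -> List[AgentPlan]:
    """Stage 1: an LLM reads the scene description and assigns each agent a
    semantic action plus a spatial trajectory. Stubbed with fixed plans here;
    the real system would prompt an LLM and parse its structured output."""
    return [
        AgentPlan(i, "walk", [(0.0, float(i)), (5.0, float(i))])
        for i in range(num_agents)
    ]

def collective_motion_generator(plans: List[AgentPlan]) -> Dict[int, str]:
    """Stage 2: a conditional motion model synthesizes per-agent motion that
    adheres to the planner's control signals. Stubbed with descriptions."""
    return {
        p.agent_id: f"{p.action} motion through {len(p.waypoints)} waypoints"
        for p in plans
    }

plans = crowd_scene_planner("A crowd evacuates the plaza toward the north exit", 4)
print(collective_motion_generator(plans))
```

In the actual framework, stage 2 is a learned generative motion model conditioned on these control signals, not a string formatter; the sketch only shows how decision-making is decoupled from synthesis.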

Results and Findings

Extensive quantitative and qualitative experiments demonstrate the effectiveness of CrowdMoGen, which outperforms existing methods in motion quality, text-motion consistency, and spatial control accuracy. The generated crowd motions exhibit a high degree of realism and flexibility, addressing a critical gap in crowd motion generation research.

Implications and Conclusions

CrowdMoGen represents a significant advancement in the field of Crowd Motion Generation, providing a scalable and generalizable solution that can be tailored to diverse user requirements across various applications, such as entertainment, urban simulation, and planning.


LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages

Authors: Yinquan Lu, Wenhao Zhu, Lei Li, Yu Qiao, Fei Yuan

Source and references: https://arxiv.org/abs/2407.05975v1


Introduction

This paper presents LLaMAX, a large language model (LLM) that has been continually pre-trained to enhance its translation capabilities across more than 100 languages.

Key Points

  • LLaMAX is the result of extensive multilingual continual pre-training on the LLaMA series models, enabling translation support across more than 100 languages.

  • The researchers conducted a comprehensive analysis of training strategies, such as vocabulary expansion and data augmentation, to develop LLaMAX.

  • LLaMAX achieves significantly higher translation performance than existing open-source LLMs and performs on par with specialized translation models on the Flores-101 benchmark.

  • Enhancing translation capabilities also establishes LLaMAX as a robust multilingual foundation model, with improvements on various general task benchmarks.

Methodology

The researchers performed large-scale, multilingual continual pre-training on LLaMA series models, leveraging both parallel and monolingual data. They also explored techniques like vocabulary expansion and data augmentation to improve the model's translation capabilities.
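
As a rough illustration of the vocabulary-expansion step (one of the training strategies the paper analyzes), here is a minimal sketch using the Hugging Face transformers API. The checkpoint name and example tokens are placeholders, not the authors' actual configuration.

```python
# Sketch of vocabulary expansion before multilingual continual pre-training.
from transformers import AutoTokenizer, AutoModelForCausalLM

base = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical subword units for under-represented languages.
num_added = tokenizer.add_tokens(["▁sawubona", "▁talofa", "▁mirëdita"])

# Grow the embedding matrix to match: the new rows start randomly initialized
# and are learned during continual pre-training on monolingual + parallel data.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocabulary size is now {len(tokenizer)}")
```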

Results and Findings

LLaMAX2, trained for 60 days on 24 A100 GPUs, substantially improves translation across more than 100 languages, gaining an average of more than 10 spBLEU points over baseline models in low-resource-centric translation. LLaMAX also matches the performance of the specialized translation model M2M-100-12B on the Flores-101 benchmark without compromising its general task performance.

Implications and Conclusions

The research demonstrates the potential of continual pre-training techniques to extend the language support of LLMs, closing the gap between open-source LLM translators and specialized encoder-decoder translation systems. The publicly available LLaMAX models can serve as a robust multilingual foundation for various downstream applications.


FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

Authors: Liqun Ma, Mingjie Sun, Zhiqiang Shen

Source and references: https://arxiv.org/abs/2407.07093v1


Introduction

This paper presents a Fully BInarized Large Language Model (FBI-LLM), which demonstrates for the first time how to train a large-scale binary language model from scratch to match the performance of its full-precision counterparts.
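
As a rough illustration of the two ingredients named in the title, here is a minimal PyTorch sketch of a weight-binarized linear layer trained with a straight-through estimator and a token-level distillation loss against a full-precision teacher. The per-channel scale and the KL formulation are illustrative assumptions, not FBI-LLM's exact scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizedLinear(nn.Module):
    """Linear layer whose weights are binarized to {-1, +1} in the forward
    pass. Gradients reach the latent full-precision weights through a
    straight-through estimator; a learnable per-channel scale restores
    dynamic range (an illustrative choice, not the paper's exact scheme)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.scale = nn.Parameter(torch.ones(out_features, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Forward uses sign(W); backward treats binarization as the identity.
        w_bin = self.weight + (torch.sign(self.weight) - self.weight).detach()
        return F.linear(x, self.scale * w_bin)

def distillation_loss(student_logits, teacher_logits):
    """Token-level KL divergence pulling the binarized student toward the
    full-precision teacher's next-token distribution."""
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    student_logp = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_logp, teacher_probs, reduction="batchmean")

# Toy usage: a single binarized layer standing in for the student model.
layer = BinarizedLinear(16, 32)
x = torch.randn(4, 16)
teacher_logits = torch.randn(4, 32)  # stand-in for a frozen teacher's outputs
loss = distillation_loss(layer(x), teacher_logits)
loss.backward()
```

In the real setting, the student is a full transformer built from binarized layers and the teacher is a frozen full-precision LLM supplying next-token targets at every position of the training sequence.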
