Contents
Non-autoregressive Generative Models for Reranking Recommendation
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Kilometer-Scale Convection Allowing Model Emulation using Generative Diffusion Modeling
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Leveraging Chemistry Foundation Models to Facilitate Structure Focused Retrieval Augmented Generation in Multi-Agent Workflows for Catalyst and Materials Design
KOSMOS-2.5: A Multimodal Literate Model
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security
FocusLLM: Scaling LLM's Context by Parallel Decoding
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
Natural Language Programming in Medicine: Administering Evidence Based Clinical Workflows with Autonomous Agents Powered by Generative Large Language Models
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment
Non-autoregressive Generative Models for Reranking Recommendation
Authors: Yuxin Ren, Qiya Yang, Yichun Wu, Wei Xu, Yalong Wang, Zhiqiang Zhang
Source and references: https://arxiv.org/abs/2402.06871v4
Introduction
This paper proposes NAR4Rec, a non-autoregressive generative model for the reranking stage of real-time recommendation systems.
Key Points
The paper addresses the challenges of using autoregressive models in real-time recommendation systems, including slow inference speed, training-inference discrepancy, and limited information utilization.
NAR4Rec, a non-autoregressive generative model, is introduced to generate all items in the recommendation list simultaneously, improving efficiency.
The authors address the challenges of sparse training data and dynamic candidates by introducing a matching model.
Unlikelihood training is used to distinguish feasible from unfeasible sequences, and contrastive decoding is proposed to capture correlations among target items (a sketch of the unlikelihood term follows these key points).
Extensive offline experiments and online A/B tests demonstrate the superior performance of NAR4Rec compared to state-of-the-art reranking methods.
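Below is a minimal, illustrative sketch of how an unlikelihood-style term of this kind might be written for per-position candidate distributions. The tensor shapes, the feasible/unfeasible flag, and the exact loss form are assumptions for illustration rather than the authors' formulation, and contrastive decoding is not shown.

```python
import torch
import torch.nn.functional as F

def sequence_level_loss(logits, sequence, feasible):
    """Hypothetical unlikelihood-style loss for non-autoregressive reranking.

    logits:   (num_positions, num_candidates) matching scores
    sequence: (num_positions,) index of the candidate placed at each position
    feasible: True if the sequence received positive user feedback
    """
    log_probs = F.log_softmax(logits, dim=-1)                       # per-position distributions
    chosen = log_probs.gather(1, sequence.unsqueeze(1)).squeeze(1)  # log-prob of the chosen items
    if feasible:
        # likelihood term: raise the probability of feasible sequences
        return -chosen.mean()
    # unlikelihood term: lower the probability assigned to unfeasible sequences
    return -torch.log1p(-chosen.exp().clamp(max=1 - 1e-6)).mean()
```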
Methodology
The proposed NAR4Rec model consists of a candidates encoder and a position encoder: the candidates encoder uses a Transformer architecture to encode the candidate items, while the position encoder captures position-specific information. A matching mechanism then scores each candidate against each position in the target sequence, yielding a probability for every candidate at every position.
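As a rough sketch under assumed dimensions, layer counts, and a simple dot-product matching step (illustrative choices, not necessarily the paper's exact design), the encoder-matching structure described above might look like this:

```python
import torch
import torch.nn as nn

class MatchingReranker(nn.Module):
    """Sketch of a candidates encoder, a position encoder, and a matching step."""

    def __init__(self, item_dim, hidden_dim, list_len, num_heads=4, num_layers=2):
        super().__init__()
        self.candidate_proj = nn.Linear(item_dim, hidden_dim)
        layer = nn.TransformerEncoderLayer(hidden_dim, num_heads, batch_first=True)
        self.candidate_encoder = nn.TransformerEncoder(layer, num_layers)
        self.position_embed = nn.Embedding(list_len, hidden_dim)  # position-specific information

    def forward(self, candidates):
        # candidates: (batch, num_candidates, item_dim)
        cand = self.candidate_encoder(self.candidate_proj(candidates))
        pos = self.position_embed.weight                           # (list_len, hidden_dim)
        # match every candidate against every position -> (batch, list_len, num_candidates)
        scores = torch.einsum("ph,bch->bpc", pos, cand)
        return scores.softmax(dim=-1)  # probability of each candidate at every position
```

Because all positions are scored in a single forward pass, the whole recommendation list can be decoded from this position-by-candidate matrix at once, which is what makes the generation non-autoregressive.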
Results and Findings
Offline experiments show that NAR4Rec outperforms state-of-the-art reranking methods. Online A/B tests further validate its effectiveness: NAR4Rec has been fully deployed in the popular video app Kuaishou, which serves over 300 million daily active users, and notably improves the user experience.
Implications and Conclusions
The non-autoregressive approach of NAR4Rec significantly improves the efficiency of real-time recommendation systems by enabling simultaneous generation of all items in the recommendation list, addressing the challenges associated with autoregressive models.
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Authors: Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
Source and references: https://arxiv.org/abs/2408.11039v1
Introduction
This paper introduces Transfusion, a method for training a single unified model to understand and generate both discrete (text) and continuous (image) data.
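Conceptually, training combines a standard language-modeling loss on text tokens with a diffusion loss on image patches. The sketch below assumes a simple weighted sum with a placeholder balancing coefficient; it follows the paper's high-level description but omits all architectural detail.

```python
import torch
import torch.nn.functional as F

def transfusion_loss(text_logits, text_targets, noise_pred, noise, balance=1.0):
    """Illustrative combined objective: next-token prediction on text tokens plus a
    DDPM-style noise-prediction loss on image patches. `balance` is a placeholder
    coefficient, not the paper's tuned value."""
    lm_loss = F.cross_entropy(text_logits.transpose(1, 2), text_targets)  # (B, V, T) vs (B, T)
    diffusion_loss = F.mse_loss(noise_pred, noise)                        # predict the added noise
    return lm_loss + balance * diffusion_loss
```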