Contents
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models
Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
TALK-Act: Enhance Textural-Awareness for 2D Speaking Avatar Reenactment with Diffusion Model
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention
Reducing the Barriers to Entry for Foundation Model Training
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
The Future of Large Language Model Pre-training is Federated
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning
AutoTurb: Using Large Language Models for Automatic Algebraic Model Discovery of Turbulence Closure
Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction
Mindalogue: LLM-Powered Nonlinear Interaction for Effective Learning and Task Exploration
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models
Authors: Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang
Source and references: https://arxiv.org/abs/2408.00724v2
Introduction
This paper explores inference scaling laws and compute-optimal inference strategies for large language models (LLMs) on problem-solving tasks, with a focus on mathematical reasoning.
Key Points
The authors study the trade-offs between model size, inference strategies, and computational budget during inference.
They propose a novel tree search algorithm called Reward Balanced Search (REBASE) that is compute-optimal compared to existing methods like sampling and Monte Carlo Tree Search (MCTS).
Their findings indicate that smaller models can outperform larger ones under the same compute budget by increasing the number of samples during inference.
The authors provide a theoretical analysis of the asymptotic behavior and convergence of voting-based inference strategies, highlighting the need for more sophisticated algorithms.
REBASE consistently outperforms other inference strategies across different model sizes and tasks, often achieving a Pareto-optimal trade-off between accuracy and compute.
Methodology
The authors explore various inference strategies, including greedy search, best-of-n, majority voting, weighted voting, and their tree-search variants. They formulate the compute-optimal inference problem and propose the REBASE algorithm, which uses a reward model to guide the tree search process efficiently.
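Below is a minimal, self-contained sketch of the two ingredients described above: reward-balanced expansion of a search tree and weighted voting over the finished solutions. It illustrates the general idea rather than the authors' implementation; `sample_next_step`, `step_reward`, `is_complete`, and `extract_answer` are hypothetical placeholders standing in for an LLM sampler and a process reward model.

```python
import math
import random
from collections import defaultdict

# Placeholders for an LLM sampler and a process reward model (hypothetical).
def sample_next_step(partial_solution):
    # One "LLM call" that appends a single reasoning step.
    return partial_solution + [f"step-{random.random():.3f}"]

def step_reward(partial_solution):
    # Process-reward-model score for a partial solution.
    return random.random()

def is_complete(partial_solution, max_depth=4):
    return len(partial_solution) >= max_depth

def extract_answer(solution):
    # Parse the final answer out of a finished solution.
    return solution[-1]

def rebase_style_search(budget_per_depth=8, max_depth=4, temperature=1.0):
    frontier = [[]]   # partial solutions (lists of steps)
    finished = []     # (solution, reward) pairs
    for _ in range(max_depth):
        if not frontier:
            break
        # Score the frontier and turn rewards into expansion weights.
        rewards = [step_reward(p) for p in frontier]
        weights = [math.exp(r / temperature) for r in rewards]
        total = sum(weights)
        # Allocate the fixed per-depth budget roughly in proportion to the weights.
        counts = [max(0, round(budget_per_depth * w / total)) for w in weights]
        next_frontier = []
        for partial, n_children in zip(frontier, counts):
            for _ in range(n_children):
                child = sample_next_step(partial)
                if is_complete(child, max_depth):
                    finished.append((child, step_reward(child)))
                else:
                    next_frontier.append(child)
        frontier = next_frontier
    return finished

def weighted_vote(finished):
    # Sum reward scores per distinct final answer; return the highest-scoring answer.
    scores = defaultdict(float)
    for solution, reward in finished:
        scores[extract_answer(solution)] += reward
    return max(scores, key=scores.get) if scores else None

if __name__ == "__main__":
    print(weighted_vote(rebase_style_search()))
```

The key design choice is that higher-scoring partial solutions receive more of the fixed per-depth sampling budget, so compute concentrates on promising branches without abandoning the rest of the tree.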
Results and Findings
The authors find that smaller models (e.g., Llemma-7B) can outperform larger models (e.g., Llemma-34B) under the same computational budget by drawing more samples during inference. Their theoretical analysis shows that standard voting-based strategies hit a performance ceiling and exhibit diminishing returns as the number of samples increases.
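One standard way to see why extra samples eventually stop helping (a sketch of the general argument, in notation chosen here rather than taken from the paper):

```latex
% Sketch: why voting-based accuracy saturates as the sample count n grows.
% Let p_y be the probability that one sampled solution ends in final answer y,
% and let y* be the correct answer for a given problem.
\[
  \Pr\!\big[\text{majority vote over } n \text{ samples returns } y^{*}\big]
  \;\longrightarrow\;
  \mathbf{1}\!\left[\, p_{y^{*}} > \max_{y \neq y^{*}} p_y \,\right]
  \quad \text{as } n \to \infty,
\]
% so benchmark accuracy converges to the fraction of problems whose correct
% answer is already the single most likely sampled answer; beyond that point,
% additional samples only add compute.
```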
The proposed REBASE algorithm consistently outperforms sampling-based methods and MCTS across all settings, models, and tasks. REBASE with a smaller language model often achieves a Pareto-optimal trade-off between accuracy and computational cost.
Implications and Conclusions
The results demonstrate the value of using smaller models with advanced inference-time algorithms to achieve better returns on inference-time compute. The authors' findings contribute to a broader understanding of inference scaling laws for LLMs and highlight the importance of developing new, compute-optimal inference algorithms.
NT-LLM: A Novel Node Tokenizer for Integrating Graph Structure into Large Language Models
Authors: Yanbiao Ji, Chang Liu, Xin Chen, Yue Ding, Dan Luo, Mei Li, Wenqing Lin, Hongtao Lu
Source and references: https://arxiv.org/abs/2410.10743v1
Introduction
This paper introduces NT-LLM, a novel framework that efficiently encodes graph structures for seamless integration with Large Language Models (LLMs). The core of the method is the strategic selection of key nodes, referred to as anchors, which serve as reference points for encoding the graph topology.
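As a rough illustration of the anchor idea (a hedged sketch, not the paper's actual procedure: anchor selection by node degree and the use of raw BFS distances are simplifying assumptions made here), one can describe each node by its shortest-path distances to a small set of anchor nodes and feed that vector to the LLM as a positional signal.

```python
from collections import deque

def bfs_distances(adj, source):
    # Unweighted shortest-path distances from `source` to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def select_anchors(adj, k=2):
    # Simplifying assumption: take the k highest-degree nodes as anchors.
    return sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k]

def anchor_position_encoding(adj, k=2, unreachable=-1):
    anchors = select_anchors(adj, k)
    per_anchor = [bfs_distances(adj, a) for a in anchors]
    # Each node gets a k-dimensional vector of distances to the anchors,
    # which a small projection layer could map into the LLM's embedding space.
    return {n: [d.get(n, unreachable) for d in per_anchor] for n in adj}

if __name__ == "__main__":
    graph = {
        "a": ["b", "c"],
        "b": ["a", "c", "d"],
        "c": ["a", "b"],
        "d": ["b"],
    }
    print(anchor_position_encoding(graph))
```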