Contents
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
Harnessing Diversity for Important Data Selection in Pretraining Large Language Models
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion
Attention Prompting on Image for Large Vision-Language Models
Differential Privacy Regularization: Protecting Training Data Through Loss Function Regularization
Accumulator-Aware Post-Training Quantization
Efficient Feature Interactions with Transformers: Improving User Spending Propensity Predictions in Gaming
FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia
Evolutionary Greedy Algorithm for Optimal Sensor Placement Problem in Urban Sewage Surveillance
PokeFlex: Towards a Real-World Dataset of Deformable Objects for Robotic Manipulation
Semantically-Driven Disambiguation for Human-Robot Interaction
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models
Authors: Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang
Source and references: https://arxiv.org/abs/2409.17066v1
Introduction
This paper introduces Vector Post-Training Quantization (VPTQ), a novel technique for extremely low-bit quantization of large language models (LLMs) to address the challenges of model size, storage, and inference efficiency.
Key Points
The authors use Second-Order Optimization to formulate the LLM vector quantization (VQ) problem and guide the quantization algorithm design.
They propose a Channel-Independent Second-Order Optimization method for a more granular VQ.
The authors introduce a brief and effective codebook initialization algorithm by decomposing the optimization problem.
The paper extends VPTQ to support residual and outlier quantization, which enhances model accuracy and further compresses the model.
The experimental results show that VPTQ outperforms state-of-the-art techniques in terms of model quantization perplexity and accuracy on various LLM benchmarks.
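The vector-quantization core in the points above can be sketched with a toy example: reshape the weight matrix into short vectors, fit a small codebook, and replace each vector by its nearest codebook entry, so only per-vector indices plus the shared codebook need to be stored. This is an illustrative simplification using plain k-means; the paper's actual algorithm is driven by second-order (Hessian) information and operates channel-independently, which this sketch omits.

```python
import numpy as np

def vector_quantize(W, vec_dim=2, k=64, iters=10, seed=0):
    """Toy vector quantization of a weight matrix (not VPTQ itself).

    Groups weights into vec_dim-sized vectors, fits a k-entry
    codebook with plain k-means, and replaces each vector by its
    nearest centroid.
    """
    rng = np.random.default_rng(seed)
    vecs = W.reshape(-1, vec_dim)                         # (n, v)
    # Codebook initialization: sample k existing vectors as seeds.
    codebook = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):                                # k-means refinement
        d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)
        for c in range(k):
            members = vecs[idx == c]
            if len(members):
                codebook[c] = members.mean(0)
    # Final assignment: each vector is stored as a log2(k)-bit index.
    d = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(1)
    W_hat = codebook[idx].reshape(W.shape)                # dequantized weights
    return W_hat, codebook, idx

W = np.random.default_rng(1).normal(size=(64, 64)).astype(np.float32)
W_hat, codebook, idx = vector_quantize(W, vec_dim=2, k=64)
# With k=64 and vec_dim=2, each pair of weights costs a 6-bit index,
# i.e. 3 bits per weight plus the shared codebook.
```

With `vec_dim=2` and `k=64`, this illustrates how extreme low-bit rates (here 3 bits per weight) follow directly from the index-plus-codebook representation.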
Methodology
The authors use Second-Order Optimization to formulate the LLM VQ problem and guide the quantization algorithm design. They further refine the weights using Channel-Independent Second-Order Optimization for a more granular VQ. The paper also introduces a brief and effective codebook initialization algorithm by decomposing the optimization problem. Additionally, the authors extend VPTQ to support residual and outlier quantization to enhance model accuracy and further compress the model.
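The residual-quantization extension mentioned above can be sketched as a second codebook fitted to the first stage's error, so the reconstruction is the sum of two codebook entries. Again, this is a hedged toy sketch with plain k-means standing in for the paper's second-order-guided algorithm; function names and parameters here are illustrative, not from the paper.

```python
import numpy as np

def kmeans_codebook(vecs, k, iters=10, seed=0):
    """Plain k-means codebook fit; stands in for VPTQ's initialization."""
    rng = np.random.default_rng(seed)
    cb = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):
        idx = ((vecs[:, None] - cb[None]) ** 2).sum(-1).argmin(1)
        for c in range(k):
            m = vecs[idx == c]
            if len(m):
                cb[c] = m.mean(0)
    return cb

def residual_vq(W, vec_dim=2, k1=64, k2=64):
    """Two-stage (residual) vector quantization sketch.

    Stage 1 quantizes the weight vectors; stage 2 quantizes the
    leftover error with a second codebook. VPTQ's residual
    quantization follows the same idea, but driven by
    second-order optimization rather than k-means.
    """
    vecs = W.reshape(-1, vec_dim)
    cb1 = kmeans_codebook(vecs, k1)
    idx1 = ((vecs[:, None] - cb1[None]) ** 2).sum(-1).argmin(1)
    resid = vecs - cb1[idx1]                      # stage-1 error
    cb2 = kmeans_codebook(resid, k2, seed=1)
    idx2 = ((resid[:, None] - cb2[None]) ** 2).sum(-1).argmin(1)
    W_hat = (cb1[idx1] + cb2[idx2]).reshape(W.shape)
    W_stage1 = cb1[idx1].reshape(W.shape)         # for comparison
    return W_hat, W_stage1

W = np.random.default_rng(2).normal(size=(64, 64)).astype(np.float32)
W_hat, W_stage1 = residual_vq(W)
# The two-stage reconstruction is closer to W than stage 1 alone,
# at the cost of storing a second index per vector.
```

The second stage spends extra bits only on the error that survives the first stage, which is why residual quantization improves accuracy while keeping the model strongly compressed.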
Results and Findings
The experimental results show that VPTQ reduces model quantization perplexity by 0.01-0.34 on LLaMA-2, 0.38-0.68 on Mistral-7B, and 4.41-7.34 on LLaMA-3 compared to the state-of-the-art at 2-bit. VPTQ also improves average accuracy on QA tasks by 0.79-1.5% on LLaMA-2, 1% on Mistral-7B, and 11-22% on LLaMA-3. Additionally, VPTQ requires only 10.4-18.6% of the quantization algorithm execution time of the state-of-the-art and delivers a 1.6-1.8× increase in inference throughput.
Implications and Conclusions
The proposed VPTQ technique enables extremely low-bit quantization of LLMs, addressing the challenges of model size, storage, and inference efficiency. The significant improvements in model quantization perplexity, accuracy, and inference throughput demonstrate the effectiveness of VPTQ, making it a promising solution for the practical deployment of large language models.
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents
Authors: Junting Lu, Zhiyang Zhang, Fangkai Yang, Jue Zhang, Lu Wang, Chao Du, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
Source and references: https://arxiv.org/abs/2409.17140v1
Introduction
This paper proposes AXIS, a framework that enables efficient human-agent-computer interaction (HACI) by leveraging API-first LLM-based agents to replace traditional UI-based agents.