State of AI

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Sep 27, 2024

Contents

  1. VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

  2. Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents

  3. Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

  4. Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models

  5. DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion

  6. Attention Prompting on Image for Large Vision-Language Models

  7. Differential Privacy Regularization: Protecting Training Data Through Loss Function Regularization

  8. Accumulator-Aware Post-Training Quantization

  9. Efficient Feature Interactions with Transformers: Improving User Spending Propensity Predictions in Gaming

  10. FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression

  11. Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale

  12. Using LLM for Real-Time Transcription and Summarization of Doctor-Patient Interactions into ePuskesmas in Indonesia

  13. Evolutionary Greedy Algorithm for Optimal Sensor Placement Problem in Urban Sewage Surveillance

  14. PokeFlex: Towards a Real-World Dataset of Deformable Objects for Robotic Manipulation

  15. Semantically-Driven Disambiguation for Human-Robot Interaction


VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models

Authors: Yifei Liu, Jicheng Wen, Yang Wang, Shengyu Ye, Li Lyna Zhang, Ting Cao, Cheng Li, Mao Yang

Source and references: https://arxiv.org/abs/2409.17066v1


Introduction

This paper introduces Vector Post-Training Quantization (VPTQ), a novel technique for extremely low-bit quantization of large language models (LLMs) to address the challenges of model size, storage, and inference efficiency.

Key Points

  • The authors use Second-Order Optimization to formulate the LLM vector quantization (VQ) problem and guide the quantization algorithm design.

  • They propose a Channel-Independent Second-Order Optimization method for a more granular VQ.

  • The authors introduce a brief and effective codebook initialization algorithm by decomposing the optimization problem.

  • The paper extends VPTQ to support residual and outlier quantization, which enhances model accuracy and further compresses the model.

  • The experimental results show that VPTQ outperforms state-of-the-art techniques in terms of model quantization perplexity and accuracy on various LLM benchmarks.

Methodology

The authors use Second-Order Optimization to formulate the LLM VQ problem and guide the quantization algorithm design. They further refine the weights using Channel-Independent Second-Order Optimization for a more granular VQ. The paper also introduces a brief and effective codebook initialization algorithm by decomposing the optimization problem. Additionally, the authors extend VPTQ to support residual and outlier quantization to enhance model accuracy and further compress the model.
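
To make the core idea concrete, here is a minimal sketch of plain vector quantization applied to a single weight matrix: weights are grouped into short vectors, a codebook of centroids is fit, and only the centroid indices plus the shared codebook are stored. This is an illustrative k-means baseline, not the paper's method; VPTQ's Hessian-guided (second-order) objective, channel-independent optimization, and residual/outlier quantization are not shown, and every name and default below is an assumption.

```python
import numpy as np

def _nearest(vectors, codebook):
    # Squared Euclidean distances ||v||^2 - 2 v.c + ||c||^2; argmin picks the closest centroid.
    d = ((vectors ** 2).sum(1, keepdims=True)
         - 2.0 * vectors @ codebook.T
         + (codebook ** 2).sum(1))
    return d.argmin(axis=1)

def vector_quantize(weight, vector_dim=8, codebook_size=256, n_iters=10, seed=0):
    """Group weight entries into length-`vector_dim` vectors and map each to the
    nearest centroid of a k-means codebook. Only the centroid indices and the
    codebook need to be stored."""
    rng = np.random.default_rng(seed)
    vectors = weight.reshape(-1, vector_dim)              # requires weight.size % vector_dim == 0
    # Plain initialization: sample existing vectors as the initial centroids
    # (the paper uses a dedicated initialization algorithm instead).
    codebook = vectors[rng.choice(len(vectors), codebook_size, replace=False)].copy()

    for _ in range(n_iters):                              # Lloyd-style refinement
        assign = _nearest(vectors, codebook)
        for k in range(codebook_size):
            members = vectors[assign == k]
            if len(members):
                codebook[k] = members.mean(axis=0)        # move centroid to cluster mean

    indices = _nearest(vectors, codebook)                 # what actually gets stored
    dequantized = codebook[indices].reshape(weight.shape) # reconstruction used at inference
    return indices.astype(np.uint16), codebook, dequantized

# Example: quantize one 512x512 layer.
idx, cb, w_hat = vector_quantize(np.random.randn(512, 512).astype(np.float32))
```

With 256 centroids over 8-dimensional vectors, each weight costs log2(256)/8 = 1 bit of index storage plus the small shared codebook, which is the extreme low-bit regime the paper targets.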

Results and Findings

The experimental results show that VPTQ reduces model quantization perplexity by 0.01-0.34 on LLaMA-2, 0.38-0.68 on Mistral-7B, and 4.41-7.34 on LLaMA-3 compared to the state of the art at 2-bit. It also improves average accuracy on QA tasks by 0.79-1.5% on LLaMA-2, 1% on Mistral-7B, and 11-22% on LLaMA-3. Moreover, VPTQ requires only 10.4-18.6% of the quantization algorithm execution time of prior methods and delivers a 1.6-1.8× increase in inference throughput compared to the state of the art.

Implications and Conclusions

The proposed VPTQ technique enables extremely low-bit quantization of LLMs, addressing the challenges of model size, storage, and inference efficiency. The significant improvements in model quantization perplexity, accuracy, and inference throughput demonstrate the effectiveness of VPTQ, making it a promising solution for the practical deployment of large language models.


Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents

Authors: Junting Lu, Zhiyang Zhang, Fangkai Yang, Jue Zhang, Lu Wang, Chao Du, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

Source and references: https://arxiv.org/abs/2409.17140v1


Introduction

This paper proposes AXIS, a framework that enables efficient human-agent-computer interaction (HACI) by leveraging API-first LLM-based agents to replace traditional UI-based agents.
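
The framing suggests a simple pattern: instead of an agent observing the screen and emitting clicks and keystrokes, the application exposes its functionality as APIs that the agent invokes directly. The sketch below illustrates that API-first pattern under stated assumptions; the registry decorator, prompt format, `set_cell` example, and `call_llm` hook are illustrative and are not the AXIS implementation.

```python
import json
from typing import Callable

API_REGISTRY: dict[str, Callable[..., str]] = {}

def expose_api(fn: Callable[..., str]) -> Callable[..., str]:
    """Register an application function so the agent can call it by name."""
    API_REGISTRY[fn.__name__] = fn
    return fn

@expose_api
def set_cell(sheet: str, cell: str, value: str) -> str:
    # Stand-in for a native application API (e.g. a spreadsheet backend call).
    return f"{sheet}!{cell} set to {value}"

def run_agent(task: str, call_llm: Callable[[str], str]) -> str:
    """Have the LLM choose one exposed API and its arguments, then invoke it directly."""
    prompt = (
        f"Task: {task}\n"
        f"Available APIs: {list(API_REGISTRY)}\n"
        'Reply with JSON: {"api": "<name>", "args": {...}}'
    )
    choice = json.loads(call_llm(prompt))     # e.g. {"api": "set_cell", "args": {...}}
    return API_REGISTRY[choice["api"]](**choice["args"])
```

The design point is that calling `set_cell(...)` is cheaper and more reliable than having the agent locate and manipulate the corresponding UI controls, which is the efficiency argument the summary makes for API-first agents.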
