State of AI

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Nov 12, 2024
Contents

  1. Logic Query of Thoughts: Guiding Large Language Models to Answer Complex Logic Queries with Knowledge Graphs

  2. SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

  3. Gymnasium: A Standard Interface for Reinforcement Learning Environments

  4. Recycled Attention: Efficient inference for long-context language models

  5. End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering

  6. GenCode: A Generic Data Augmentation Framework for Boosting Deep Learning-Based Code Understanding

  7. CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence

  8. Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

  9. Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags

  10. Score-based generative diffusion with "active" correlated noise sources

  11. TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling

  12. Stronger Random Baselines for In-Context Learning

  13. Continual Memorization of Factoids in Large Language Models

  14. BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction


Logic Query of Thoughts: Guiding Large Language Models to Answer Complex Logic Queries with Knowledge Graphs

Authors: Lihui Liu, Zihao Wang, Ruizhong Qiu, Yikun Ban, Eunice Chan, Yangqiu Song, Jingrui He, Hanghang Tong

Source and references: https://arxiv.org/abs/2404.04264v4


Introduction

This paper introduces a novel model called 'Logic-Query-of-Thoughts' (LGOT) that combines Large Language Models (LLMs) with knowledge graph reasoning to enhance the accuracy of answering complex logic queries.

Key Points

  • Demonstrates that LLMs are insufficient for answering complex logic queries that require multiple reasoning steps.

  • Introduces the LGOT algorithm, which decomposes complex logic queries into multiple subquestions and guides the LLM through them step by step.

  • Evaluates LGOT on various real-world datasets, showing state-of-the-art performance with up to 20% improvement over ChatGPT.

Methodology

LGOT utilizes a combination of knowledge graph reasoning methods and LLMs to identify potentially correct answers for each subquestion. These results are merged to generate a comprehensive answer set, which then serves as input for subsequent subquestions. This iterative process continues until the final answers are obtained.
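
A minimal Python sketch of this loop might look like the following. The callables decompose, kg_reason, and llm_answer are hypothetical stand-ins for the paper's components, not the authors' actual API:

```python
# Hypothetical sketch of an LGOT-style answer loop. The callables
# decompose, kg_reason, and llm_answer are placeholders supplied by
# the caller, not the authors' actual implementation.
def answer_logic_query(query, decompose, kg_reason, llm_answer):
    """Answer a complex logic query by splitting it into subquestions
    and merging knowledge-graph and LLM candidates at each step."""
    candidates = set()  # answers carried over from the previous step
    for subquestion in decompose(query):
        kg_answers = set(kg_reason(subquestion, candidates))    # KG reasoning
        llm_answers = set(llm_answer(subquestion, candidates))  # LLM candidates
        # Merge both sources; the result feeds the next subquestion.
        candidates = kg_answers | llm_answers
    return candidates
```

Merging the two answer sets at each step is what lets the knowledge graph compensate for LLM hallucinations, and the LLM compensate for missing edges in an incomplete graph.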

Results and Findings

The experimental results demonstrate that LGOT significantly outperforms existing methods, including ChatGPT, on various logic query datasets with incomplete knowledge graphs. For example, LGOT achieves up to 20% improvement over ChatGPT on the MetaQA dataset.

Implications and Conclusions

The proposed LGOT model provides an effective way to integrate knowledge graph reasoning with LLMs, mitigating the hallucination problem of LLMs and the incompleteness issue of knowledge graphs. This research has broader impacts in improving the reliability and accuracy of LLMs for complex reasoning tasks.


SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Authors: Muyang Li, Yujun Lin, Zhekai Zhang, Tianle Cai, Xiuyu Li, Junxian Guo, Enze Xie, Chenlin Meng, Jun-Yan Zhu, Song Han

Source and references: https://arxiv.org/abs/2411.05007v2


Introduction

This paper proposes SVDQuant, a novel 4-bit post-training quantization paradigm for diffusion models. The goal is to accelerate the inference of large-scale diffusion models, which are computationally intensive and memory-demanding, to enable their deployment on edge devices.

Key Points

  • Introduces a low-rank branch to absorb outliers in both the weights and activations, easing the quantization process.

  • Co-designs an inference engine called Nunchaku that fuses the low-rank and low-bit branch kernels to reduce memory usage and eliminate redundant data-movement overhead.

  • Supports both INT4 and FP4 quantization, and integrates seamlessly with pre-trained low-rank adapters (LoRA) without requiring re-quantization.

  • Achieves a 3.5× reduction in memory usage and a 3.0× speedup over weight-only quantization on an NVIDIA RTX 4090 laptop GPU.

Methodology

The core idea of SVDQuant is to introduce a low-rank branch that absorbs the outliers in both the weights and activations. First, the outliers are migrated from the activations to the weights via smoothing. Then, Singular Value Decomposition (SVD) is applied to the updated weights, decomposing them into a low-rank branch and a residual. The low-rank branch operates at 16-bit precision, allowing the residual to be quantized to 4 bits.
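
As a rough NumPy illustration of this decomposition, assuming smoothing has already migrated the activation outliers into the weight matrix, and using an illustrative rank and a simple symmetric uniform quantizer rather than the paper's exact configuration:

```python
import numpy as np

def svdquant_decompose(weight, rank=32, bits=4):
    """Illustrative SVDQuant-style split of a smoothed weight matrix
    into a 16-bit low-rank branch plus a low-bit residual."""
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    # Low-rank branch, kept at 16-bit precision.
    L1 = U[:, :rank] * S[:rank]   # (out, rank)
    L2 = Vt[:rank, :]             # (rank, in)
    residual = weight - L1 @ L2
    # Simple symmetric uniform quantization of the residual to `bits`.
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(residual).max() / qmax, 1e-12)
    q = np.clip(np.round(residual / scale), -qmax - 1, qmax).astype(np.int8)
    return L1.astype(np.float16), L2.astype(np.float16), q, scale

def svdquant_matmul(x, L1, L2, q, scale):
    # Output = low-rank branch + dequantized residual branch.
    return (x @ L2.T) @ L1.T + x @ (q.astype(np.float32) * scale).T
```

Because the leading singular components absorb the (smoothed) outliers, the residual has a much tighter value range and quantizes to 4 bits with far less error than the raw weights would.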

Results and Findings

Extensive experiments on various text-to-image diffusion models, including FLUX.1, PixArt-Σ, and SDXL, demonstrate that SVDQuant effectively preserves image quality while achieving significant memory savings and speedup. Compared to the original 16-bit models, the 4-bit INT and FP models produced by SVDQuant show comparable or even better visual quality. On the 12B FLUX.1 model, SVDQuant reduces memory usage by 3.5× and delivers a 3.0× speedup over the weight-only quantized baseline on an NVIDIA RTX 4090 laptop GPU.

Implications and Conclusions

The proposed SVDQuant and the Nunchaku inference engine enable the efficient deployment of large-scale diffusion models on edge devices, unlocking broader potential for interactive AI applications. The method's ability to seamlessly integrate with pre-trained LoRAs further enhances its practical applicability.


Gymnasium: A Standard Interface for Reinforcement Learning Environments

Authors: Mark Towers, Ariel Kwiatkowski, Jordan Terry, John U. Balis, Gianluca De Cola, Tristan Deleu, Manuel Goulão, Andreas Kallinteris, Markus Krimmel, Arjun KG, Rodrigo Perez-Vicente, Andrea Pierré, Sander Schulhoff, Jun Jet Tai, Hannah Tan, Omar G. Younis

Source and references: https://arxiv.org/abs/2407.17032v3


Introduction

This paper introduces Gymnasium, an open-source library that provides a standard API for reinforcement learning (RL) environments. Gymnasium aims to address the lack of standardization in RL environment and algorithm implementations, which has hindered progress in the field.
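
The standardized interaction loop at the heart of that API looks like this (CartPole-v1 is one of the bundled environments; the random policy is just a placeholder for a learned one):

```python
import gymnasium as gym

# Every Gymnasium environment exposes the same reset()/step() interface,
# which is the standardization the paper describes.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)
episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
env.close()
print(f"episode return: {episode_return}")
```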
