State of AI

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP, and AI

Oct 11, 2024
Contents

  1. Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents

  2. TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

  3. Stateful Large Language Model Serving with Pensieve

  4. PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs

  5. Creative Beam Search: LLM-as-a-Judge For Improving Response Generation

  6. ZS4C: Zero-Shot Synthesis of Compilable Code for Incomplete Code Snippets using LLMs

  7. MM-Ego: Towards Building Egocentric Multimodal LLMs

  8. LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management

  9. Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models

  10. Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

  11. Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines

  12. DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation

  13. Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction

  14. Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models


Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents

Authors: Yuwei Hu, Runlin Lei, Xinyi Huang, Zhewei Wei, Yongchao Liu

Source and references: https://arxiv.org/abs/2410.05130v1


Introduction

This research paper introduces GraphAgent-Reasoner, a new framework that uses multi-agent collaboration to enable large language models (LLMs) to perform scalable and accurate graph reasoning.

Key Points

  • GraphAgent-Reasoner is the first LLM-based multi-agent framework for graph reasoning that requires no fine-tuning and can utilize any LLM as the underlying reasoning model.

  • The framework achieves near-perfect accuracy on various polynomial-time graph reasoning tasks, significantly outperforming existing methods.

  • GraphAgent-Reasoner can handle graph reasoning tasks on graphs with over 1,000 nodes, demonstrating exceptional scalability compared to previous approaches.

  • The framework also showcases its potential for addressing complex real-world graph reasoning applications, such as webpage importance analysis.

Methodology

The GraphAgent-Reasoner framework follows a node-centric approach: an agent is assigned to each node in the graph, and the agents collaborate to solve the overall reasoning problem, sharply reducing the amount of information and complexity that any single LLM must handle. The approach is inspired by distributed graph computation, in which a graph problem is decomposed into smaller, node-centric tasks that the agents resolve collaboratively.
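To make the decomposition concrete, here is a minimal, runnable sketch using shortest-path distance as the example task. The per-node update rule is hard-coded here; in GraphAgent-Reasoner itself, each node's local reasoning step would be delegated to an LLM agent and a master agent would aggregate the per-node results, so treat the function and variable names below as illustrative assumptions rather than the paper's actual API.

```python
# Minimal sketch of node-centric message passing, in the spirit of
# GraphAgent-Reasoner's distributed decomposition. Each "agent" sees
# only its own neighborhood; in the actual framework an LLM would
# perform the per-node update instead of this hard-coded rule.

from collections import defaultdict

def node_centric_distances(edges, source):
    """Compute hop distances from `source` via synchronous per-node rounds."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Each node agent keeps only local state: its best-known distance so far.
    state = {n: (0 if n == source else float("inf")) for n in adj}

    changed = True
    while changed:
        changed = False
        # One round: every agent announces its current distance to its neighbors...
        offers = defaultdict(list)
        for n in adj:
            for nbr in adj[n]:
                offers[nbr].append(state[n] + 1)
        # ...and each agent keeps the best offer it received.
        for n in adj:
            best = min(offers[n], default=float("inf"))
            if best < state[n]:
                state[n] = best
                changed = True
    return state

# Example: a 5-node path graph; distances from node 0 are 0, 1, 2, 3, 4.
print(node_centric_distances([(0, 1), (1, 2), (2, 3), (3, 4)], source=0))
```

The design point to notice is that no participant ever sees more than one node's local neighborhood, which is what lets the approach scale to graphs with over 1,000 nodes.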

Results and Findings

Evaluated on the GraphInstruct dataset, the GraphAgent-Reasoner framework demonstrates near-perfect accuracy on polynomial-time graph reasoning tasks, significantly outperforming the best available models, both closed-source and fine-tuned open-source variants. As the graph size increases, the framework maintains robust accuracy, unlike other methods, which exhibit significant performance degradation.

Implications and Conclusions

The GraphAgent-Reasoner framework represents a significant advancement in graph reasoning with LLMs. By distributing the problem across collaborating agents, it addresses the limitations of single LLMs in handling complex graph structures and large-scale graphs, paving the way for LLMs to tackle real-world graph reasoning applications with high accuracy and scalability.


TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens

Authors: Ya-Qi Yu, Minghui Liao, Jiwen Zhang, Jihao Wu

Source and references: https://arxiv.org/abs/2410.05261v1


Introduction

The paper presents TextHawk2, a bilingual Large Vision-Language Model (LVLM) that excels in Optical Character Recognition (OCR) and grounding tasks while using 16 times fewer tokens than previous models.
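The excerpt here does not describe TextHawk2's actual compression mechanism, but a generic way an LVLM can achieve a 16x reduction is to merge each non-overlapping 4x4 window of visual patch features into a single token before the language model attends to them. The sketch below illustrates that idea only; the module name, window size, and projection layer are assumptions for illustration, not TextHawk2's architecture.

```python
# Illustrative sketch only: 16x visual-token compression by merging
# each non-overlapping 4x4 window of patch features into one token.
# This is a generic mechanism assumed for illustration, not a
# description of TextHawk2's actual design.

import torch
import torch.nn as nn

class WindowTokenCompressor(nn.Module):
    """Merge every 4x4 window of visual tokens into one (a 16x reduction)."""

    def __init__(self, dim: int, window: int = 4):
        super().__init__()
        self.window = window
        # Project each flattened window back down to the model dimension.
        self.proj = nn.Linear(dim * window * window, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, H, W, dim) grid of patch features from the vision encoder.
        b, h, w, d = feats.shape
        k = self.window
        assert h % k == 0 and w % k == 0, "grid must tile into k x k windows"
        # Regroup into non-overlapping k x k windows and flatten each window.
        feats = feats.reshape(b, h // k, k, w // k, k, d)
        feats = feats.permute(0, 1, 3, 2, 4, 5)
        feats = feats.reshape(b, (h // k) * (w // k), k * k * d)
        return self.proj(feats)  # (batch, H*W/16, dim) compressed tokens

# Example: a 32x32 grid of 256-d patch features -> 64 tokens instead of 1,024.
tokens = WindowTokenCompressor(dim=256)(torch.randn(1, 32, 32, 256))
print(tokens.shape)  # torch.Size([1, 64, 256])
```

With a 32x32 patch grid, the language model sees 64 visual tokens instead of 1,024, matching the 16x factor claimed in the title.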
