State of AI

State of AI

Share this post

State of AI
State of AI
Bi-Weekly AI Research Roundup

Bi-Weekly AI Research Roundup

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI's avatar
State of AI
Jul 23, 2024
∙ Paid

Share this post

State of AI
State of AI
Bi-Weekly AI Research Roundup
Share

Contents

  1. Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

  2. LLMmap: Fingerprinting For Large Language Models

  3. ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

  4. Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

  5. Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy

  6. MarkLLM: An Open-Source Toolkit for LLM Watermarking

  7. UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs

  8. Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Rival Human Crowd Accuracy

  9. Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models

  10. Imposter.AI: Adversarial Attacks with Hidden Intentions towards Aligned Large Language Models

  11. Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

  12. T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

  13. SynthBA: Reliable Brain Age Estimation Across Multiple MRI Sequences and Resolutions

  14. The Future of Large Language Model Pre-training is Federated

  15. SparQ Attention: Bandwidth-Efficient LLM Inference

Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Authors: Chaojie Wang, Yanchen Deng, Zhiyi Lyu, Liang Zeng, Jujie He, Shuicheng Yan, Bo An

Source and references: https://arxiv.org/abs/2406.14283v4


Introduction

This paper explores a framework called Q* that aims to improve the multi-step reasoning capabilities of Large Language Models (LLMs) by guiding their decoding process through deliberative planning.

Key Points

  • Q* casts multi-step reasoning of LLMs as a heuristic search problem, allowing for more deliberative and logical thinking.

  • Q* learns a plug-and-play Q-value model as a heuristic function to estimate the expected future rewards, guiding LLMs to select the most promising next reasoning step.

  • Q* does not require fine-tuning the LLMs for the current task, avoiding significant computational overhead and potential performance degeneration on other tasks.

  • Extensive experiments on GSM8K, MATH, and MBPP datasets demonstrate the superiority of Q* in improving the reasoning performance of existing open-source LLMs.

Methodology

The authors approach the problem of multi-step reasoning in LLMs by casting it as a heuristic search problem. They introduce Q*, a framework that learns a Q-value model as a heuristic function to estimate the expected future rewards, effectively guiding the LLMs' decoding process without the need for fine-tuning the models.

Results and Findings

The results of the extensive experiments conducted on the GSM8K, MATH, and MBPP datasets show that Q* significantly improves the reasoning performance of existing open-source LLMs. The authors present quantitative data and comparisons to demonstrate the superiority of their approach.

Implications and Conclusions

This research presents an important step towards enhancing the multi-step reasoning capabilities of LLMs, which is crucial for their application in complex problem-solving tasks. By introducing a general and versatile framework like Q*, the authors aim to contribute to the ongoing efforts to improve the reasoning abilities of existing LLMs.


LLMmap: Fingerprinting For Large Language Models

Authors: Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese

Source and references: https://arxiv.org/abs/2407.15847v1


Introduction

This paper introduces LLMmap, a first-generation fingerprinting attack targeted at applications integrating Large Language Models (LLMs). LLMmap employs an active fingerprinting approach, sending carefully crafted queries to the application and analyzing the responses to identify the specific LLM model in use.

Get 7 day free trial

Keep reading with a 7-day free trial

Subscribe to State of AI to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 StateOfAI
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share