State of AI

Chain-of-Note, JARVIS-1, GPT-4 & Science, MedAgents and more 🚀

Week 3, November 2023

Nov 20, 2023

Greetings,

We are thrilled to present the 33rd edition of the State of AI, a milestone that continues our exploration of the cutting-edge frontiers of artificial intelligence. In this issue, we dive into the latest advancements and innovations shaping the AI landscape.

Discover how "Chain-of-Note" is enhancing the robustness of retrieval-augmented language models, and delve into the world of "JARVIS-1," where open-world multi-task agents are revolutionized with memory-augmented multimodal language models. Learn about the profound impact large language models, like GPT-4, are having on scientific discovery. Explore the intricacies of filtering context for retrieval-augmented generation, and finally, understand how "MedAgents" are redefining medical reasoning as large language models become invaluable collaborators in zero-shot scenarios.

Each article in this issue provides a unique perspective on how AI continues to evolve and reshape our understanding of various fields. We are confident that this edition will offer you deep insights and provoke thoughtful discussions on the future of AI.

Enjoy this journey through the latest in AI innovation!

Best regards,

State of AI


  1. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models

  2. JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models

  3. The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4

  4. Learning to Filter Context for Retrieval-Augmented Generation

  5. MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning


CHAIN-OF-NOTE: Enhancing Robustness in Retrieval-Augmented Language Models

Authors: Wenhao Yu, Hongming Zhang, Xiaoman Pan, Kaixin Ma, Hongwei Wang, Dong Yu

Source & References: https://arxiv.org/abs/2311.09210


Introduction

Retrieval-augmented language models (RALMs) have been a game-changer for large language models: by leveraging external knowledge sources, they reduce factual hallucinations and inject up-to-date or domain-specific knowledge. However, the current RALM framework has its flaws, particularly the reliability of knowledge retrieval and the model's ability to recognize its own knowledge limitations.

In their paper, the authors propose the CHAIN-OF-NOTE (CON) framework to address these challenges. The main aim is to improve RALMs' robustness when handling noisy, irrelevant documents and dealing with unknown scenarios that fall outside their pre-training knowledge.

How Does CON Work?

The core idea behind CON is to generate sequential reading notes for the retrieved documents to evaluate their relevance to a given question before formulating a final response. This method ensures that irrelevant or less trustworthy content is filtered out, leading to more accurate and contextually relevant answers.
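To make the idea concrete, here is a minimal sketch of how CON-style prompting could be wired up in Python. The prompt wording and the placeholder `generate` function are our own illustrative assumptions, not the authors' exact template or training format.

```python
def generate(prompt: str) -> str:
    """Placeholder for an LLM completion call (e.g., a hosted LLaMa-2 7B or an API)."""
    raise NotImplementedError


def chain_of_note_answer(question: str, documents: list[str]) -> str:
    # Ask the model to write one reading note per retrieved document, judging its
    # relevance, before committing to a final answer.
    doc_block = "\n\n".join(
        f"[Document {i + 1}] {doc}" for i, doc in enumerate(documents)
    )
    prompt = (
        "Task: answer the question below using the retrieved documents.\n\n"
        f"{doc_block}\n\n"
        f"Question: {question}\n"
        "First write a short reading note for each document, stating whether and how "
        "it is relevant to the question. Then give the final answer, or reply "
        "'unknown' if neither the documents nor your own knowledge contain it.\n"
        "Notes and answer:"
    )
    return generate(prompt)
```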

The authors train the CON framework on an LLaMa-2 7B model using training data created via ChatGPT. Their evaluation focuses on three aspects:

  1. Overall QA performance using DPR-retrieved documents

  2. Noise robustness, assessed by introducing noisy information into the system

  3. Unknown robustness, evaluated through queries not covered in the LLaMa-2 pre-training data (e.g., real-time questions)

Evaluation and Comparison with Standard RALMs

The authors conduct experiments on four open-domain QA benchmarks: Natural Questions (NQ), TriviaQA, WebQ, and RealTimeQA.

The results indicate that CON not only improves overall QA performance but also significantly enhances robustness in both noise and unknown aspects, achieving an average improvement of +7.9 in exact match score with noisy retrieved documents, and a +10.5 increase in rejection rate for real-time questions that fall outside the pre-training knowledge scope.
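For readers who want to compute these two headline metrics on their own data, here is a small sketch; the normalization rules and the "unknown" string convention are our assumptions and may differ from the paper's exact evaluation scripts.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (a common EM convention)."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the prediction matches any gold answer after normalization."""
    return any(normalize(prediction) == normalize(gold) for gold in gold_answers)


def rejection_rate(predictions: list[str]) -> float:
    """Fraction of unanswerable (e.g., real-time) queries where the model replies 'unknown'."""
    if not predictions:
        return 0.0
    return sum("unknown" in p.lower() for p in predictions) / len(predictions)
```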

Application of Chain-of-X Approaches in Large Language Models

Chain-of-X approaches have already shown promise in enhancing the performance of large language models (LLMs) across various domains. The best-known example, Chain-of-Thought (CoT) prompting, improves reasoning by breaking a complex problem down into a series of intermediate steps.

This ability to decompose problems into smaller components makes the reasoning process more transparent and allows LLMs to tackle each segment with focused attention.
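As a toy illustration of the difference (the wording below is ours, not taken from any paper), compare a direct prompt with a CoT-style prompt:

```python
question = "A train leaves at 3:00 pm and arrives at 7:30 pm. How long is the trip?"

# Direct prompting asks for the answer outright.
direct_prompt = f"Q: {question}\nA:"

# Chain-of-Thought prompting asks the model to reason through intermediate steps first.
cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step. First find the elapsed whole hours, "
    "then the remaining minutes, then state the total duration."
)
```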

The authors seek to apply similar chain-of-X methodologies to RALMs to enhance their robustness and reliability.

The Limitations of Existing RALMs

Current RALMs suffer from a range of limitations, including:

  • Risk of surface-level processing: When generating an answer, language models might rely on shallow information without deep comprehension, overlooking nuances in questions or documents, particularly in complex or indirect queries.

  • Difficulty in handling contradictory information: When retrieved documents conflict with each other or with the model's internal knowledge, the model may struggle to resolve the contradictions or to determine which piece of information is more credible or relevant.

  • Reduced transparency and interpretability: Direct answer generation offers limited insight into how the model arrives at its conclusion, making it harder for users to understand the basis of the response.

  • Overdependence on retrieved documents: Direct generation can lead to an overreliance on content from retrieved documents, ignoring the model's inherent knowledge base.

Addressing RALM Limitations with CHAIN-OF-NOTE

The proposed CON framework aims to tackle the limitations of RALMs by systematically evaluating the relevance and accuracy of information in retrieved documents through the creation of sequential reading notes. By doing so, it identifies the most reliable information, resolves conflicting information, and effectively filters out irrelevant or less trustworthy content.

CON also ensures that the model has the capacity to acknowledge its limitations, providing the appropriate response (e.g., "unknown") when it lacks the necessary knowledge or relevant information in retrieved documents.
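A downstream application might act on that signal roughly as follows. This is a hedged sketch: the convention that the final line of the model's output carries the answer is purely our assumption.

```python
def handle_con_output(model_output: str) -> str:
    """Route a CON-style response: surface an 'unknown' acknowledgement instead of a guess."""
    lines = model_output.strip().splitlines()
    if not lines:
        return "No response generated."
    # Illustrative assumption: the final line of the notes-then-answer output holds the answer.
    final_line = lines[-1]
    if "unknown" in final_line.lower():
        return "No reliable answer was found in the retrieved documents or the model's knowledge."
    return final_line
```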

Conclusion

In a nutshell, the CHAIN-OF-NOTE framework is an exciting development in the realm of retrieval-augmented language models, as it addresses issues of robustness and reliability. CON's ability to generate sequential reading notes, combined with its capacity to recognize and filter out irrelevant or less trustworthy content, makes it a valuable tool in enhancing the overall performance of RALMs.

The paper demonstrates the positive impact of applying CON to RALMs, making it a promising solution for those looking to improve the accuracy and context appropriateness of their language models. By leveraging the strengths of the CON framework, users can expect to see more reliable and robust language models in future applications.


JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models

Authors: Zihao Wang, Shaofei Cai, Anji Liu, Yonggang Jin, Jinbing Hou, Bowei Zhang, Haowei Lin, Zhaofeng He, Zilong Zheng, Yaodong Yang, Xiaojian Ma, Yitao Liang

Source & References: https://arxiv.org/abs/2311.05997


Introduction

The research paper "JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models" introduces a new agent capable of robust planning and control in the popular open-world environment, Minecraft. The authors develop JARVIS-1 to perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control. The agent is built on top of pre-trained multimodal language models that map visual observations and textual instructions to plans, which are then executed by goal-conditioned controllers.
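The following is a highly simplified sketch of that perceive, plan, and act loop with an experience memory; every class, method, and field name here is illustrative and not taken from the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class Jarvis1StyleAgent:
    """Illustrative perceive -> plan -> act loop; structure and names are ours, not the authors'."""
    planner: object      # multimodal LM: (observation, instruction, memories) -> list of sub-goals
    controller: object   # goal-conditioned controller: sub-goal -> low-level actions
    memory: list = field(default_factory=list)  # stored (instruction, plan, outcomes) experiences

    def step(self, observation, instruction):
        # Retrieve a few past experiences for the same instruction to condition planning.
        relevant = [m for m in self.memory if m["instruction"] == instruction][-3:]
        plan = self.planner.plan(observation, instruction, relevant)
        outcomes = [self.controller.execute(sub_goal) for sub_goal in plan]
        # Store the episode so future plans can be refined from experience.
        self.memory.append({"instruction": instruction, "plan": plan, "outcomes": outcomes})
        return outcomes
```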
