Greetings,
Welcome to the 16th edition of the State of AI. This issue marks a significant milestone in our journey as we navigate the exciting waters of AI research and development. We'll explore the world of refined chat models with Llama 2, witness the creation of a real-world web agent capable of intricate planning and program synthesis, and venture into the promising territory of multimodal learning through the Meta-Transformer framework.
The second half of our journey examines how the behavior of ChatGPT, one of the most widely used AI models, is changing over time, giving us a unique perspective on how such models evolve. Finally, we arrive at the fascinating intersection of neuroscience and AI with Brain2Music, a groundbreaking effort to reconstruct music from human brain activity.
Each of these topics continues to push the boundaries of AI application, offering a captivating and intellectually stimulating read. Enjoy!
Best regards,
Contents
Llama 2: Open Foundation and Fine-Tuned Chat Models
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Meta-Transformer: A Unified Framework for Multimodal Learning
How is ChatGPT's behavior changing over time?
Brain2Music: Reconstructing Music from Human Brain Activity
Llama 2: Open Foundation and Fine-Tuned Chat Models
Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
Source & References: https://arxiv.org/abs/2307.09288
Introducing Llama 2
Llama 2 is a new collection of large language models (LLMs) developed by researchers at GenAI, Meta. It consists of pretrained and fine-tuned models ranging from 7 billion to 70 billion parameters, with the fine-tuned LLMs called Llama 2-Chat. These models perform strongly across a range of benchmarks and have been designed to be safer and more helpful, making them a potential substitute for closed-source models in many applications.
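For readers who want to experiment, the chat checkpoints are distributed through Hugging Face (access is gated behind accepting Meta's license). Here is a minimal usage sketch with the transformers library; the prompt and generation settings are illustrative choices, not a prescribed recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated checkpoint: request access and accept Meta's license on Hugging Face first.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama 2-Chat was tuned on an [INST]-style prompt format.
prompt = "[INST] Summarize grouped-query attention in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```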
Pretraining: The Foundations
The authors begin by pretraining the Llama 2 models on a large corpus of publicly available data. Compared to their previous Llama 1 models, they perform more robust data cleaning, double the context length from 2,048 to 4,096 tokens, and adopt grouped-query attention (GQA) in the larger variants to improve inference scalability.
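To make GQA concrete, here is a minimal PyTorch sketch, not the paper's implementation: several query heads share a single key/value head, shrinking the KV cache that dominates inference memory. The function name and tensor shapes are our own illustrative choices:

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    n_heads = q.shape[1]
    group = n_heads // n_kv_heads          # query heads sharing one KV head
    k = k.repeat_interleave(group, dim=1)  # broadcast each shared KV head to its group
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Example: 8 query heads share 2 KV heads (4-way grouping).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, n_kv_heads=2)  # shape (1, 8, 16, 64)
```

The key/value tensors are stored with only `n_kv_heads` heads, so the inference-time cache is a fraction of its multi-head size while attention itself is computed at full head count.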
Fine-tuning: Making the Models Chat-worthy
Once pretrained, the models are fine-tuned for conversational use cases. Supervised fine-tuning (SFT) is applied first, followed by reinforcement learning with human feedback (RLHF). This iterative process progressively aligns the models with human preferences while keeping them helpful and safe.
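One concrete piece of this pipeline is the reward model that drives RLHF: it is trained with a binary ranking loss that pushes the score of the human-preferred response above the rejected one, with a margin reflecting how decisive the annotators' preference was. A minimal PyTorch sketch; the function name and example scores are our own:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(r_chosen, r_rejected, margin):
    # L = -log(sigmoid(r_chosen - r_rejected - margin)): the preferred
    # response should score higher than the rejected one by at least `margin`.
    return -F.logsigmoid(r_chosen - r_rejected - margin).mean()

# Example scores for three preference pairs; larger margins encode
# preferences that annotators rated as more clear-cut (values illustrative).
r_chosen = torch.tensor([1.2, 0.3, 2.0])
r_rejected = torch.tensor([0.7, 0.5, 0.1])
margin = torch.tensor([0.0, 1.0, 0.5])
loss = reward_ranking_loss(r_chosen, r_rejected, margin)
```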
Safety: Preventing Undesirable Outputs
The authors pay special attention to safety concerns with Llama 2-Chat. They apply safety measures during the pretraining stage and invest in safety-specific fine-tuning. Red teaming is also employed to assess potential risks. Additionally, they provide a responsible release strategy, encouraging developers to carry out safety testing and tuning tailored to their specific applications.
Model Performance: Llama 2-Chat vs. Other Models
When compared to existing open-source models on standard benchmarks, Llama 2-Chat models generally perform better, and they appear to be on par with some closed-source models in both helpfulness and safety. This combination of capability and safety tuning helps address the risks associated with irresponsible LLM deployment.
Learnings, Limitations, and Ethical Considerations
The authors highlight key observations gained while developing Llama 2 and Llama 2-Chat, such as the emergence of tool use and the temporal organization of knowledge. They also acknowledge the limitations and ethical concerns of developing large language models: their testing was conducted primarily in English and cannot cover every possible use case, so they urge developers to perform safety testing tailored to their own applications.
Responsible Release Strategy
As part of their responsible release strategy, the authors have released Llama 2 and Llama 2-Chat models to the public for research and commercial use. However, they stress the importance of testing these models for safety before deployment and encourage developers to follow guidelines provided in a responsible use guide, with additional code examples available for further guidance.
Related Work and Future Directions
The authors situate Llama 2 among related LLMs, including GPT-3, BLOOM, Llama 1, and Falcon, which mark the rapid advances of recent years. They also point to ongoing research into the responsible development of AI and the growing need for safety measures and fine-tuning techniques that align models with human values.
Conclusion
In conclusion, Llama 2 and Llama 2-Chat are a step forward in the open development of large language models, showing improvements in helpfulness, safety, and overall performance over existing open models. By releasing the models for research and commercial use, providing thorough documentation, and sharing their insights, the authors aim to enable the community to build on their work, refine these models responsibly, and deploy them safely across a wide range of applications.
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Authors: Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, and Aleksandra Faust
Source & References: https://arxiv.org/abs/2307.12856
Introduction
In this paper, the authors introduce WebAgent, an LLM-driven autonomous agent that completes navigation tasks on real-world websites by following natural language instructions. By combining planning, long context understanding, and program synthesis, WebAgent significantly improves navigation success over existing approaches. Its architecture pairs HTML-T5, a new pre-trained language model specialized for long HTML documents that handles planning and summarization, with Flan-U-PaLM, which generates grounded, executable code.
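To illustrate the division of labor, here is a hypothetical orchestration loop in Python; the planner, coder, and browser interfaces are our own stand-ins for HTML-T5, Flan-U-PaLM, and the execution environment, not the paper's actual code:

```python
from typing import Callable, List

def web_agent_step(
    instruction: str,
    raw_html: str,
    history: List[str],
    planner: Callable,   # HTML-T5 stand-in: planning + summarization (hypothetical)
    coder: Callable,     # Flan-U-PaLM stand-in: program synthesis (hypothetical)
    browser,             # executes generated programs, e.g. via Selenium (hypothetical)
) -> str:
    # HTML-T5 role: pick the next sub-instruction and extract the HTML
    # snippets relevant to it from the (very long) raw page.
    sub_instruction, snippets = planner(instruction, raw_html, history)
    # Flan-U-PaLM role: synthesize an executable program grounded in
    # the sub-instruction and the condensed HTML.
    program = coder(sub_instruction, snippets)
    browser.execute(program)          # run the generated code on the live page
    history.append(sub_instruction)   # feed the plan back in on the next step
    return browser.current_html()     # hypothetical accessor for the updated page
```

The design choice worth noting is the separation of concerns: a long-context specialist condenses the page so that the stronger general-purpose code model only ever sees a manageable, task-relevant slice of HTML.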