Greetings AI Enthusiasts,
Welcome to the latest edition of State of AI. We have an exciting line-up for you this time around, brimming with cutting-edge advancements in the world of artificial intelligence. In this issue, we explore the Mixture-of-Agents framework, a novel approach that enhances large language model capabilities by leveraging the collective expertise of multiple models. Following this, we delve into SF-V, a single-forward-pass video generation model that promises to set new standards in video content creation.
Next, we present Mobile-Agent-v2, a next-gen mobile device operation assistant that enhances navigation through multi-agent collaboration. Our exploration continues with GenAI Arena, an open evaluation platform designed to benchmark generative models and push the boundaries of innovation. Finally, we investigate BitsFusion, a pioneering technique that quantizes diffusion model weights to 1.99 bits, dramatically shrinking model size while preserving generation quality.
Each of these topics reflects the dynamic and transformative nature of AI research and its practical applications. We trust this edition will provide you with valuable insights and spark your curiosity.
Happy reading!
Best regards,
Contents
Mixture-of-Agents Enhances Large Language Model Capabilities
SF-V: Single Forward Video Generation Model
Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
GenAI Arena: An Open Evaluation Platform for Generative Models
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Mixture-of-Agents Enhances Large Language Model Capabilities
Authors: Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou
Source and references: https://arxiv.org/abs/2406.04692
Revolutionizing Language Models: The New Dream Team
In the rapidly advancing world of artificial intelligence, large language models (LLMs) have become a key player, showcasing exceptional capabilities in understanding and generating human-like text. However, as impressive as individual models like GPT-4 and its contemporaries are, they are not without limitations. Enter the innovative methodology from Junlin Wang and his team: "Mixture-of-Agents" (MoA), a novel approach aimed at harnessing the collective expertise of multiple LLMs to push the boundaries of what these models can achieve.
The Big Question: Can LLMs Work Together?
The concept behind MoA stems from a curious observation: different LLMs possess unique strengths and weaknesses. Some models are better at following complex instructions, while others excel at tasks like code generation. This diversity prompts a significant question — can the collective expertise of multiple LLMs create a more capable and robust framework?
The team's research answers this with a resounding "Yes!" They have identified a phenomenon they term the "collaborativeness of LLMs." Put simply, LLMs tend to generate better responses when they can access outputs from other models, even those that may be less capable.
Building the Supermodel: The MoA Architecture Unveiled
The MoA architecture is structured in layers, with each layer comprising multiple LLM agents. These agents take all outputs from the previous layer's agents as auxiliary information to generate their responses. This layered approach ensures that the final output is a refined synthesis of several LLMs' contributions, significantly improving performance.
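To make this concrete, here is a minimal Python sketch of what a single MoA layer might look like. The `chat` helper, the `run_layer` function, and the aggregation instruction are illustrative placeholders, not the paper's exact prompts or API:

```python
from typing import Callable, List

# `chat(model, prompt) -> str` stands in for any LLM API call (placeholder).
Chat = Callable[[str, str], str]

# Hypothetical aggregation instruction, paraphrasing the paper's idea.
AGGREGATE_INSTRUCTION = (
    "You have been provided with responses from various models. "
    "Synthesize these into a single, high-quality response."
)

def run_layer(chat: Chat, models: List[str], user_prompt: str,
              prev_responses: List[str]) -> List[str]:
    """Run every agent in one MoA layer, giving each the previous layer's outputs."""
    if prev_responses:
        # Later layers: bundle all previous outputs as auxiliary references.
        references = "\n\n".join(
            f"[Response {i + 1}]\n{r}" for i, r in enumerate(prev_responses)
        )
        prompt = (f"{AGGREGATE_INSTRUCTION}\n\n{references}\n\n"
                  f"User question: {user_prompt}")
    else:
        prompt = user_prompt  # first layer: agents answer the prompt directly
    return [chat(model, prompt) for model in models]
```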
In practice, the MoA model has achieved state-of-the-art performance on benchmarks like AlpacaEval 2.0, MT-Bench, and FLASK, surpassing even GPT-4 Omni. For example, using only open-source LLMs, the MoA model scored 65.1% on AlpacaEval 2.0, significantly higher than GPT-4 Omni's 57.5%.
The Layers of Magic: How MoA Works
To truly appreciate the magic of MoA, let's delve into its structure. Imagine a four-layered cake, each layer representing a set of LLMs working in tandem to enhance the final outcome. In the first layer, agents independently generate responses to a given prompt. These responses are then passed to the second layer's agents, which refine and enhance the initial responses. This process continues through subsequent layers until a final, more robust response is achieved.
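Continuing the sketch above, the full pipeline simply chains layers and finishes with a single aggregator. This is a hypothetical rendering of the structure; the paper's actual model configurations vary:

```python
def mixture_of_agents(chat: Chat, layers: List[List[str]],
                      final_aggregator: str, user_prompt: str) -> str:
    """Chain MoA layers; a final aggregator synthesizes the last layer's outputs."""
    responses: List[str] = []
    for layer_models in layers:
        responses = run_layer(chat, layer_models, user_prompt, responses)
    return run_layer(chat, [final_aggregator], user_prompt, responses)[0]

# Hypothetical usage with placeholder model identifiers:
# answer = mixture_of_agents(chat, [["model-a", "model-b", "model-c"]] * 3,
#                            "model-a", "What is Mixture-of-Agents?")
```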
Criteria for Success: Performance and Diversity
The selection of LLMs for each MoA layer hinges on two primary criteria: performance metrics and diversity considerations. Performance metrics ensure that only high-performing models are chosen for subsequent layers, while diversity ensures that a range of different model outputs contribute to the final synthesis. This combination mitigates individual model deficiencies and enhances overall response quality through collaborative synthesis.
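One way to picture these two criteria in code (purely illustrative; the paper selects models empirically rather than with a fixed rule like this):

```python
from typing import List, Set, Tuple

def select_layer(candidates: List[Tuple[str, str, float]], k: int) -> List[str]:
    """Pick up to k agents: highest benchmark score first (performance),
    at most one model per family (diversity).
    candidates: (model_name, model_family, score) triples."""
    chosen: List[str] = []
    seen_families: Set[str] = set()
    for name, family, score in sorted(candidates, key=lambda c: -c[2]):
        if family in seen_families:
            continue  # diversity: skip a second model from the same family
        chosen.append(name)
        seen_families.add(family)
        if len(chosen) == k:
            break
    return chosen

# Hypothetical scores, for illustration only:
# select_layer([("model-a-110b", "a", 0.62), ("model-a-72b", "a", 0.58),
#               ("model-b-8x22b", "b", 0.60)], k=2)
# -> ["model-a-110b", "model-b-8x22b"]
```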
Real-world Applications: Testing the Limits
To validate their approach, the team conducted extensive evaluations using standard benchmarks. The results were nothing short of impressive:
On AlpacaEval 2.0, MoA outperformed the previous best model by a substantial margin, achieving a win rate of 65.1% compared to GPT-4 Omni's 57.5%.
On MT-Bench, MoA claimed the top spots, demonstrating that even on a benchmark where leading models already score highly, it can push the frontier further.
On FLASK, MoA showcased improvements in robustness, correctness, efficiency, factuality, commonsense, insightfulness, and completeness, affirming its superior capabilities across various dimensions.
Beyond the Benchmarks: Understanding MoA's Inner Workings
So, what makes MoA tick? At its core, MoA leverages the "collaborativeness" phenomenon, where LLMs tend to generate better quality responses when referencing outputs from other models. This is akin to a group of experts brainstorming together, with each expert contributing unique insights to solve a problem.
Roles within MoA: Proposers and Aggregators
Within the MoA framework, LLMs can assume two distinct roles: proposers and aggregators. Proposers excel at generating diverse reference responses, while aggregators synthesize these into high-quality outputs. Interestingly, some models, like GPT-4o, are proficient in both roles, while others, like WizardLM, excel as proposers but are less effective as aggregators.
To further enhance this collaborative potential, the MoA approach iterates the aggregation process using additional aggregators. This iterative synthesis refines and enriches the responses, leveraging the strengths of multiple models to produce superior outcomes.
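Reusing the `run_layer` helper from the earlier sketch, iterated aggregation might look like the following. Again, this is a hypothetical rendering of the idea, not the paper's exact procedure:

```python
def iterative_aggregate(chat: Chat, proposers: List[str],
                        aggregators: List[str], user_prompt: str) -> str:
    """Proposers draft in parallel; each aggregation round refines the pool."""
    references = run_layer(chat, proposers, user_prompt, [])
    answer = references[0]  # fallback if no aggregators are given
    for agg in aggregators:
        answer = run_layer(chat, [agg], user_prompt, references)[0]
        references.append(answer)  # later rounds also see earlier syntheses
    return answer
```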
A New Dawn in AI Collaboration: MoA's Broader Implications
The implications of the MoA methodology extend far beyond the current benchmarks. By effectively harnessing the collaborative potential of multiple LLMs, this approach paves the way for more advanced, adaptable, and resilient AI systems.
Computational Efficiency and Flexibility
One of the standout features of MoA is that it operates entirely through the prompt interface. It requires no fine-tuning or modifications to the internal activations or weights of the LLMs, eliminating the computational overhead associated with fine-tuning and providing considerable flexibility.
Moreover, MoA can be applied to the latest LLMs regardless of their size or architecture, making it a versatile and scalable solution for leveraging collective model expertise.
Conclusion: The Future Is Collaborative
In summary, the Mixture-of-Agents methodology is a game-changer in the realm of large language models. By leveraging the inherent collaborativeness of LLMs, it transcends the limitations of individual models, creating a more capable and robust framework.
The team's comprehensive evaluations underscore MoA's potential, achieving state-of-the-art performance across multiple benchmarks and proving its efficacy in various real-world applications. As AI continues to evolve, the collaborative approach embodied by MoA offers a promising pathway toward more advanced and intelligent systems, heralding a new era of innovation and possibilities.
For those intrigued by the potential of collaborative AI and eager to explore the technical intricacies, the full paper is available at the arXiv link above. Dive in, and join the conversation on the future of AI!
SF-V: Single Forward Video Generation Model
Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren
Source and references: https://arxiv.org/abs/2406.04324
Introduction
In the booming world of artificial intelligence, video generation has become a pivotal area of research. The paper "SF-V: Single Forward Video Generation Model" marks a significant leap in this domain by presenting a method to efficiently generate high-quality, motion-consistent videos using a single forward pass. This innovation not only reduces the computational burden but also enhances the feasibility of real-time applications.