State of AI

NN Diffusion, Genie, LongRoPE 2M Context, Mobile LLMs & OpenCodeInterpreter

Week 4, February 2024

Feb 26, 2024

Greetings,

Welcome to the 47th edition of the State of AI! This time, we explore Neural Network Diffusion, meet Genie – generative interactive environments, learn about LongRoPE, which extends the context window beyond 2 million tokens, look at MobileLLM, optimized for on-device use cases, and dive into OpenCodeInterpreter, which integrates code generation with execution and refinement. Prepare for a captivating journey through cutting-edge AI advancements!

Best regards,

State of AI


Contents

  1. Neural Network Diffusion

  2. Genie: Generative Interactive Environments

  3. LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

  4. MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

  5. OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement


Neural Network Diffusion: Generating High-Performing Parameters from Noise

Authors: Kai Wang, Zhaopan Xu, Yukun Zhou, Zelin Zang, Trevor Darrell, Zhuang Liu, Yang You

Source and references: https://arxiv.org/abs/2402.13144


Introduction to Diffusion Models

In recent years, diffusion models have gained significant attention in image and video generation. Inspired by non-equilibrium thermodynamics, these models involve two opposing processes: injecting noise into images during the forward process and removing it with a denoising network during the reverse process. Recent advancements have enabled diffusion models to generate images more realistic than those produced by typical generative adversarial networks (GANs).
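
As a quick refresher, the sketch below illustrates those two processes under common DDPM-style assumptions; the linear noise schedule and the noise-predicting `denoiser` are illustrative placeholders, not tied to this particular paper.

```python
import torch

# Minimal sketch of the forward (noising) and reverse (denoising) processes,
# assuming a DDPM-style linear schedule and a noise-predicting network.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal level

def forward_noise(x0, t):
    """Forward process: corrupt a clean sample x0 with Gaussian noise at step t."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t]
    xt = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return xt, noise

def reverse_estimate(denoiser, xt, t):
    """Reverse process: the denoising network predicts the injected noise,
    which yields an estimate of the clean sample."""
    pred_noise = denoiser(xt, t)
    a_bar = alphas_cumprod[t]
    return (xt - (1.0 - a_bar).sqrt() * pred_noise) / a_bar.sqrt()
```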

However, the potential use of diffusion models in other domains remains largely unexplored. In this research paper, the authors demonstrate the effectiveness of applying diffusion models to generate high-performing neural network parameters.

Neural Network Diffusion (p-diff) Framework

The proposed neural network diffusion (p-diff) framework parallels stochastic gradient descent (SGD) learning in that both transition parameters from random initialization to a specific, high-performing distribution. The authors generate high-performing parameters for a neural network by employing a standard latent diffusion model to synthesize a new set of parameters. The process first trains an autoencoder to extract latent representations of a subset of trained network parameters. A latent diffusion model is then trained to synthesize latent parameter representations from random noise. Finally, a decoder reconstructs network parameters from the synthesized latent representations.
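
To make those three stages concrete, here is a minimal PyTorch sketch under stated assumptions: the MLP autoencoder, the layer sizes, and the `latent_diffusion.sample` interface are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ParamAutoencoder(nn.Module):
    """Stage 1: compress flattened network parameters into latent vectors
    and reconstruct them (architecture here is an illustrative assumption)."""
    def __init__(self, param_dim, latent_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(param_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 1024), nn.ReLU(),
                                     nn.Linear(1024, param_dim))

    def forward(self, flat_params):
        z = self.encoder(flat_params)
        return self.decoder(z), z

def generate_parameters(autoencoder, latent_diffusion, num_samples, latent_dim=256):
    """Stages 2-3: sample latent representations from random noise with the
    trained diffusion model, then decode them into full parameter vectors."""
    noise = torch.randn(num_samples, latent_dim)
    latents = latent_diffusion.sample(noise)   # assumed sampling interface
    return autoencoder.decoder(latents)
```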

The p-diff framework is simple, efficient, and showcases good generality across various datasets and network architectures. The generated models show competitive or improved performance compared to original models with minimal additional cost. This achievement encourages further exploration and versatile use of diffusion models in other domains.

Parameter Autoencoder and Generation

The proposed p-diff framework consists of two processes: parameter autoencoder and generation. Given a set of trained, high-performing models, a subset of their parameters is selected to train an autoencoder that can extract latent representations and reconstruct parameters. Next, a standard latent diffusion model synthesizes latent representations from random noise, and the trained decoder reconstructs them into new, high-performing model parameters.
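
Closing the loop, a generated parameter vector can be copied back into a network of the matching architecture and evaluated directly. The sketch below uses PyTorch's built-in flattening utilities; `trained_models`, `model`, and `eval_fn` are placeholders for illustration.

```python
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def collect_param_vectors(trained_models):
    """Flatten the parameters of each trained, high-performing checkpoint;
    these vectors form the training data for the parameter autoencoder."""
    return torch.stack([parameters_to_vector(m.parameters()).detach()
                        for m in trained_models])

def evaluate_synthesized(model, flat_params, eval_fn):
    """Load one synthesized parameter vector into `model` and score it
    (eval_fn stands in for the usual accuracy or loss evaluation loop)."""
    vector_to_parameters(flat_params, model.parameters())
    return eval_fn(model)
```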

The authors perform experiments on a wide range of datasets, architectures, and training baselines to evaluate the effectiveness of their approach. The results consistently demonstrate the ability of the p-diff to generate high-performing neural network parameters from random noise.

Exploring p-diff Performance

The first set of experiments investigates the performance of the p-diff framework across multiple datasets and architectures, comparing it to traditional model training methods. The results demonstrate that p-diff consistently achieves comparable, if not improved, model performance across datasets and architectures.

The authors then perform various ablation studies and analyses to understand the characteristics of the proposed framework. A noteworthy finding is that the p-diff framework generalizes well even when applied to entirely new architectures, such as ConvNet-3 and MLP-3.

Key Takeaways

In summary, the authors of this paper successfully demonstrate that diffusion models can be leveraged to generate high-performing neural network parameters. Their proposed Neural Network Diffusion (p-diff) framework consists of two processes: parameter autoencoder and generation. The approach showcases efficiency and good generality across various datasets and network architectures.

The experiments conducted in the paper reveal the effectiveness of the p-diff framework, as it consistently achieves competitive or improved performance compared to standard training methods. Additionally, p-diff generalizes even when applied to entirely new architectures.

The results of this research encourage more exploration into the versatile use of diffusion models, expanding beyond traditional image and video generation tasks. It suggests that diffusion models could become an essential tool for generating high-performing neural networks in various domains, making them more accessible to a wide range of applications.


Genie: Generative Interactive Environments

Authors: Jake Bruce, Michael Dennis, Ashley Edwards, Jack Parker-Holder, Yuge (Jimmy) Shi, Edward Hughes, Matthew Lai, Aditi Mavalankar, Richie Steigerwald, Chris Apps, Yusuf Aytar, Sarah Bechtle, Feryal Behbahani, Stephanie Chan, Nicolas Heess, Lucy Gonzalez, Simon Osindero, Sherjil Ozair, Scott Reed, Jingwei Zhang, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, and Tim Rocktäschel

Source and references: https://arxiv.org/abs/2402.15391


Introduction

Welcome to the world of Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled internet videos! With 11 billion parameters, Genie can be considered a foundation world model. It has been built by an incredible team at Google DeepMind, and it's definitely worth exploring how Genie brings the power of AI to generate realistic, actionable, and immersive virtual environments.
