Greetings,
Welcome to the latest edition of the State of AI. In this issue, we explore developments at the forefront of artificial intelligence. Dive into the intricacies of Kolmogorov-Arnold Networks (KAN), gain insights from over a million real ChatGPT interactions in WildChat, meet the 310 fine-tuned LLMs that rival GPT-4 in LoRA Land, trace Octopus v4's graph of language models, and learn about Prometheus 2, an open-source language model specialized in evaluating other language models.
Each article is crafted to give you a comprehensive look at the profound impacts and exciting advancements within the AI landscape. Prepare to be enlightened and inspired as we traverse these groundbreaking topics. Enjoy the journey through the state-of-the-art in AI!
Best regards,
Contents
KAN: Kolmogorov-Arnold Networks
WildChat: 1M ChatGPT Interaction Logs in the Wild
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Octopus v4: Graph of language models
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
KAN: Kolmogorov-Arnold Networks
Authors: Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljačić, Thomas Y. Hou, Max Tegmark
Source and references: https://arxiv.org/abs/2404.19756
Introduction
In the ever-evolving field of machine learning, the quest for more efficient and insightful models is unending. The paper "KAN: Kolmogorov-Arnold Networks" introduces a fascinating addition to this pursuit. The proposed Kolmogorov-Arnold Networks (KANs) offer a novel architecture that promises superior performance over Multi-Layer Perceptrons (MLPs) in several respects, including accuracy, parameter efficiency, and interpretability. This deep dive into the world of KANs explores the theoretical underpinnings, architectural innovations, potential applications, and broader impacts of this intriguing new approach.
Theoretical Inspirations and Basic Concept
At the heart of KANs lies the Kolmogorov-Arnold representation theorem. Long regarded as a mathematical curiosity, the theorem has seen its potential for enhancing neural network architectures go largely underexplored. It states that any multivariate continuous function can be expressed as a composition of univariate functions and addition, a decomposition that is both elegant and remarkably simple.
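Concretely, the theorem (in the form the paper builds on) states that any continuous function f on a bounded domain can be written as

f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

where each \phi_{q,p} and each \Phi_q is a continuous function of a single variable, so all of the multivariate structure is carried by addition alone.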
KANs leverage this theory by replacing traditional weight parameters with univariate functions on the network's edges, dispensing with linear transformations entirely. Each edge in a KAN carries a learnable spline function unique to that connection, while nodes simply sum their incoming values, mirroring the decomposition laid out by Kolmogorov and Arnold.
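To make this concrete, here is a minimal sketch in Python of a single KAN edge and a KAN layer. It assumes a fixed B-spline basis with learnable coefficients; the class names, hyperparameters, and initialization are illustrative assumptions rather than the authors' released implementation, and gradient-based training of the coefficients is omitted.

import numpy as np
from scipy.interpolate import BSpline

class KANEdge:
    """A learnable univariate function phi(x) = sum_i c_i * B_i(x) on one edge."""
    def __init__(self, n_basis=8, degree=3, lo=-1.0, hi=1.0):
        # Clamped uniform knot vector; len(t) must equal n_basis + degree + 1.
        inner = np.linspace(lo, hi, n_basis - degree + 1)
        self.t = np.concatenate([[lo] * degree, inner, [hi] * degree])
        self.degree = degree
        self.c = 0.1 * np.random.randn(n_basis)  # learnable spline coefficients

    def __call__(self, x):
        return BSpline(self.t, self.c, self.degree, extrapolate=True)(x)

class KANLayer:
    """Nodes perform only summation: output_j = sum_i phi_{j,i}(x_i)."""
    def __init__(self, in_dim, out_dim):
        self.edges = [[KANEdge() for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, x):
        return np.array([sum(edge(xi) for edge, xi in zip(row, x))
                         for row in self.edges])

Training such a network means adjusting the coefficients c on every edge, which is exactly where feature learning lives in this architecture.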
Architectural Innovations
One of the main architectural differences between KANs and conventional MLPs is the replacement of static, nonlinear activation functions and linear weight matrices with dynamic, learnable functions on every edge. This not only provides a high degree of flexibility in manipulating data across layers but also integrates feature learning directly into the network's fabric.
A KAN can be decomposed cleanly into scalar univariate functions and a summation operation, in contrast to MLPs, where learned linear maps alternate with fixed nonlinear activations. By concentrating all learning in the scalar transformations encoded by the spline functions, KANs offer a fundamentally different paradigm for transforming data within a neural network.
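Schematically, the per-layer contrast looks like this (pseudocode: sigma is a fixed activation, W and b are learned parameters, and each phi[j][i] is a learned univariate function):

# MLP layer: learned linear map followed by a fixed pointwise nonlinearity
y = sigma(W @ x + b)

# KAN layer: a learned univariate function on every edge; nodes only add
y[j] = sum(phi[j][i](x[i]) for i in range(n_in))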
Practical Applications and Empirical Validation
The versatility of KANs is demonstrated through their applicability in tasks like data fitting and Partial Differential Equation (PDE) solving, where they have shown remarkable promise. Empirically, KANs exhibit superior accuracy with fewer parameters compared to deeper and wider MLP models. For instance, in solving a sample PDE, a relatively simple KAN outperformed a complex MLP while using a fraction of the parameters.
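Using the toy classes sketched earlier, a forward pass through a small stacked KAN for a two-variable fitting problem would look like the following; the [2, 5, 1] shape is an arbitrary illustration, not the configuration benchmarked in the paper.

net = [KANLayer(2, 5), KANLayer(5, 1)]  # hypothetical shape [2, 5, 1]
h = np.array([0.3, -0.7])               # one two-dimensional input point
for layer in net:
    h = layer(h)
print(h)  # a length-1 array approximating f(x1, x2)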
This efficiency springs from the intrinsic nature of KANs, which aligns closely with how real-world data is often organized: hierarchically and in a decomposable manner. By fitting directly into this natural structure, KANs manage to be both highly effective and interpretable.
Interpretability and Interaction
A significant advantage of KANs lies in their interpretability. The architecture allows individual components of the network to be examined and understood in isolation or in concert. This not only aids in debugging and model refinement but also makes KANs potential tools for educational purposes, where understanding the interaction between model components can provide deeper insights into both the model and the data it handles.
Furthermore, the interpretability extends to the collaboration with human operators, as KANs can be adjusted and tuned intuitively due to their reliance on well-understood mathematical operations. This could revolutionize areas of research and industry where machine learning models act as aids or augmentations to human decision-making processes.
Future Directions and Potential Impact
The introduction of KANs opens up numerous possibilities for both theoretical advancements and practical applications. Their ability to break down complex multidimensional functions into simpler, interpretable components could make them invaluable in fields where understanding the decision-making process of AI is crucial, such as in autonomous driving and medical diagnostics.
Moreover, as the approach is refined and possibly combined with other machine learning advancements, KANs could lead to more robust, efficient, and transparent models. The ability to model intricate systems with fewer parameters and enhanced interpretability could also lead to significant computational savings, reducing the carbon footprint associated with training large models.
Conclusion
KAN not only introduces a compelling new architecture but also paves the way for a rethinking of how we build and understand neural networks. By marrying a deep mathematical theory with practical machine learning applications, this research illuminates a promising path forward in the AI landscape, promising models that are not only powerful and efficient but also comprehensible and adaptable to human needs.
WildChat: 1M ChatGPT Interaction Logs in the Wild
Authors: Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng
Source and references: https://arxiv.org/abs/2405.01470
Introduction
In an era where conversational AI like ChatGPT has become a household name, understanding how users interact with these systems in real-world scenarios is invaluable for continued advancements in this technology. A recent paper, "WildChat: 1M ChatGPT Interaction Logs in the Wild," explores this by presenting a unique dataset comprising over 1 million anonymized interactions between users and ChatGPT. Dive into this comprehensive summary to uncover what insights this gargantuan dataset holds and how it could shape the future of conversational AI.