Contents
Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
Foundation Models for the Electric Power Grid
Derivational Morphology Reveals Analogical Generalization in Large Language Models
LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models
Towards Low-bit Communication for Tensor Parallel LLM Inference
Likelihood as a Performance Gauge for Retrieval-Augmented Generation
Polymetis: Large Language Modeling for Multiple Material Domains
A Single Transformer for Scalable Vision-Language Modeling
LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection
π0: A Vision-Language-Action Flow Model for General Robot Control
Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data
Authors: Juanhui Li, Sreyashi Nag, Hui Liu, Xianfeng Tang, Sheikh Sarwar, Limeng Cui, Hansu Gu, Suhang Wang, Qi He, Jiliang Tang
Source and references: https://arxiv.org/abs/2411.08028v1
Introduction
This paper proposes LLKD, a knowledge distillation method that transfers knowledge from large language models (LLMs) to a small student model using only unlabeled data, reducing both the data and the computational resources required for training.
Key Points
LLKD leverages the extensive knowledge in LLMs to generate pseudo-labels for unlabeled data, which are then used to train a smaller, more efficient student model (a minimal sketch of this labeling step follows the list).
The method selects samples based on both the quality of the pseudo-labels, as indicated by the teacher model's confidence, and the informativeness of the samples for the student model, as measured by its uncertainty.
This adaptive data selection strategy promotes efficient knowledge transfer from the LLM to the student model, reducing the amount of training data required while enhancing performance.
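The pseudo-labeling step can be pictured as follows. This is a minimal sketch that assumes a sentence-classification task with a fixed label set and a hypothetical query_teacher helper returning the teacher LLM's predicted label and a confidence score; the prompt format and label names are illustrative, not taken from the paper.

```python
from typing import Callable, List, Tuple

# Illustrative label set (e.g. PubMed-RCT-style section labels); not the paper's exact setup.
LABELS = ["background", "objective", "methods", "results", "conclusions"]

def build_prompt(text: str) -> str:
    """Wrap an unlabeled sentence in a simple classification prompt for the teacher LLM."""
    return (
        f"Classify the sentence into one of: {', '.join(LABELS)}.\n"
        f"Sentence: {text}\nLabel:"
    )

def pseudo_label(
    unlabeled: List[str],
    query_teacher: Callable[[str], Tuple[str, float]],  # hypothetical helper
) -> List[Tuple[str, str, float]]:
    """Return (text, pseudo_label, teacher_confidence) triples for student training."""
    out = []
    for text in unlabeled:
        label, confidence = query_teacher(build_prompt(text))
        out.append((text, label, confidence))
    return out
```

The resulting (text, pseudo-label, confidence) triples are what the selection step described in the Methodology filters before training the student.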
Methodology
LLKD uses an LLM (specifically LLaMA) as the teacher model to generate pseudo-labels and confidence scores for the unlabeled data. The student model is a smaller pretrained language model, such as RoBERTa, which is trained on the pseudo-labeled data.
To select the most informative samples, LLKD employs two thresholds: one based on teacher confidence to assess pseudo-label quality, and the other based on student uncertainty to identify challenging samples. These thresholds are adaptively defined at each training step to account for the student model's evolving learning progress.
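A minimal sketch of this selection rule is shown below, assuming per-batch quantile thresholds and a cross-entropy training objective; the quantile-based rule and the function name are assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def select_and_weight(student_logits, pseudo_labels, teacher_conf,
                      conf_q=0.5, unc_q=0.5):
    """Keep samples with high teacher confidence (label quality) AND high student
    uncertainty (informativeness), using thresholds recomputed on every batch."""
    probs = F.softmax(student_logits, dim=-1)
    # Student uncertainty measured as predictive entropy (one common choice).
    student_unc = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

    conf_thr = torch.quantile(teacher_conf, conf_q)   # adaptive, per training step
    unc_thr = torch.quantile(student_unc, unc_q)

    mask = ((teacher_conf >= conf_thr) & (student_unc >= unc_thr)).float()
    loss = F.cross_entropy(student_logits, pseudo_labels, reduction="none")
    # Average the loss over the selected samples only.
    return (loss * mask).sum() / mask.sum().clamp_min(1.0)
```

Because both thresholds are recomputed at every step, the retained subset shifts automatically as the student's uncertainty profile evolves during training.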
Results and Findings
Comprehensive experiments on five datasets from various domains show that LLKD significantly outperforms several baselines, including thresholding techniques and other knowledge distillation approaches. On PubMed-RCT-20k, LLKD achieves superior performance while using as little as 3.7% of the original training data, showcasing its high data efficiency.
Implications and Conclusions
The proposed LLKD method enables the efficient transfer of knowledge from LLMs to smaller, more practical models, addressing the limitations of deploying large, resource-intensive LLMs in real-world applications. By leveraging unlabeled data and adaptively selecting the most informative samples, LLKD effectively bridges the gap between the capabilities of LLMs and the constraints of deployment environments.
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation
Authors: Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy
Source and references: https://arxiv.org/abs/2411.08033v1
Introduction
This paper introduces a novel 3D generation framework, dubbed GaussianAnything, that addresses key challenges in input formats, latent space design, and output representations for 3D content generation.
Key Points
Adopts multi-view posed RGB-D(epth)-N(ormal) renderings as input, encoding comprehensive 3D attributes in a flexible manner
Proposes a point-cloud structured latent space that preserves 3D shape information and enables geometry-texture disentanglement (see the sketch after this list)
Incorporates a cascaded latent diffusion model for improved shape-texture disentanglement
Supports multi-modal conditional 3D generation, including point cloud, caption, and single/multi-view image inputs
Newly proposed latent space naturally enables 3D-aware editing
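The point-cloud structured latent can be pictured as a set of latent points whose positions carry geometry and whose per-point features carry appearance. The sketch below is purely illustrative; the names, shapes, and editing helpers are assumptions rather than the paper's implementation.

```python
from dataclasses import dataclass
import torch

@dataclass
class PointCloudLatent:
    xyz: torch.Tensor    # (N, 3)  coarse 3D layout (geometry)
    feat: torch.Tensor   # (N, C)  per-point appearance features (texture)

    def edit_geometry(self, new_xyz: torch.Tensor) -> "PointCloudLatent":
        """3D-aware editing: move latent points while keeping appearance fixed."""
        return PointCloudLatent(xyz=new_xyz, feat=self.feat)

    def edit_texture(self, new_feat: torch.Tensor) -> "PointCloudLatent":
        """Swap appearance features while keeping the 3D layout fixed."""
        return PointCloudLatent(xyz=self.xyz, feat=new_feat)
```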
Methodology
The proposed framework employs a Variational Autoencoder (VAE) with a Scene Representation Transformer (SRT) encoder that maps the multi-view posed RGB-D-N renderings into a point-cloud structured latent space. This latent space is then used for cascaded latent diffusion modeling: a point cloud diffusion model first generates the overall 3D layout, and a point-cloud feature diffusion model then outputs the corresponding per-point features. The generated featured point cloud is finally decoded into high-quality surfel Gaussians by a pre-trained VAE decoder.
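End to end, the cascade can be sketched as three stages; the callable names below are hypothetical stand-ins for the point cloud diffusion model, the feature diffusion model, and the pre-trained VAE decoder described above, and the shapes are illustrative.

```python
import torch

def generate_3d(condition: torch.Tensor,
                layout_diffusion, feature_diffusion, vae_decoder,
                num_points: int = 2048, feat_dim: int = 16):
    # Stage 1: sample the coarse 3D layout as a point cloud of shape (N, 3).
    xyz = layout_diffusion(condition, noise=torch.randn(num_points, 3))

    # Stage 2: sample per-point features conditioned on the layout, shape (N, C).
    feat = feature_diffusion(condition, xyz, noise=torch.randn(num_points, feat_dim))

    # Stage 3: decode the featured point cloud into surfel Gaussians for rendering.
    return vae_decoder(xyz, feat)
```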
Results and Findings
Experimental results demonstrate the effectiveness of the proposed approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation. The point-cloud structured latent space naturally enables geometry-texture disentanglement and interactive 3D editing, as showcased in the paper.
Implications and Conclusions
The novel 3D generation framework proposed in this paper offers scalable, high-quality 3D generation with an interactive point cloud-structured latent space, supporting multi-modal conditional 3D generation and enabling 3D-aware editing. These advances have the potential to transform applications in the virtual reality, film, and gaming industries.
Foundation Models for the Electric Power Grid
Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belyi, Ricardo J. Bessa, Bishnu Prasad Bhattarai, Johannes Schmude, Stanislav Sobolevsky
Source and references: https://arxiv.org/abs/2407.09434v2
Introduction
This paper advocates for the development of Foundation Models (FMs) for the electric power grid, which the authors refer to as "GridFMs." FMs are advanced AI models developed through self-supervised learning that can generalize across various tasks after initial training on large datasets.