Contents
GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks
Altogether: Image Captioning via Re-aligning Alt-text
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
SELA: Tree-Search Enhanced LLM Agents for Automated Machine Learning
CLEAR: Towards Contextual LLM-Empowered Privacy Policy Analysis and Risk Generation for Large Language Model Applications
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation
Prioritized Generative Replay
LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering
Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models
Scaling up Masked Diffusion Models on Text
Taming Data and Transformers for Audio Generation
Tuning-free coreset Markov chain Monte Carlo
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
SkillMimicGen: Automated Demonstration Generation for Efficient Skill Learning and Deployment
GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks
Authors: Shuyang Hou, Zhangxiao Shen, Anqi Zhao, Jianyuan Liang, Zhipeng Gui, Xuefeng Guan, Rui Li, Huayi Wu
Source and references: https://arxiv.org/abs/2410.17031v1
Introduction
This paper presents GeoCode-GPT, the first large language model (LLM) focused on geospatial code generation tasks. The model aims to address the challenges faced by general-purpose LLMs in handling the complexities of geospatial code, such as specialized data formats, massive datasets, and unique platform-specific syntax and logic.
Key Points
Introduction of GeoCode-PT and GeoCode-SFT corpora, as well as the GeoCode-Eval evaluation dataset, to provide a systematic corpus foundation and evaluation tools for pretraining and fine-tuning LLMs in geospatial code generation tasks.
Proposal of a novel pretraining and fine-tuning strategy that combines QLoRA and LoRA to balance computational resources and training efficiency.
Establishment of a comprehensive evaluation framework for geospatial code, incorporating option matching, expert validation, and prompt engineering scoring for LLMs, providing insights for the holistic evaluation of domain-specific models.
Development and release of GeoCode-GPT-7B, the first LLM dedicated to geospatial code generation tasks.
Methodology
The researchers employed an autoregressive unsupervised approach to conduct lightweight pretraining on CodeLlama-7B using QLoRA, followed by supervised fine-tuning with instruction-tuning data via LoRA. This process led to the creation of GeoCode-GPT-7B.
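Below is a minimal sketch of what such a QLoRA-then-LoRA pipeline could look like with Hugging Face transformers and peft, assuming CodeLlama-7B as the base model. The hyperparameters, adapter targets, and training data references are illustrative placeholders, not the authors' actual configuration.

```python
# Illustrative two-stage setup: 4-bit QLoRA for lightweight autoregressive
# pretraining, then a LoRA adapter for supervised instruction fine-tuning.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base)

# Stage 1: continued pretraining with QLoRA (quantized base weights,
# trainable low-rank adapters on top) over the geospatial code corpus.
bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb_cfg)
model = prepare_model_for_kbit_training(model)
pt_adapter = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, pt_adapter)
# ... train autoregressively on the GeoCode-PT corpus, then merge/save ...

# Stage 2: supervised fine-tuning with a plain LoRA adapter on the
# instruction-tuning data (GeoCode-SFT), e.g. via Trainer or trl's SFTTrainer.
sft_adapter = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
```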
Results and Findings
Experimental results show that GeoCode-GPT outperforms other models in multiple-choice accuracy by 9.1% to 32.1%, in code summarization ability by 1.7% to 25.4%, and in code generation capability by 1.2% to 25.1%. Despite having significantly fewer parameters than commercial models, GeoCode-GPT approaches their performance in certain metrics.
Implications and Conclusions
The development of GeoCode-GPT advances the application and development of LLMs in geospatial code generation, providing a solution and empirical validation for enhancing LLMs' performance in this domain. The research framework and findings offer valuable insights into unlocking the potential of LLMs in geospatial code generation tasks.
Altogether: Image Captioning via Re-aligning Alt-text
Authors: Hu Xu, Po-Yao Huang, Xiaoqing Ellen Tan, Ching-Feng Yeh, Jacob Kahn, Christine Jou, Gargi Ghosh, Omer Levy, Luke Zettlemoyer, Wen-tau Yih, Shang-Wen Li, Saining Xie, Christoph Feichtenhofer
Source and references: https://arxiv.org/abs/2410.17251v1
Introduction
This paper focuses on creating synthetic data to improve the quality of image captions. The key idea is to edit and re-align the existing alt-text associated with each image rather than generating captions from scratch, since from-scratch captions often lack the nuanced information present in the alt-text.
Key Points
The paper presents a principled approach called Altogether to enhance caption quality by iteratively refining captions to better describe the visual content.
The approach involves human annotation where annotators start with the existing alt-text and re-align it to the image content in multiple rounds, constructing captions with rich visual concepts.
The paper introduces a parameter-efficient captioner that can generalize the process of re-aligning alt-texts at scale.
The synthetic captions generated by Altogether improve performance on various tasks, including text-to-image generation and zero-shot image classification.
Methodology
The authors leverage the insight that the creator who posts an image along with its associated alt-text is likely the most knowledgeable expert regarding the concrete visual concepts within that image. They use this idea to develop a two-pronged approach: (i) human annotation to create a fine-tuning dataset by iteratively refining alt-texts, and (ii) a parameter-efficient captioner that can re-caption billions of images by generalizing this process.
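As a rough illustration of the second prong, the sketch below shows a re-captioning loop in which a fine-tuned captioner is conditioned on both the image and its original alt-text and emits an edited, image-grounded caption. The `captioner.generate` interface and the prompt wording are hypothetical stand-ins, not the paper's actual model or API.

```python
# Hypothetical sketch of re-aligning alt-text at scale: the captioner rewrites
# the uploader's alt-text so it matches the image, rather than captioning
# from scratch. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Example:
    image_path: str
    alt_text: str

def realign(captioner, example: Example) -> str:
    """Rewrite the alt-text to describe what is actually in the image,
    keeping the specific entities the original uploader mentioned."""
    prompt = (
        f"Original alt-text: {example.alt_text}\n"
        "Rewrite this caption so it accurately and completely describes the image."
    )
    return captioner.generate(image=example.image_path, prompt=prompt)

def recaption_corpus(captioner, examples):
    # Batched in practice; shown one-by-one for clarity.
    return [realign(captioner, ex) for ex in examples]
```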
Results and Findings
The authors' evaluation shows that their re-aligned captions outperform alt-texts by 4% in CLIP score and outperform state-of-the-art captioners on a challenging test set. In text-to-image generation, using the synthetic captions improves the similarity between generated images and text prompts. For discriminative tasks, the synthetic captions lead to a 1.1% absolute accuracy improvement in zero-shot classification and a 3% gain on retrieval tasks.
Implications and Conclusions
The Altogether approach presents a principled and transparent way to enhance image caption quality, addressing the shortcomings of existing captioning models that often ignore the valuable information present in alt-texts. The authors' findings demonstrate the potential of their approach to improve various computer vision and multimodal tasks.
Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing
Authors: Kento Nishi, Maya Okawa, Rahul Ramesh, Mikail Khona, Ekdeep Singh Lubana, Hidenori Tanaka
Source and references: https://arxiv.org/abs/2410.17194v1