State of AI

Week 3, June 2023

Jun 19, 2023

Greetings,

Welcome to the 11th edition of the State of AI. In this issue, we look at the intersection of AI and the financial industry with FinGPT, examine how widely crowd workers use large language models for text production tasks, and discuss Voicebox and its advances in universal speech generation.

Further, we explore the progress in creating generalist agents for the web and probe into the capabilities of large language models in discerning causation from mere correlation. These topics offer a captivating exploration of the growing versatility and increasing complexities of AI applications.

Best regards,

State of AI

FinGPT: Open-Source Financial Large Language Models
Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Mind2Web: Towards a Generalist Agent for the Web
Can Large Language Models Infer Causation from Correlation?

FinGPT: Open-Source Financial Large Language Models

Authors: Hongyang Yang, Xiao-Yang Liu, Christina Dan Wang

Source & References: https://arxiv.org/abs/2306.06031v1

Introduction

In recent years, large language models (LLMs) have revolutionized natural language processing and shown promise across many domains. Finance is one area where LLMs have attracted significant interest, particularly with the rise of proprietary models like BloombergGPT. This paper argues that open-source financial LLMs (FinLLMs) are needed and introduces FinGPT, an open-source large language model for the finance sector.

The Need for Open-Source FinLLMs

Proprietary models, such as BloombergGPT, have leveraged their exclusive access to specialized data to train finance-specific LLMs that excel at financial tasks. However, they are neither accessible nor transparent, which creates the need for an open-source alternative that democratizes financial data. Open-source FinLLMs address issues of data accessibility and quality and foster collaborative innovation within the financial domain. By drawing on the open-source AI4Finance community, FinGPT aims to unlock new opportunities in open finance.

FinGPT's Data-Centric Approach

Data plays a crucial role in FinLLM development. FinGPT adopts a data-centric approach that prioritizes the collection, preparation, and processing of high-quality financial data. The authors outline the distinct characteristics of financial data sources such as financial news, filings, social media discussions, and trends, and they address the challenges of handling and preprocessing this data: high temporal sensitivity, high dynamism, and a low signal-to-noise ratio. By integrating and managing these diverse data types, FinGPT aims to provide a comprehensive view of financial markets and to support effective financial decision-making.

Framework Overview

FinGPT encompasses four fundamental components: Data Source, Data Engineering, LLMs, and Applications. Each component addresses specific challenges associated with financial data and market conditions:

Data Source layer: Acquires financial data from a variety of online sources, including financial news websites, social media platforms, filings, trends, and academic datasets.
Data Engineering layer: Focuses on real-time NLP data processing to address high temporal sensitivity and low signal-to-noise ratio. This layer includes data cleaning, tokenization, stop word removal, stemming/lemmatization, feature extraction, sentiment analysis, and prompt engineering.
LLMs layer: Implements various fine-tuning methodologies, prioritizing lightweight adaptation to keep the model updated and relevant.
Application layer: Showcases practical applications of FinGPT, such as robo-advising, algorithmic trading, and low-code development.
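The summary above does not say which fine-tuning methods the LLMs layer uses. As a rough illustration of why lightweight adaptation makes frequent updates cheaper, the sketch below assumes one widely used approach, low-rank adaptation (LoRA), in which a weight matrix stays frozen and only two small factor matrices are trained. The function names and the layer size are hypothetical and chosen only for the arithmetic.

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def low_rank_adapter_params(d_out, d_in, rank):
    """Trainable parameters for a low-rank update W + B @ A.

    W stays frozen; only B (d_out x rank) and A (rank x d_in) are trained.
    """
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, not taken from the paper.
full = full_finetune_params(4096, 4096)
adapter = low_rank_adapter_params(4096, 4096, rank=8)
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {adapter / full:.4%}")
```

With these assumed sizes, the adapter trains well under one percent of the parameters of the full matrix, which is the kind of saving that makes it practical to refresh a model as new financial data arrives.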

Real-Time Data Engineering Pipeline for Financial NLP

Financial NLP requires real-time data processing. The paper outlines steps for setting up a real-time data ingestion system and performing data cleaning, tokenization, stop word removal, stemming/lemmatization, feature extraction, sentiment analysis, prompt engineering, and decision making/alerts. By doing so, FinGPT can better understand and adapt to individual preferences, ultimately paving the way for more personalized financial assistants.
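The steps listed above can be sketched for a single headline as follows. This is a minimal, standard-library illustration rather than FinGPT's actual implementation: the word lists, the naive suffix stemmer, the lexicon-based sentiment score, the prompt wording, and the alert threshold are all assumptions made for the example.

```python
import re

# Hypothetical, tiny word lists for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "after"}
POSITIVE = {"beat", "surge", "gain", "upgrade", "record"}
NEGATIVE = {"miss", "drop", "loss", "downgrade", "lawsuit"}


def clean(text):
    """Data cleaning: strip URLs and HTML tags, then collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Tokenization: lowercase and keep alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Stop word removal against the small assumed list above."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Very naive suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def sentiment_score(stems):
    """Lexicon sentiment: +1 per positive stem, -1 per negative stem."""
    return sum((s in POSITIVE) - (s in NEGATIVE) for s in stems)


def build_prompt(headline, score):
    """Prompt engineering: wrap the cleaned headline and score for an LLM."""
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return (
        f"Headline: {headline}\n"
        f"Lexicon sentiment: {label}\n"
        "Task: Summarize the likely market impact in one sentence."
    )


def should_alert(score, threshold=2):
    """Decision making/alerts: flag strongly one-sided headlines."""
    return abs(score) >= threshold


def process(raw_headline):
    """Run one headline through every step and return (prompt, alert flag)."""
    text = clean(raw_headline)
    stems = [stem(t) for t in remove_stop_words(tokenize(text))]
    score = sentiment_score(stems)
    return build_prompt(text, score), should_alert(score)
```

In a real-time setting, `process` would be called on each item as it arrives from the ingestion system, and a trained model would normally replace the lexicon scoring used here.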

Potential Applications and Collaborations

FinGPT, through its open-source framework, seeks to stimulate innovation and collaboration within the finance domain by providing accessible resources for developing and fine-tuning FinLLMs. The FinGPT project aims to support a wide range of use cases, including robo-advisory services, quantitative trading, and low-code development. The ultimate goal is to democratize FinLLMs and uncover untapped potential in open finance.

A Catalyst for Change

The vision for FinGPT is to serve as a catalyst for change within the financial landscape, driving research, innovation, and collaboration. By nurturing a robust collaboration ecosystem within the AI4Finance community, FinGPT has the potential to reshape our understanding and application of FinLLMs and help unlock new possibilities in the world of open finance. The authors encourage further contributions and collaboration to help move the FinGPT project forward and make it an essential tool for financial analysis, decision-making, and overall growth within the industry.

Conclusion

FinGPT's open-source framework provides a data-centric approach and a full-stack implementation for FinLLMs. Through collaborative efforts within the AI4Finance community, FinGPT aims to democratize financial data and FinLLMs, ultimately leading to increased innovation and development of open finance applications. With its real-time data engineering pipeline and dedication to addressing the challenges associated with financial data, FinGPT is a promising tool with the potential to unlock new opportunities in the financial industry.

Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks

Authors: Veniamin Veselovsky, Manoel Horta Ribeiro, Robert West

Source & References: https://arxiv.org/abs/2306.07899v1

Introduction

Now that large language models (LLMs) like GPT-3 and ChatGPT can generate high-quality text, researchers are starting to question the authenticity of supposedly human-generated data. This paper investigates how often crowd workers on Amazon Mechanical Turk (MTurk), a platform widely used to gather human annotations and survey responses, rely on LLMs. If crowd workers are indeed using LLMs, it becomes essential to find ways to ensure that human data remains human, because LLM-generated data can differ significantly in quality from data produced by people.
