AI Research Roundup: Long-Context LLMs, Multimodal Breakthroughs, and Optimizing AI at Scale

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI

Feb 22, 2025

Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
STAR: Scale-wise Text-conditioned AutoRegressive image generation
MatterChat: A Multi-Modal LLM for Material Science
Scaling Test-Time Compute Without Verification or RL is Suboptimal
Magma: A Foundation Model for Multimodal AI Agents
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration
Dynamic Low-Rank Sparse Adaptation for Large Language Models
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling

Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

Authors: Ernest Davis, Scott Aaronson

Source and references: https://arxiv.org/abs/2308.05713v4

Introduction

This paper reports a test of GPT-4, a large language model, equipped with two plug-ins, Wolfram Alpha and Code Interpreter, on 105 original science and math problems at the high school and college levels.

Key Points

GPT-4 with either the Wolfram Alpha or Code Interpreter plug-in is significantly stronger than GPT-4 alone on the tested problems.
