State of AI

AI Research Roundup: Long-Context LLMs, Multimodal Breakthroughs, and Optimizing AI at Scale

Latest research summaries in ML, Robotics, CV, NLP and AI

State of AI
Feb 22, 2025

Contents

  1. Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

  2. LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models

  3. Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension

  4. LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

  5. OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking

  6. Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation

  7. Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration

  8. SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

  9. STAR: Scale-wise Text-conditioned AutoRegressive image generation

  10. MatterChat: A Multi-Modal LLM for Material Science

  11. Scaling Test-Time Compute Without Verification or RL is Suboptimal

  12. Magma: A Foundation Model for Multimodal AI Agents

  13. EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration

  14. Dynamic Low-Rank Sparse Adaptation for Large Language Models

  15. FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling

Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems

Authors: Ernest Davis, Scott Aaronson

Source and references: https://arxiv.org/abs/2308.05713v4


Introduction

This paper reports on a test of GPT-4, a large language model, used with two plug-ins, Wolfram Alpha and Code Interpreter, on 105 original science and math problems at the high school and college levels.

Key Points

  • GPT-4 with either the Wolfram Alpha or Code Interpreter plug-in is significantly stronger than GPT-4 alone on the tested problems.

© 2025 StateOfAI