Greetings,
Welcome to the 29th edition of the State of AI. This issue explores new frontiers in AI, featuring topics such as human-level reward design via coding large language models in "Eureka," and the 4K4D technology that synthesizes real-time 4D views at 4K resolution. Discover "Self-RAG," a fascinating model that learns to retrieve, generate, and critique through self-reflection, and marvel at the efficiency of BitNet, which scales 1-bit transformers for large language models. Additionally, learn about "AgentTuning," a new paradigm that generalizes agent abilities, and immerse yourself in "3D-GPT," a novel approach to procedural 3D modeling.
Each topic invites you to delve deep into the unfolding advancements of AI, broadening your understanding and igniting your imagination. We hope you find this edition as captivating and enlightening as the breakthroughs it discusses. Enjoy!
Best regards,
Contents
Eureka: Human-Level Reward Design via Coding Large Language Models
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
BitNet: Scaling 1-bit Transformers for Large Language Models
AgentTuning: Enabling Generalized Agent Abilities for LLMs
3D-GPT: Procedural 3D Modeling with Large Language Models
EUREKA: Human-Level Reward Design via Coding Large Language Models
Authors: Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi “Jim” Fan, Anima Anandkumar
Source & References: https://arxiv.org/abs/2310.12931v1
Introduction
Large Language Models (LLMs) like GPT-4 have shown impressive capabilities across many tasks, excelling in particular as high-level semantic planners. However, applying them to complex low-level manipulation tasks, such as dexterous pen spinning, remains an open challenge. This paper presents EUREKA, a human-level reward design algorithm powered by LLMs. The authors demonstrate that EUREKA autonomously generates reward functions that surpass expert human-engineered rewards across a diverse suite of 29 open-source RL environments spanning 10 distinct robot morphologies.
Algorithm Breakdown
EUREKA's design consists of three main components:
Environment as context: The LLM directly takes raw environment source code as context and generates executable rewards based on it.
Evolutionary search: EUREKA iteratively samples multiple outputs from the LLM, selecting the best performing rewards and building upon them to generate improved reward candidates.
Reward reflection: Detailed accounts of policy training dynamics are provided in text, giving fine-grained information on the effectiveness of rewards and enabling targeted reward editing.
This approach allows EUREKA to zero-shot generate reward functions and then iteratively improve them through evolutionary search guided by detailed feedback on policy training dynamics, as the sketch below illustrates.
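To make the loop concrete, here is a minimal Python sketch of how these three components might fit together. Note that `query_llm`, `train_policy`, and the surrounding structure are hypothetical stand-ins for illustration, not the authors' released code.

```python
# Minimal sketch of EUREKA's reward-search loop, under the assumption
# that an LLM call and an RL training run can be wrapped as functions.
# query_llm and train_policy are hypothetical stubs, not the paper's API.

from dataclasses import dataclass

@dataclass
class TrainingResult:
    task_fitness: float  # task score used to rank reward candidates
    summary: str         # textual account of policy training dynamics

def query_llm(prompt: str) -> str:
    raise NotImplementedError("call an LLM (e.g., GPT-4) here")

def train_policy(reward_code: str) -> TrainingResult:
    raise NotImplementedError("run RL training with this reward function")

def eureka_search(env_source: str, task_desc: str,
                  iterations: int = 5, samples: int = 16) -> str:
    """Zero-shot generate reward candidates, then evolve the best one."""
    best_reward, best_score = "", float("-inf")
    reflection = ""  # reward reflection carried into the next round
    for _ in range(iterations):
        # 1) Environment as context: raw environment source code goes
        #    directly into the prompt.
        prompt = f"{env_source}\n\nTask: {task_desc}\n{reflection}"
        # 2) Evolutionary search: sample several candidate rewards and
        #    keep the best performer as the seed for the next round.
        for _ in range(samples):
            reward_code = query_llm(prompt)
            try:
                result = train_policy(reward_code)
            except Exception:
                continue  # discard candidates that fail to execute
            if result.task_fitness > best_score:
                best_reward, best_score = reward_code, result.task_fitness
                # 3) Reward reflection: a textual summary of training
                #    dynamics steers targeted edits in the next round.
                reflection = (f"Best reward so far:\n{reward_code}\n"
                              f"Training summary: {result.summary}")
    return best_reward
```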
Experiment Results
EUREKA was evaluated on a diverse suite of robot embodiments and tasks, demonstrating its ability to generate rewards, solve new tasks, and integrate various forms of human input. The experimental results show that EUREKA outperforms both human-engineered rewards and L2R (Language to Rewards, a two-stage LLM-prompting reward generation baseline) on 83% of tasks. By combining EUREKA with curriculum learning, the authors also achieve rapid pen spinning maneuvers on a simulated anthropomorphic Shadow Hand for the first time.
Novel Task Solution
One key takeaway from this research is that EUREKA can solve tasks that were previously infeasible with manual reward engineering, such as dexterous pen spinning. In this task, a five-fingered hand must rapidly rotate a pen through predefined spinning configurations. By combining EUREKA with curriculum learning, the researchers demonstrated rapid pen spinning tricks on the simulated Shadow Hand, as the sketch below outlines.
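As a rough illustration of how curriculum learning composes with the reward search, the two-stage recipe might look like the following, reusing the hypothetical stubs from the sketch above; the stage descriptions and function composition here are assumptions for illustration.

```python
# Hypothetical two-stage curriculum for pen spinning, reusing the
# eureka_search and train_policy stubs from the sketch above.

def pen_spinning_curriculum(env_source: str) -> TrainingResult:
    # Stage 1: learn the simpler sub-skill of reorienting the pen
    # to a single target pose.
    stage1_reward = eureka_search(env_source,
                                  "reorient the pen to a target pose")
    train_policy(stage1_reward)  # pre-train on the reorientation task
    # Stage 2: search for a reward for the full spinning sequence and
    # fine-tune from the stage-1 policy. How the warm start is threaded
    # into training is an implementation detail this sketch glosses over.
    stage2_reward = eureka_search(env_source,
                                  "rotate the pen through a pose sequence")
    return train_policy(stage2_reward)
```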
In-Context Learning and Human Feedback
EUREKA also introduces a gradient-free, in-context approach to reinforcement learning from human feedback (RLHF). This lets the algorithm generate reward functions that align better with human preferences, improving the quality and safety of rewards without any model updates. Through user-supplied textual feedback, EUREKA can adapt and refine its generated rewards, showcasing its ability to cooperate with a human co-pilot in arriving at desirable agent behavior, as the sketch below illustrates.
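A minimal sketch of that gradient-free loop, again using the hypothetical stubs from the first example: the human's freeform note simply enters the next prompt as plain text, so no model weights are ever updated.

```python
def eureka_with_human_feedback(env_source: str, task_desc: str,
                               rounds: int = 3) -> str:
    # Initial zero-shot reward, as in the main EUREKA loop.
    reward_code = query_llm(f"{env_source}\n\nTask: {task_desc}")
    for _ in range(rounds):
        result = train_policy(reward_code)
        # A human inspects the resulting behavior and types freeform
        # feedback; it replaces (or augments) the automated reflection.
        note = input("What should the agent do differently? ")
        reward_code = query_llm(
            f"{env_source}\n\nTask: {task_desc}\n"
            f"Current reward:\n{reward_code}\n"
            f"Training summary: {result.summary}\n"
            f"Human feedback: {note}"
        )
    return reward_code
```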
Comparison to L2R and Human-Engineered Rewards
Unlike L2R, which requires task-specific prompts, reward templates, and few-shot examples, EUREKA can autonomously generate rewards without any task-specific information. This generality, together with its superior performance, highlights the potential of the approach over traditional reward engineering methods.
Looking Forward
EUREKA represents a significant step forward in leveraging LLMs for reward design and complex low-level manipulation tasks. Its success with pen spinning and its performance across diverse RL environments point to a promising future for LLM-based reward design. While the paper provides extensive experimentation and results, there remains room for further research into topics such as safety, generalization, and efficiency.
Ultimately, the researchers behind EUREKA aim to promote further research by open-sourcing reward functions, prompts, and environments tied to their work, hoping that others can build upon their discoveries and unlock new potential within the field of artificial intelligence and robotics.
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Authors: Zhen Xu, Sida Peng, Haotong Lin, Guangzhao He, Jiaming Sun, Yujun Shen, Hujun Bao, Xiaowei Zhou
Source & References: https://arxiv.org/abs/2310.11448
Introduction
Recent advances in machine learning and computer vision have led researchers to tackle real-time 4D view synthesis at 4K resolution: dynamic 3D scene rendering that can produce realistic, immersive virtual playback. Applications for this technology span virtual and augmented reality, sports broadcasting, and performance capture.
Traditional methods for dynamic 3D scene reconstruction face limitations due to complex hardware requirements and reliance on controlled environments. Meanwhile, implicit neural representations have proven successful at reconstructing dynamic 3D scenes from RGB videos via differentiable rendering, which simplifies optimization. However, existing approaches remain slow when rendering high-resolution images, motivating a faster method that preserves state-of-the-art quality.