Capture and render dynamic content at 8K and 60 FPS on a single RTX 4090
Welcome to the 5th issue of AI Pulse. Our goal is simple: each issue will focus on breaking down one important AI or Machine Learning paper. We aim to provide clear, in-depth analysis so that our readers, whether they're professionals, academics, or enthusiasts, can easily understand key developments in the field.
Reading Time: ~ 5 Minutes
Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis
Authors: Zhan Li, Zhang Chen, Zhong Li, Yi Xu
Source and references: https://arxiv.org/abs/2312.16812
Introduction
Novel view synthesis of dynamic scenes has long been an intriguing and challenging problem in computer vision and graphics. In recent years, research advancements like Neural Radiance Fields (NeRF) have dramatically improved the modeling of static scenes from casually captured multi-view inputs. However, extending these methods from static to dynamic scenes remains a formidable task due to the overhead in model size and training time.
In this paper, the authors present a novel approach called Spacetime Gaussian Feature Splatting to represent dynamic scenes, achieving photorealistic quality, real-time high-resolution rendering, and compact storage simultaneously. The proposed method extends 3D Gaussian Splatting to the spacetime domain and incorporates new components to handle motion, rotation, and time-varying opacity.
Spacetime Gaussians
A Spacetime Gaussian (STG) is designed to efficiently capture the dynamics of a scene in 4D (spatial and temporal) space. The authors extend the 3D Gaussian representation with temporal components that model content emerging or vanishing over time, as well as motion and deformation. The STG representation consists of time-dependent opacity, a polynomial motion trajectory, and a polynomial rotation.
The time-dependent opacity is represented by a temporal radial basis function, allowing STGs to efficiently model scene content that emerges or vanishes within the duration of a video. Polynomial motion trajectories are utilized to model the motion and deformation of a scene over time, providing an expressive representation. Similarly, a polynomial rotation is adopted to represent rotating objects in the dynamic scene.
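To make this more concrete, here is a minimal Python sketch of how a single STG's time-dependent attributes might be evaluated, following the paper's description of a temporal radial basis function for opacity and polynomials for position and rotation. The function name, argument names, and coefficient shapes (including the polynomial degrees) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def stg_at_time(t, mu_tau, s_tau, spatial_opacity, pos_coeffs, rot_coeffs):
    """Evaluate one Spacetime Gaussian's time-dependent attributes at time t.

    mu_tau          -- temporal center of the Gaussian (assumed name)
    s_tau           -- temporal scale controlling how long the Gaussian stays visible
    spatial_opacity -- base (spatial) opacity
    pos_coeffs      -- (k+1, 3) polynomial coefficients of the motion trajectory
    rot_coeffs      -- (k+1, 4) polynomial coefficients of the quaternion rotation
    """
    dt = t - mu_tau

    # Temporal radial basis function: opacity peaks at mu_tau and decays away
    # from it, so the Gaussian can "appear" and "vanish" during the video.
    opacity_t = spatial_opacity * np.exp(-s_tau * dt**2)

    # Polynomial motion trajectory: position is a polynomial in (t - mu_tau).
    powers = dt ** np.arange(len(pos_coeffs))            # [1, dt, dt^2, ...]
    position_t = (pos_coeffs * powers[:, None]).sum(axis=0)

    # Polynomial rotation: evaluate quaternion coefficients, then normalize.
    powers_q = dt ** np.arange(len(rot_coeffs))
    quat_t = (rot_coeffs * powers_q[:, None]).sum(axis=0)
    quat_t /= np.linalg.norm(quat_t)

    return opacity_t, position_t, quat_t
```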
Splatted Feature Rendering
To achieve compact model size and handle time-varying appearance, the method replaces spherical harmonics (SH) coefficients with features. Each STG stores features that encode base color, view-dependent information, and time-dependent information. These features are rasterized to image space using a differentiable splatting process, and a small multi-layer perceptron (MLP) converts them to the final rendered color.
The feature-based approach significantly reduces the number of parameters for each STG compared to traditional SH encoding, while maintaining fast rendering speed. For even faster rendering, a lite-version model can be used, which only keeps the base color feature.
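The sketch below illustrates the general idea of turning splatted per-pixel features into color with a small MLP. The feature dimensions, layer sizes, and the split into a base color plus a view- and time-dependent residual are assumptions made for illustration; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class FeatureToColor(nn.Module):
    """Tiny MLP decoding splatted per-pixel features into RGB (illustrative)."""

    def __init__(self, feat_dim=9, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 3, hidden),  # splatted features + view direction
            nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, splatted_feats, view_dirs):
        # splatted_feats: (H, W, feat_dim) alpha-blended Gaussian features
        # view_dirs:      (H, W, 3) per-pixel viewing directions
        base_color = splatted_feats[..., :3]              # first channels as base color
        residual = self.mlp(torch.cat([splatted_feats, view_dirs], dim=-1))
        return base_color + residual

# The lite-version model would skip the MLP entirely and return base_color alone,
# trading some view-dependent detail for extra rendering speed.
```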
Guided Sampling of Gaussians
Rendering complex scenes can sometimes result in blurry or erroneous areas, especially at distant locations that are sparsely covered by Gaussians. To tackle this issue, the authors propose a guided sampling approach that leverages training error and coarse depth information to sample new Gaussians in challenging areas.
This strategy improves rendering quality by dynamically adding new Gaussians where they are needed during training, simultaneously enhancing the model's representation capability and making it more adaptive to complex scene geometry.
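A rough sketch of what this guided sampling could look like is shown below: pixels with high training error in a given view are back-projected along their camera rays, around the coarse depth estimate, to propose new Gaussian centers. The threshold, sample count, jitter, and all names here are hypothetical defaults chosen for illustration, not the paper's hyper-parameters.

```python
import numpy as np

def propose_new_gaussians(error_map, coarse_depth, cam_K, cam_to_world,
                          error_thresh=0.1, samples_per_ray=4, depth_jitter=0.05):
    """Propose new Gaussian centers in poorly reconstructed regions (illustrative).

    error_map    -- (H, W) per-pixel training error for one training view
    coarse_depth -- (H, W) coarse depth for the same view
    cam_K        -- (3, 3) camera intrinsics
    cam_to_world -- (4, 4) camera-to-world transform
    """
    ys, xs = np.nonzero(error_map > error_thresh)   # pixels the model renders poorly
    K_inv = np.linalg.inv(cam_K)

    centers = []
    for x, y in zip(xs, ys):
        d = coarse_depth[y, x]
        for _ in range(samples_per_ray):
            # Sample depths around the coarse estimate along the pixel's ray.
            dj = d * (1.0 + np.random.uniform(-depth_jitter, depth_jitter))
            p_cam = (K_inv @ np.array([x + 0.5, y + 0.5, 1.0])) * dj
            p_world = cam_to_world @ np.append(p_cam, 1.0)
            centers.append(p_world[:3])
    return np.asarray(centers)
```

New Gaussians initialized at these centers are then optimized together with the existing ones as training continues.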
Experimental Results
The authors evaluate their method on several established real-world datasets, demonstrating state-of-the-art rendering quality and speed while retaining compact model size. The lite-version model can achieve an impressive 8K rendering at 60 frames per second (FPS) on an Nvidia RTX 4090 GPU, significantly outperforming other existing models.
Compared to previous methods, the Spacetime Gaussian Feature Splatting approach exhibits superior performance in terms of rendering quality, speed, and model size. The introduction of Spacetime Gaussians, combined with feature-based rendering and guided sampling of Gaussians, leads to a versatile and powerful representation for dynamic scene synthesis.
Conclusion
Spacetime Gaussian Feature Splatting is a groundbreaking approach to dynamic view synthesis, achieving high-fidelity, real-time rendering, and compact storage for dynamic scenes. The introduction of Spacetime Gaussians, along with splatted feature rendering and guided sampling, allows the method to generate photorealistic novel view renderings at unprecedented speeds.
This innovative approach has significant potential applications in virtual reality, augmented reality, broadcasting, education, and other areas that benefit from immersive, dynamic scene representation. The lite-version model, in particular, makes it possible to render high-quality 8K video at 60 FPS, opening up new possibilities for real-time interactive experiences in dynamic environments.