Gaussian Splatting has changed the game for real-time photo-realistic rendering. One of the most popular applications of Gaussian Splatting is to create animatable avatars, known as Gaussian Avatars. Recent works have pushed the boundaries of quality and rendering efficiency but suffer from two main limitations: either they require expensive multi-camera rigs to produce avatars with free-view rendering, or they can be trained with a single camera but only rendered at high quality from this fixed viewpoint. An ideal model would be trained from a short monocular video or a single image captured with readily available hardware, such as a webcam, and rendered from any view.
To this end, we propose GASP: Gaussian Avatars with Synthetic Priors. To overcome the limitations of existing datasets, we exploit the pixel-perfect nature of synthetic data to train a Gaussian Avatar prior. By fitting this prior model to a single photo or video and fine-tuning it, we obtain a high-quality Gaussian Avatar which supports 360-degree rendering. Our prior is only required for fitting, not inference, enabling real-time applications. With our method, we obtain high-quality avatars from limited data which can be animated and rendered at 70 fps on commercial hardware.
Gaussian Avatar models are excellent at producing high-quality avatars given multi-camera data. However, models trained on only a single camera or image degrade significantly when rendered from novel viewpoints.
To "fill in the gaps" left by missing data when training on a single camera, we propose training a prior over Gaussian Avatars. Ideally, we would do this with a huge dataset of many people captured from all angles. However, existing datasets lack full 360-degree coverage and are not sufficiently diverse. Moreover, annotations such as camera parameters have to be estimated. We therefore use perfectly annotated and diverse synthetic data to build a synthetic prior. Correlations in per-Gaussian features, together with our three-stage fitting process, enable users to create realistic avatars from single-camera data.
First, we generate a large synthetic dataset. This dataset is highly diverse and has perfect annotations, including camera parameters and an exact correspondence to the underlying 3DMM.
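To make the "perfect annotations" concrete, here is a minimal sketch of what a single synthetic training sample might contain. The field names and shapes are assumptions for illustration, not the released data format.

```python
# Hypothetical schema for one synthetic training sample; field names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class SyntheticSample:
    image: np.ndarray               # rendered RGB frame, shape (H, W, 3)
    camera_intrinsics: np.ndarray   # (3, 3) intrinsics, known exactly from the renderer
    camera_extrinsics: np.ndarray   # (4, 4) world-to-camera transform, known exactly
    identity_id: int                # index of the synthetic identity
    face_model_params: np.ndarray   # 3DMM shape/expression coefficients for this frame
```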
Our prior model takes learned per-Gaussian features and an identity code as input and produces the parameters of a Gaussian Avatar.
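The following is a minimal sketch, not the paper's implementation, of a decoder that maps a shared table of learned per-Gaussian features plus an identity code to Gaussian parameters. All dimensions and the output parameterisation (position offsets from the 3DMM surface, quaternion rotation, scale, opacity, colour) are assumptions for illustration.

```python
# Hedged sketch of a Gaussian Avatar prior decoder; dimensions are illustrative.
import torch
import torch.nn as nn

class GaussianPriorDecoder(nn.Module):
    def __init__(self, num_gaussians=50_000, feat_dim=32, id_dim=64, hidden=128):
        super().__init__()
        # Learned per-Gaussian feature table, shared across all identities.
        self.gaussian_features = nn.Parameter(torch.randn(num_gaussians, feat_dim) * 0.01)
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + id_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # 3 position offset + 4 rotation + 3 scale + 1 opacity + 3 colour = 14
            nn.Linear(hidden, 14),
        )

    def forward(self, identity_code):
        n = self.gaussian_features.shape[0]
        x = torch.cat([self.gaussian_features, identity_code.expand(n, -1)], dim=-1)
        out = self.mlp(x)
        return {
            "xyz_offset": out[:, :3],                                  # offsets from the 3DMM surface
            "rotation":   nn.functional.normalize(out[:, 3:7], dim=-1),
            "scale":      torch.exp(out[:, 7:10]),
            "opacity":    torch.sigmoid(out[:, 10:11]),
            "rgb":        torch.sigmoid(out[:, 11:14]),
        }
```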
The result is a generative prior over Gaussian Avatars. Here we show some interpolations in the latent space of the prior.
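Such interpolations amount to blending identity codes and decoding each blend, roughly as in the sketch below. `GaussianPriorDecoder` refers to the illustrative decoder above and is not the paper's exact model.

```python
# Illustrative latent-space interpolation between two identity codes.
import torch

def interpolate_identities(decoder, z_a, z_b, steps=5):
    avatars = []
    for t in torch.linspace(0.0, 1.0, steps):
        z = (1.0 - t) * z_a + t * z_b      # linear blend of identity codes
        avatars.append(decoder(z))          # decode to Gaussian parameters
    return avatars
```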
For a given user, we require only a short video or even a single image. This enables users to enrol with a webcam or smartphone camera.
We learn per-Gaussian features which are mapped to Gaussian parameters by the decoder MLP during prior training. We can then exploit correlations in these features to update unseen regions: for example, updating a Gaussian on the visible front of the hair will also update the back. Here we show a PCA plot of the per-Gaussian features to visualise these correlations.
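A plot like this can be produced by projecting the learned feature table to two dimensions, for example as in the sketch below (using the illustrative decoder from earlier; this is not the paper's plotting code).

```python
# Visualise correlations in the learned per-Gaussian features via a 2D PCA projection.
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

features = decoder.gaussian_features.detach().cpu().numpy()   # (N, feat_dim)
coords = PCA(n_components=2).fit_transform(features)
plt.scatter(coords[:, 0], coords[:, 1], s=1)
plt.title("PCA of per-Gaussian features")
plt.show()
```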
We fit using a three-stage process. First, we optimize the latent identity code (top). Next, we refine the decoder MLP, which takes advantage of the per-Gaussian feature correlations (middle). Finally, we refine the Gaussians themselves (bottom).
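A hedged sketch of this three-stage fitting is shown below. Here `render` and `photometric_loss` are placeholders for a differentiable Gaussian rasteriser and an image reconstruction loss, and all optimiser settings and step counts are illustrative rather than the paper's.

```python
# Illustrative three-stage fitting loop; render/photometric_loss are user-supplied placeholders.
import torch

def run_stage(decoder, z, frames, opt, n_steps, render, photometric_loss):
    for _ in range(n_steps):
        loss = photometric_loss(render(decoder(z)), frames)
        opt.zero_grad(); loss.backward(); opt.step()

def fit_avatar(decoder, frames, render, photometric_loss, steps=(500, 500, 500)):
    z = torch.zeros(1, 64, requires_grad=True)                 # latent identity code

    # Stage 1: optimise only the latent identity code.
    opt = torch.optim.Adam([z], lr=1e-2)
    run_stage(decoder, z, frames, opt, steps[0], render, photometric_loss)

    # Stage 2: also refine the decoder MLP; per-Gaussian feature correlations
    # propagate updates from seen to unseen regions.
    opt = torch.optim.Adam([z, *decoder.mlp.parameters()], lr=1e-4)
    run_stage(decoder, z, frames, opt, steps[1], render, photometric_loss)

    # Stage 3: refine the decoded Gaussians directly for the final avatar.
    gaussians = {k: v.detach().clone().requires_grad_(True) for k, v in decoder(z).items()}
    opt = torch.optim.Adam(gaussians.values(), lr=1e-3)
    for _ in range(steps[2]):
        loss = photometric_loss(render(gaussians), frames)
        opt.zero_grad(); loss.backward(); opt.step()
    return gaussians
```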
@misc{saunders2024gasp,
  title={{GASP}: Gaussian Avatars with Synthetic Priors},
  author={Jack Saunders and Charlie Hewitt and Yanan Jian and Marek Kowalski and Tadas Baltru\v{s}aitis and Yiye Chen and Darren Cosker and Virginia Estellers and Nicholas Gyde and Vinay P. Namboodiri and Benjamin E Lundell},
  year={2024},
  eprint={2412.07739},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.07739},
}