GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions

Computer Vision and Pattern Recognition 2024
2nd Workshop on Generative Models for Computer Vision

Salvatore Esposito Qingshan Xu Kacper Kania Charlie Hewitt Octave Mariotti Lohit Petikam Julien Valentin Arno Onken Oisin Mac Aodha


Abstract

We introduce a new generative approach for synthesizing 3D geometry and images from single-view collections. Most existing approaches predict volumetric density to render multi-view consistent images. By employing volumetric rendering using neural radiance fields, they inherit a key limitation: the generated geometry is noisy and unconstrained, limiting the quality and utility of the output meshes. To address this issue, we propose GeoGen, a new SDF-based 3D generative model trained in an end-to-end manner. Initially, we reinterpret the volumetric density as a Signed Distance Function (SDF). This allows us to introduce useful priors to generate valid meshes. However, those priors prevent the generative model from learning details, limiting the applicability of the method to real-world scenarios. To alleviate that problem, we make the transformation learnable and constrain the rendered depth map to be consistent with the zero-level set of the SDF. Through the lens of adversarial training, we encourage the network to produce higher fidelity details on the output meshes. For evaluation, we introduce a synthetic dataset of human avatars captured from 360-degree camera angles, to overcome the challenges presented by real-world datasets, which often lack 3D consistency and do not cover all camera angles. Our experiments on multiple datasets show that GeoGen produces visually and quantitatively better geometry than the previous generative models based on neural radiance fields.
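The idea of reinterpreting density as an SDF and tying rendered depth to the zero-level set can be illustrated with a small sketch. The abstract does not state the exact transformation GeoGen uses, so the Laplace-CDF mapping below (in the style of VolSDF), the parameters `alpha` and `beta`, the analytic sphere SDF, and the L1 depth penalty are all assumptions chosen for illustration. In a learnable variant, `beta` would be a trainable parameter rather than a constant.

```python
import math

def laplace_cdf(x, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    if x <= 0:
        return 0.5 * math.exp(x / beta)
    return 1.0 - 0.5 * math.exp(-x / beta)

def sdf_to_density(sdf, alpha, beta):
    """Assumed SDF-to-density mapping: high density inside, low outside."""
    return alpha * laplace_cdf(-sdf, beta)

def sphere_sdf(p, radius=1.0):
    """Stand-in for a learned SDF: signed distance to a sphere at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius

def render_depth(origin, direction, near, far, n_samples, alpha, beta):
    """Expected ray depth from standard volume-rendering quadrature."""
    dt = (far - near) / n_samples
    transmittance, depth, weight_sum = 1.0, 0.0, 0.0
    for i in range(n_samples):
        t = near + (i + 0.5) * dt                      # midpoint sample
        p = [o + t * d for o, d in zip(origin, direction)]
        sigma = sdf_to_density(sphere_sdf(p), alpha, beta)
        opacity = 1.0 - math.exp(-sigma * dt)
        weight = transmittance * opacity
        depth += weight * t
        weight_sum += weight
        transmittance *= 1.0 - opacity
    return depth / max(weight_sum, 1e-8)

beta = 0.01                     # would be learnable; fixed here for the demo
alpha = 1.0 / beta
rendered = render_depth([0.0, 0.0, -3.0], [0.0, 0.0, 1.0],
                        near=0.5, far=5.5, n_samples=512,
                        alpha=alpha, beta=beta)
surface_depth = 2.0             # first zero crossing of the sphere SDF on this ray
depth_loss = abs(rendered - surface_depth)  # hypothetical L1 consistency term
```

With a small `beta` the density concentrates at the surface, so the rendered depth stays close to the zero crossing and the penalty is small. A larger `beta` spreads the density out and increases the mismatch.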

About the Dataset

The GeoGen dataset is a collection of over 70,000 synthetic face images covering a full 360-degree range of views, designed for 3D geometry reconstruction research. We build on the synthetic face generation framework of Wood et al.

For our dataset, we randomly generate 7 images at 512×512 resolution for each of 10,800 identities, yielding a varied set of views with full azimuthal coverage.
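As a quick sanity check on these numbers, the sketch below computes the dataset size implied by the stated counts and draws one set of view angles for an identity. The uniform azimuth sampling and the helper name `sample_azimuths` are assumptions for illustration; the page does not describe how camera angles are actually drawn.

```python
import random

NUM_IDENTITIES = 10_800
VIEWS_PER_IDENTITY = 7

# Total image count implied by the figures above.
total_images = NUM_IDENTITIES * VIEWS_PER_IDENTITY  # 75,600

def sample_azimuths(n_views, rng):
    """Hypothetical sampler: azimuths drawn uniformly from [0, 360) degrees."""
    return [rng.random() * 360.0 for _ in range(n_views)]

rng = random.Random(0)
azimuths = sample_azimuths(VIEWS_PER_IDENTITY, rng)
```

The product, 75,600, is consistent with the "over 70,000" figure quoted for the dataset.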

This dataset, introduced in our paper, improves realism and usability for 3D applications by focusing on accurate camera parameters and on making pseudo ground truths easy to obtain.

  • Accurate Camera Parameters: GeoGen sets itself apart with highly accurate camera parameters for each image, ensuring reliable input for tasks demanding exact geometric precision.
  • Full Azimuthal Coverage: By providing images from a full 360-degree spectrum, the dataset offers unparalleled views of synthetic human heads, crucial for thorough 3D reconstructions.
  • Pseudo Ground Truths: The dataset facilitates the generation of pseudo ground truths using advanced multi-view stereo and surface reconstruction techniques, enabling detailed quantitative evaluations of 3D models.

With its robust framework and detailed captures, the GeoGen dataset can be an essential resource for researchers and developers working on next-generation 3D modeling technologies.

Qualitative and Quantitative 3D Reconstruction Results

Our 3D geometry reconstruction results on ShapeNet Cars and Synthetic Heads show improvements over previous methods. By incorporating a Signed Distance Function (SDF) representation and a depth loss, GeoGen reconstructs more accurate and detailed geometry.

Comparison of different 3D reconstruction metrics for generative models on ShapeNet Cars and our Synthetic Heads dataset. We report averages for MSE, HD, and MSD metrics. Variations of GeoGen without the SDF and Depth Loss constraints are also shown. Best methods for each dataset are bolded.
Method                       Chamfer ↓   MSE ↓   HD ↓   EMD ↓   MSD ↓

ShapeNet Cars
EG3D                         0.31        0.31    0.85   0.44    0.33
GeoGen w/o SDF & Depth Loss  0.27        0.28    0.77   0.42    0.31
GeoGen                       0.25        0.27    0.77   0.40    0.29

Synthetic Heads
EG3D                         0.21        0.29    0.65   0.54    0.35
GeoGen w/o SDF & Depth Loss  0.19        0.29    0.59   0.45    0.26
GeoGen                       0.17        0.27    0.56   0.43    0.24

These results show that GeoGen produces more accurate reconstructions than EG3D: it achieves the lowest Chamfer and MSE on both datasets, and it matches or improves on both EG3D and the ablated variant for HD, EMD, and MSD.
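For readers unfamiliar with these metrics, here is a minimal sketch of two of them, Chamfer distance and Hausdorff distance (HD), on small point sets. The exact variants used for the table (squared versus unsquared distances, normalization, sampling density) are not stated on this page, so the definitions below are common textbook forms and should be read as assumptions. The brute-force nearest-neighbour search is only suitable for tiny inputs.

```python
import math

def nearest_dists(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def chamfer_distance(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    d_ab = nearest_dists(a, b)
    d_ba = nearest_dists(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance: worst-case nearest-neighbour distance."""
    return max(max(nearest_dists(a, b)), max(nearest_dists(b, a)))

# Two toy point sets standing in for points sampled from a generated
# mesh and from a pseudo ground-truth mesh.
generated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5)]

cd = chamfer_distance(generated, reference)
hd = hausdorff_distance(generated, reference)
```

Chamfer averages the error over all points, while Hausdorff reports the single largest deviation, which is why the two can rank methods differently.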

The precision in our 3D models showcases our capability to tackle complex reconstruction challenges. These results are pivotal for applications requiring precise geometric data and serve as a benchmark for future developments in the field.

BibTeX

@inproceedings{esposito2024geogen,
    author = {Esposito, Salvatore and Xu, Qingshan and Kania, Kacper and Hewitt, Charlie and Mariotti, Octave and Petikam, Lohit and Valentin, Julien and Onken, Arno and Mac Aodha, Oisin},
    title = {GeoGen: Geometry-Aware Generative Modeling via Signed Distance Functions},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2024},
    pages = {7479-7488}
}