Winter Conference on Applications of Computer Vision (WACV) 2023

Recent advances in synthesizing realistic faces have shown that synthetic training data can replace real data for various face-related computer vision tasks. A question arises: how important is realism? Is the pursuit of photorealism excessive? In this work, we show that it is not.

We boost the realism of our synthetic faces by introducing dynamic skin wrinkles in response to facial expressions, and observe significant performance improvements in downstream computer vision tasks. Previous approaches for producing such wrinkles either required prohibitive artist effort to scale across identities and expressions, or were not capable of reconstructing high-frequency skin details with sufficient fidelity. Our key contribution is an approach that produces realistic wrinkles across a large and diverse population of digital humans.

Concretely, we formalize the concept of mesh-tension and use it to aggregate possible wrinkles from high-quality expression scans into albedo and displacement texture maps. At synthesis, we use these maps to produce wrinkles even for expressions not represented in the source scans. Additionally, to provide a more nuanced indicator of model performance under deformations resulting from compressed expressions, we introduce the 300W-winks evaluation subset and the Pexels dataset of closed eyes and winks.

We build upon the synthetic face generation framework of Wood et al., which generates albedo and displacement textures using only the neutral-expression scan for an identity (middle row). In contrast, we automatically compute expanded and compressed texture maps (known collectively as wrinkle maps) to aggregate wrinkling effects in the face and neck regions across available posed-expression scans for the identity. At synthesis, for a given set of arbitrary expression parameters we compute the local tension at every vertex in the corresponding face mesh: we depict expansion in green and compression in red. This mesh tension serves as weights to dynamically blend between the neutral, expanded, and compressed texture maps to synthesize the wrinkling effect at that vertex. Note that our method can thereby generate wrinkles for expressions even beyond those represented in the source scans.
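The blending step described above can be sketched as follows. This is a minimal NumPy illustration of tension-weighted blending between the neutral, expanded, and compressed texture maps, assuming per-texel tension has already been baked into UV space; the function name, the linear blend, and the clamping to [-1, 1] are our illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def blend_wrinkle_maps(neutral, expanded, compressed, tension):
    """Blend three (H, W, C) texture maps by per-texel tension.

    Positive tension (expansion) blends toward the expanded map,
    negative tension (compression) toward the compressed map, and
    zero tension leaves the neutral map unchanged. The linear blend
    is an assumption for illustration.
    """
    t = np.clip(tension, -1.0, 1.0)[..., None]  # (H, W, 1) for broadcasting
    w_exp = np.maximum(t, 0.0)                  # weight of expanded map
    w_cmp = np.maximum(-t, 0.0)                 # weight of compressed map
    w_neu = 1.0 - w_exp - w_cmp                 # remainder goes to neutral
    return w_neu * neutral + w_exp * expanded + w_cmp * compressed
```

The same blend applies to both the albedo and displacement maps, which is what lets wrinkles appear for arbitrary expressions not present in the source scans.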

Our key idea is to formalize the notion of mesh tension to capture the amount of compression (red) or expansion (green) at each vertex of a 3D polygon mesh. We express tension as a function of the mean change in the length of the edges connected to a vertex as a result of a deformation. Here we illustrate tension for various deformations of a simple cylinder mesh. Code for computing mesh tension is available as a Blender add-on.
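The tension definition above can be written as a short, self-contained sketch: for each vertex, average the relative change in length of its incident edges between the rest and deformed meshes. This is a NumPy rendition of the stated definition, not the code from the Blender add-on; sign conventions (positive for expansion, negative for compression) follow the description.

```python
import numpy as np

def mesh_tension(rest_verts, deformed_verts, edges):
    """Per-vertex tension: mean relative change in incident edge lengths.

    rest_verts, deformed_verts: (V, 3) vertex positions before/after deformation.
    edges: (E, 2) integer array of vertex-index pairs.
    Returns a (V,) array; > 0 means expansion, < 0 means compression.
    """
    rest_verts = np.asarray(rest_verts, dtype=float)
    deformed_verts = np.asarray(deformed_verts, dtype=float)
    edges = np.asarray(edges)

    rest_len = np.linalg.norm(rest_verts[edges[:, 0]] - rest_verts[edges[:, 1]], axis=1)
    def_len = np.linalg.norm(deformed_verts[edges[:, 0]] - deformed_verts[edges[:, 1]], axis=1)
    rel_change = (def_len - rest_len) / rest_len  # per-edge relative stretch

    # Accumulate each edge's change onto both of its endpoint vertices.
    tension = np.zeros(len(rest_verts))
    count = np.zeros(len(rest_verts))
    for (i, j), c in zip(edges, rel_change):
        tension[i] += c
        count[i] += 1
        tension[j] += c
        count[j] += 1
    return tension / np.maximum(count, 1)  # mean over incident edges
```

For example, uniformly stretching an edge to twice its rest length yields a tension of 1.0 at both endpoints, while shrinking it yields a negative value.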

For each identity, we illustrate final renders without (left) and with (right) our method for dynamic expression-based wrinkles. For the same expression parameters, our method produces varied wrinkling effects across distinct identities (middle and bottom row, left column). For identities where expression scans were not available, we compute wrinkle maps by grafting wrinkling effects from the identities with expression scans. In the right column below we illustrate renders for two expressions for such identities with missing expression scans.

Training models on synthetic faces with our expression-based wrinkles is crucial for localizing keypoints in compressed regions of the face. For the task of surface-normal prediction, models trained on synthetic faces with wrinkles recover significantly more high-frequency details. Moreover, they yield predictions comparable to SOTA methods trained on real-world data while being less noisy and more robust to lighting.

The video at the top of the page illustrates our method for a full range of facial expression parameters. Here is an animated sequence further illustrating our method on a more natural sequence of expressions:

For specifically evaluating model performance on compressed expressions, we introduce the novel **300W-winks** subset of the 300W dataset and the **Pexels Winks** and **Pexels Blinks** datasets.
Here we specify the image identifiers that comprise these datasets.
The Pexels images can be accessed by appending the identifiers to https://www.pexels.com/photo/.
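The URL construction above amounts to simple string concatenation; a minimal helper might look like this (the example identifier is hypothetical, not an image from the dataset):

```python
PEXELS_BASE = "https://www.pexels.com/photo/"

def pexels_url(identifier):
    """Build the source page URL for a Pexels image from its identifier."""
    return PEXELS_BASE + str(identifier)
```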

@inproceedings{raman2023mesh,
  title={Mesh-Tension Driven Expression-Based Wrinkles for Synthetic Faces},
  author={Raman, Chirag and Hewitt, Charlie and Wood, Erroll and Baltru{\v{s}}aitis, Tadas},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2023}
}