BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis


ACM Transactions on Graphics (SIGGRAPH Asia 2023)

Hao-Bin Duan1      Miao Wang1, 2      Jin-Chuan Shi1      Xu-Chuan Chen1      Yan-Pei Cao3
1State Key Laboratory of Virtual Reality Technology and Systems, Beihang University     2Zhongguancun Laboratory     3ARC Lab, Tencent PCG    

TL;DR: BakedAvatar takes a monocular video recording of a person and produces a mesh-based representation for real-time 4D head avatar synthesis on a range of devices, including mobile phones.

Abstract

Synthesizing photorealistic 4D human head avatars from videos is essential for VR/AR, telepresence, and video game applications. Although existing Neural Radiance Fields (NeRF)-based methods achieve high-fidelity results, the computational expense limits their use in real-time applications. To overcome this limitation, we introduce BakedAvatar, a novel representation for real-time neural head avatar synthesis, deployable in a standard polygon rasterization pipeline. Our approach extracts deformable multi-layer meshes from learned isosurfaces of the head and computes expression-, pose-, and view-dependent appearances that can be baked into static textures for efficient rasterization. We thus propose a three-stage pipeline for neural head avatar synthesis, which includes learning continuous deformation, manifold, and radiance fields, extracting layered meshes and textures, and fine-tuning texture details with differential rasterization. Experimental results demonstrate that our representation generates synthesis results of comparable quality to other state-of-the-art methods while significantly reducing the inference time required. We further showcase various head avatar synthesis results from monocular videos, including view synthesis, face reenactment, expression editing, and pose editing, all at interactive frame rates.

Method

Overview of our three-stage pipeline. In the first stage, we learn three implicit fields in canonical space: a FLAME-based deformation field, an isosurface geometry field, and a radiance field that produces multiple radiance bases and a position feature, which are combined by a lightweight appearance decoder. In the second stage, we bake these neural fields into deformable multi-layer meshes and multiple static textures. In the third stage, we use differential rasterization to fine-tune the baked textures.
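To make the baked representation concrete, the following is a minimal sketch (not the authors' code) of how the appearance decoder could be evaluated at render time: a small MLP predicts blend weights from the expression/pose/view conditioning and a position feature sampled from a baked texture, and the per-pixel color is a weighted sum of the baked radiance bases. All names, dimensions, and the network layout are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AppearanceDecoder(nn.Module):
    """Illustrative lightweight appearance decoder (hypothetical layout)."""
    def __init__(self, n_bases=8, cond_dim=56, feat_dim=16, hidden=64):
        super().__init__()
        # cond = [expression | pose | view direction], feat = baked position feature
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bases),
        )

    def forward(self, cond, position_feature, radiance_bases):
        # cond:             (N, cond_dim)   per-pixel conditioning
        # position_feature: (N, feat_dim)   sampled from the baked feature texture
        # radiance_bases:   (N, n_bases, 3) sampled from the baked basis textures
        weights = torch.softmax(
            self.mlp(torch.cat([cond, position_feature], dim=-1)), dim=-1)
        rgb = (weights.unsqueeze(-1) * radiance_bases).sum(dim=1)  # (N, 3)
        return rgb
```

Because the radiance bases and the position feature are static textures, only this small decoder has to run per pixel at inference time, which is what keeps rasterization-based rendering cheap.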

Results

We test our method on monocular portrait videos of real subjects from PointAvatar [Zheng et al. 2022b], NHA [Grassal et al. 2022], NerFace [Gafni et al. 2021], and HDTF [Zhang et al. 2021].

Comparisons

We compare our method with state-of-the-art neural head avatar reconstruction methods.

Controlled Head Avatar Synthesis

Our method supports real-time rendering of controllable head avatars. We test it on novel-view synthesis, facial reenactment, and expression and pose editing tasks; a usage sketch follows below.
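All of these tasks reduce to rendering the baked avatar under different FLAME expression/pose parameters and camera poses. The sketch below is purely illustrative; `avatar.render`, the frame dictionary keys, and the parameter shapes are hypothetical stand-ins for whatever interface a deployment exposes.

```python
import numpy as np

def novel_view(avatar, frame, new_camera):
    """Novel-view synthesis: keep expression/pose, move the camera."""
    return avatar.render(expr=frame["expression"], pose=frame["pose"], cam=new_camera)

def reenact(avatar, driving_frames, camera):
    """Facial reenactment: reuse expression/pose tracked from a driving video."""
    return [avatar.render(expr=f["expression"], pose=f["pose"], cam=camera)
            for f in driving_frames]

def edit_expression(avatar, frame, expr_offset, camera):
    """Expression editing: perturb selected expression coefficients."""
    expr = np.asarray(frame["expression"]) + np.asarray(expr_offset)
    return avatar.render(expr=expr, pose=frame["pose"], cam=camera)
```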

Real-time Rendering

Our method renders 4D head avatars at real-time frame rates on mobile devices.
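Since the representation is a set of layered meshes with static textures, it fits a standard rasterization pipeline: each layer is rasterized to an RGBA image and the layers are alpha-blended back to front with the "over" operator. The sketch below assumes a hypothetical `rasterize_layer` step (e.g. a WebGL or Metal draw call) has already produced the per-layer images; it only shows the compositing.

```python
import numpy as np

def composite_layers(layer_rgba_images):
    """Blend per-layer RGBA images of shape (H, W, 4), ordered back to front."""
    h, w, _ = layer_rgba_images[0].shape
    out = np.zeros((h, w, 3), dtype=np.float32)
    for rgba in layer_rgba_images:
        rgb, alpha = rgba[..., :3], rgba[..., 3:4]
        out = rgb * alpha + out * (1.0 - alpha)  # "over" compositing
    return out
```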

BibTeX

If you find this work useful for your research, please cite us:

@article{bakedavatar,
  author = {Duan, Hao-Bin and Wang, Miao and Shi, Jin-Chuan and Chen, Xu-Chuan and Cao, Yan-Pei},
  title = {BakedAvatar: Baking Neural Fields for Real-Time Head Avatar Synthesis},
  year = {2023},
  issue_date = {December 2023},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3618399},
  doi = {10.1145/3618399},
  volume = {42},
  number = {6},
  journal = {ACM Trans. Graph.},
  month = {dec},
  articleno = {225},
  numpages = {14}
}

Acknowledgements

The authors express gratitude to the anonymous reviewers for their valuable feedback. This work was supported by the National Natural Science Foundation of China (Project Number: 61932003) and the Fundamental Research Funds for the Central Universities.