Communications on Applied Mathematics and Computation, 2024, Vol. 6, Issue 2: 837-861. doi: 10.1007/s42967-023-00256-y

• ORIGINAL PAPERS •

Deep Energies for Estimating Three-Dimensional Facial Pose and Expression

Jane Wu¹, Michael Bao¹, Xinwei Yao¹, Ronald Fedkiw¹,²

  1. Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA;
  2. Epic Games, 620 Crossroads Blvd, Cary, NC 27518, USA
  • Received: 2022-10-26; Revised: 2022-10-26; Accepted: 2023-01-28; Online: 2023-03-27; Published: 2023-03-27
  • Contact: Jane Wu, E-mail: janehwu@stanford.edu; Michael Bao, E-mail: mikebao@stanford.edu; Xinwei Yao, E-mail: yaodavid@stanford.edu; Ronald Fedkiw, E-mail: fedkiw@cs.stanford.edu
  • Supported by:
    Research was supported in part by the Office of Naval Research (ONR) N00014-13-1-0346, ONR N00014-17-1-2174, ARL AHPCRC W911NF-07-0027, and generous gifts from Amazon and Toyota.

Abstract: While much progress has been made in capturing high-quality facial performances using motion capture markers and shape-from-shading, high-end systems typically also rely on rotoscope curves hand-drawn on the image. These curves are subjective and difficult to draw consistently; moreover, ad-hoc procedural methods are required for generating matching rotoscope curves on synthetic renders embedded in the optimization used to determine three-dimensional (3D) facial pose and expression. We propose an alternative approach whereby these curves and other keypoints are detected automatically on both the image and the synthetic renders using trained neural networks, eliminating both artist subjectivity and the ad-hoc procedures meant to mimic it. More generally, we propose using machine learning networks to implicitly define deep energies which, when minimized using classical optimization techniques, lead to 3D facial pose and expression estimation.

Key words: Numerical optimization, Neural networks, Motion capture, Face tracking