VisemeNet
-
- Yang Zhou
- University of Massachusetts Amherst
-
- Zhan Xu
- University of Massachusetts Amherst
-
- Chris Landreth
- University of Toronto
-
- Evangelos Kalogerakis
- University of Massachusetts Amherst
-
- Subhransu Maji
- University of Massachusetts Amherst
-
- Karan Singh
- University of Toronto
Bibliographic Information
- Other Title
-
- Audio-Driven Animator-Centric Speech Animation
Description
We present a novel deep-learning based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly correlated with the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is an automatic, real-time lip-synchronization solution that works directly from audio and integrates seamlessly into existing animation pipelines. We evaluate our results by: cross-validation against ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing our approach to be resilient to diversity in speaker and language.
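The abstract describes a three-stage LSTM pipeline: one stream maps audio to phonetic-group probabilities, another maps audio to facial-landmark motion capturing speech style, and a final stage fuses both to produce viseme motion curves for the rig. The sketch below is only a rough illustration of that decomposition, not the published VisemeNet implementation; all layer sizes, feature dimensions, and class names are assumptions.

```python
import torch
import torch.nn as nn

class ThreeStageLSTM(nn.Module):
    """Illustrative three-stage LSTM pipeline (placeholder dimensions)."""

    def __init__(self, audio_dim=65, n_phone_groups=20, n_landmarks=76,
                 n_visemes=34, hidden=256):
        super().__init__()
        # Stage 1: audio features -> per-frame phonetic-group probabilities
        self.phoneme_lstm = nn.LSTM(audio_dim, hidden, batch_first=True,
                                    bidirectional=True)
        self.phoneme_head = nn.Linear(2 * hidden, n_phone_groups)
        # Stage 2: audio features -> facial-landmark motion (speech style)
        self.landmark_lstm = nn.LSTM(audio_dim, hidden, batch_first=True,
                                     bidirectional=True)
        self.landmark_head = nn.Linear(2 * hidden, n_landmarks)
        # Stage 3: fuse both streams -> viseme / JALI motion-curve values
        self.viseme_lstm = nn.LSTM(n_phone_groups + n_landmarks, hidden,
                                   batch_first=True)
        self.viseme_head = nn.Linear(hidden, n_visemes)

    def forward(self, audio):                        # audio: (B, T, audio_dim)
        p, _ = self.phoneme_lstm(audio)
        phone_logits = self.phoneme_head(p)          # (B, T, n_phone_groups)
        l, _ = self.landmark_lstm(audio)
        landmarks = self.landmark_head(l)            # (B, T, n_landmarks)
        fused = torch.cat([phone_logits.softmax(-1), landmarks], dim=-1)
        v, _ = self.viseme_lstm(fused)
        curves = torch.sigmoid(self.viseme_head(v))  # (B, T, n_visemes) in [0, 1]
        return phone_logits, landmarks, curves

# Usage: two clips of 100 audio-feature frames each
model = ThreeStageLSTM()
dummy_audio = torch.randn(2, 100, 65)
_, _, curves = model(dummy_audio)
print(curves.shape)  # torch.Size([2, 100, 34])
```

The per-frame motion curves returned by such a model would then be retargeted onto the animator-facing rig controls (e.g. JALI jaw/lip parameters), which is where the paper's animator-centric editing happens.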
Journal
-
- ACM Transactions on Graphics
-
ACM Transactions on Graphics 37 (4), 1-10, 2018-07-30
Association for Computing Machinery (ACM)
Details
-
- CRID
- 1361981470612196096
-
- ISSN
- 1557-7368
- 0730-0301
-
- Data Source
-
- Crossref