Visemenet

Yang Zhou, Zhan Xu, Chris Landreth, Evangelos Kalogerakis, Subhransu Maji, Karan Singh

doi:10.1145/3197517.3201292

Yang Zhou (University of Massachusetts Amherst)
Zhan Xu (University of Massachusetts Amherst)
Chris Landreth (University of Toronto)
Evangelos Kalogerakis (University of Massachusetts Amherst)
Subhransu Maji (University of Massachusetts Amherst)
Karan Singh (University of Toronto)

Bibliographic Information

Other Title

audio-driven animator-centric speech animation

Abstract

We present a novel deep-learning based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio. Our three-stage Long Short-Term Memory (LSTM) network architecture is motivated by psycho-linguistic insights: segmenting speech audio into a stream of phonetic-groups is sufficient for viseme construction; speech styles like mumbling or shouting are strongly co-related to the motion of facial landmarks; and animator style is encoded in viseme motion curve profiles. Our contribution is an automatic real-time lip-synchronization from audio solution that integrates seamlessly into existing animation pipelines. We evaluate our results by: cross-validation to ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing our approach to be resilient to diversity in speaker and language.

Journal

ACM Transactions on Graphics

ACM Transactions on Graphics 37 (4), 1-10, 2018-07-30

Association for Computing Machinery (ACM)

Citations (3)

Details

CRID: 1361981470612196096
DOI: 10.1145/3197517.3201292
ISSN: 15577368, 07300301
Web Site: https://dl.acm.org/doi/pdf/10.1145/3197517.3201292
Data Source: Crossref
