MakeItTalk: Speaker-Aware Talking-Head Animation

Overview

MakeItTalk: Speaker-Aware Talking-Head Animation

This is the code repository implementing the paper:

MakeItTalk: Speaker-Aware Talking-Head Animation

Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria , Evangelos Kalogerakis, Dingzeyu Li

SIGGRAPH Asia 2020

Abstract We present a method that generates expressive talking-head videos from a single facial image with audio as the only input. In contrast to previous attempts to learn direct mappings from audio to raw pixels for creating talking faces, our method first disentangles the content and speaker information in the input audio signal. The audio content robustly controls the motion of lips and nearby facial regions, while the speaker information determines the specifics of facial expressions and the rest of the talking-head dynamics. Another key component of our method is the prediction of facial landmarks reflecting the speaker-aware dynamics. Based on this intermediate representation, our method works with many portrait images in a single unified framework, including artistic paintings, sketches, 2D cartoon characters, Japanese mangas, and stylized caricatures. In addition, our method generalizes well for faces and characters that were not observed during training. We present extensive quantitative and qualitative evaluation of our method, in addition to user studies, demonstrating generated talking-heads of significantly higher quality compared to prior state-of-the-art methods.

[Project page] [Paper] [Video] [Arxiv] [Colab Demo] [Colab Demo TDLR]

img

Figure. Given an audio speech signal and a single portrait image as input (left), our model generates speaker-aware talking-head animations (right). Both the speech signal and the input face image are not observed during the model training process. Our method creates both non-photorealistic cartoon animations (top) and natural human face videos (bottom).

Updates

  • facewarp source code and compile instructions
  • Pre-trained models
  • Google colab quick demo for natural faces [detail] [TDLR]
  • Training code for each module
  • Customized puppet creating tool

Requirements

  • Python environment 3.6
conda create -n makeittalk_env python=3.6
conda activate makeittalk_env
sudo apt-get install ffmpeg
  • python packages
pip install -r requirements.txt
sudo dpkg --add-architecture i386
wget -nc https://dl.winehq.org/wine-builds/winehq.key
sudo apt-key add winehq.key
sudo apt-add-repository 'deb https://dl.winehq.org/wine-builds/ubuntu/ xenial main'
sudo apt update
sudo apt install --install-recommends winehq-stable

Pre-trained Models

Download the following pre-trained models to examples/ckpt folder for testing your own animation.

Model Link to the model
Voice Conversion Link
Speech Content Module Link
Speaker-aware Module Link
Image2Image Translation Module Link
Non-photorealistic Warping (.exe) Link

Animate You Portraits!

  • Download pre-trained embedding [here] and save to examples/dump folder.

Nature Human Faces / Paintings

  • crop your portrait image into size 256x256 and put it under examples folder with .jpg format. Make sure the head is almost in the middle (check existing examples for a reference).

  • put test audio files under examples folder as well with .wav format.

  • animate!

python main_end2end.py --jpg 
     

   
  • use addition args --amp_lip_x --amp_lip_y --amp_pos to amply lip motion (in x/y-axis direction) and head motion displacements, default values are =2., =2., =.5

Cartoon Faces

  • put test audio files under examples folder as well with .wav format.

  • animate one of the existing puppets

Puppet Name wilk roy sketch color cartoonM danbooru1
Image img img img img img img
python main_end2end_cartoon.py --jpg 
   
     --jpg_bg 
    

    
   
  • --jpg_bg takes a same-size image as the background image to create the animation, such as the puppet's body, the overall fixed background image. If you want to use the background, make sure the puppet face image (i.e. --jpg image) is in png format and is transparent on the non-face area. If you don't need any background, please also create a same-size image (e.g. a pure white image) to hold the argument place.

  • use addition args --amp_lip_x --amp_lip_y --amp_pos to amply lip motion (in x/y-axis direction) and head motion displacements, default values are =2., =2., =.5

  • create your own puppets (ToDo...)

Train

Train Voice Conversion Module

Todo...

Train Content Branch

  • Create dataset root directory

  • Dataset: Download preprocessed dataset [here], and put it under /dump .

  • Train script: Run script below. Models will be saved in /ckpt/ .

    python main_train_content.py --train --write --root_dir <root_dir> --name <train_instance_name>

Train Speaker-Aware Branch

Todo...

Train Image-to-Image Translation

Todo...

License

Acknowledgement

We would like to thank Timothy Langlois for the narration, and Kaizhi Qian for the help with the voice conversion module. We thank Jakub Fiser for implementing the real-time GPU version of the triangle morphing algorithm. We thank Daichi Ito for sharing the caricature image and Dave Werner for Wilk, the gruff but ultimately lovable puppet.

This research is partially funded by NSF (EAGER-1942069) and a gift from Adobe. Our experiments were performed in the UMass GPU cluster obtained under the Collaborative Fund managed by the MassTech Collaborative.

Owner
Adobe Research
Adobe Research
Seach Losses of our paper 'Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search', accepted by ICLR 2021.

CSE-Autoloss Designing proper loss functions for vision tasks has been a long-standing research direction to advance the capability of existing models

Peidong Liu(刘沛东) 54 Dec 17, 2022
Can we learn gradients by Hamiltonian Neural Networks?

Can we learn gradients by Hamiltonian Neural Networks? This project was carried out as part of the Optimization for Machine Learning course (CS-439) a

2 Aug 22, 2022
Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD).

A Self-Supervised Descriptor for Image Copy Detection (SSCD) This is the open-source codebase for "A Self-Supervised Descriptor for Image Copy Detecti

Meta Research 68 Jan 04, 2023
Official code for the CVPR 2022 (oral) paper "Extracting Triangular 3D Models, Materials, and Lighting From Images".

nvdiffrec Joint optimization of topology, materials and lighting from multi-view image observations as described in the paper Extracting Triangular 3D

NVIDIA Research Projects 1.4k Jan 01, 2023
Official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR)

This is the official implementation of our neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

12 Jan 13, 2022
A set of tools to pre-calibrate and calibrate (multi-focus) plenoptic cameras (e.g., a Raytrix R12) based on the libpleno.

COMPOTE: Calibration Of Multi-focus PlenOpTic camEra. COMPOTE is a set of tools to pre-calibrate and calibrate (multifocus) plenoptic cameras (e.g., a

ComSEE - Computers that SEE 4 May 10, 2022
PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation

PyGRANSO PyGRANSO: A PyTorch-enabled port of GRANSO with auto-differentiation Please check https://ncvx.org/PyGRANSO for detailed instructions (introd

SUN Group @ UMN 26 Nov 16, 2022
The UI as a mobile display for OP25

OP25 Mobile Control Head A 'remote' control head that interfaces with an OP25 instance. We take advantage of some data end-points left exposed for the

Sarah Rose Giddings 13 Dec 28, 2022
A curated list of references for MLOps

A curated list of references for MLOps

Larysa Visengeriyeva 9.3k Jan 07, 2023
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.

OpenPCDet OpenPCDet is a clear, simple, self-contained open source project for LiDAR-based 3D object detection. It is also the official code release o

OpenMMLab 3.2k Dec 31, 2022
Code for the paper: Adversarial Machine Learning: Bayesian Perspectives

Code for the paper: Adversarial Machine Learning: Bayesian Perspectives This repository contains code for reproducing the experiments in the ** Advers

Roi Naveiro 2 Nov 11, 2022
Collection of in-progress libraries for entity neural networks.

ENN Incubator Collection of in-progress libraries for entity neural networks: Neural Network Architectures for Structured State Entity Gym: Abstractio

25 Dec 01, 2022
Face Alignment using python

Face Alignment Face Alignment using python Input Image Aligned Face Aligned Face Aligned Face Input Image Aligned Face Input Image Aligned Face Instal

Sajjad Aemmi 28 Nov 23, 2022
Neural network for recognizing the gender of people in photos

Neural Network For Gender Recognition How to test it? Install requirements.txt file using pip install -r requirements.txt command Run nn.py using pyth

Valery Chapman 1 Sep 18, 2022
AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition [ArXiv] [Project Page] This repository is the official implementation of AdaMML:

International Business Machines 43 Dec 26, 2022
[TIP 2021] SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction

SADRNet Paper link: SADRNet: Self-Aligned Dual Face Regression Networks for Robust 3D Dense Face Alignment and Reconstruction Requirements python

Multimedia Computing Group, Nanjing University 99 Dec 30, 2022
Learning based AI for playing multi-round Koi-Koi hanafuda card games. Have fun.

Koi-Koi AI Learning based AI for playing multi-round Koi-Koi hanafuda card games. Platform Python PyTorch PySimpleGUI (for the interface playing vs AI

Sanghai Guan 10 Nov 20, 2022
Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data.

causal-bald | Abstract | Installation | Example | Citation | Reproducing Results DUE An implementation of the methods presented in Causal-BALD: Deep B

OATML 13 Oct 07, 2022
A Unified Generative Framework for Various NER Subtasks.

This is the code for ACL-ICJNLP2021 paper A Unified Generative Framework for Various NER Subtasks. Install the package in the requirements.txt, then u

177 Jan 05, 2023
A pytorch-based deep learning framework for multi-modal 2D/3D medical image segmentation

A 3D multi-modal medical image segmentation library in PyTorch We strongly believe in open and reproducible deep learning research. Our goal is to imp

Adaloglou Nikolas 1.2k Dec 27, 2022