Demo programs for the Talking Head Anime from a Single Image 2: More Expressive project.

Overview

Demo Code for "Talking Head Anime from a Single Image 2: More Expressive"

This repository contains demo programs for the Talking Head Anime from a Single Image 2: More Expressive project. Similar to the previous version, it has two programs:

  • The manual_poser lets you manipulate the facial expression and the head rotation of an anime character, given in a single image, through a graphical user interface. The poser is available in two forms: a standard GUI application, and a Jupyter notebook.
  • The ifacialmocap_puppeteer lets you transfer your facial motion, captured by a commercial iOS application called iFacialMocap, to an image of an anime character.

Try the Manual Poser on Google Colab

If you do not have the required hardware (discussed below) or do not want to download the code and set up an environment to run it, click this link to try running the manual poser on Google Colab.

Hardware Requirements

Both programs require a recent and powerful Nvidia GPU to run. I could personally ran them at good speed with the Nvidia Titan RTX. However, I think recent high-end gaming GPUs such as the RTX 2080, the RTX 3080, or better would do just as well.

The ifacialmocap_puppeteer requires an iOS device that is capable of computing blend shape parameters from a video feed. This means that the device must be able to run iOS 11.0 or higher and must have a TrueDepth front-facing camera. (See this page for more info.) In other words, if you have the iPhone X or something better, you should be all set. Personally, I have used an iPhone 12 mini.

Software Requirements

Both programs were written in Python 3. To run the GUIs, the following software packages are required:

  • Python >= 3.8
  • PyTorch >= 1.7.1 with CUDA support
  • SciPY >= 1.6.0
  • wxPython >= 4.1.1
  • Matplotlib >= 3.3.4

In particular, I created the environment to run the programs with Anaconda, using the following commands:

> conda create -n talking-head-anime-2-demo python=3.8
> conda activate talking-head-anime-2-demo
> conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
> conda install scipy
> pip install wxPython
> conda install matplotlib

To run the Jupyter notebook version of the manual_poser, you also need:

  • Jupyter Notebook >= 6.2.0
  • IPyWidgets >= 7.6.3

This means that, in addition to the commands above, you also need to run:

> conda install -c conda-forge notebook
> conda install -c conda-forge ipywidgets
> jupyter nbextension enable --py widgetsnbextension

Lastly, the ifacialmocap_puppeteer requires iFacialMocap, which is available in the App Store for 980 yen. You also need to install the paired desktop application on your PC or Mac. (Linux users, I'm sorry!) Your iOS and your computer must also use the same network. (For example, you may connect them to the same wireless router.)

Automatic Environment Construction with Anaconda

You can also use Anaconda to download and install all Python packages in one command. Open your shell, change the directory to where you clone the repository, and run:

conda env create -f environment.yml

This will create an environment called talking-head-anime-2-demo containing all the required Python packages.

Download the Model

Before running the programs, you need to download the model files from this Dropbox link and unzip it to the data folder of the repository's directory. In the end, the data folder should look like:

+ data
  + illust
    - waifu_00.png
    - waifu_01.png
    - waifu_02.png
    - waifu_03.png
    - waifu_04.png
    - waifu_05.png
    - waifu_06.png
    - waifu_06_buggy.png
  - combiner.pt
  - eyebrow_decomposer.pt
  - eyebrow_morphing_combiner.pt
  - face_morpher.pt
  - two_algo_face_rotator.pt

The model files are distributed with the Creative Commons Attribution 4.0 International License, which means that you can use them for commercial purposes. However, if you distribute them, you must, among other things, say that I am the creator.

Running the manual_poser Desktop Application

Open a shell. Change your working directory to the repository's root directory. Then, run:

> python tha2/app/manual_poser.py

Note that before running the command above, you might have to activate the Python environment that contains the required packages. If you created an environment using Anaconda as was discussed above, you need to run

> conda activate talking-head-anime-2-demo

if you have not already activated the environment.

Running the manual_poser Jupyter Notebook

Open a shell. Activate the environment. Change your working directory to the repository's root directory. Then, run:

> jupyter notebook

A browser window should open. In it, open tha2.ipynb. Once you have done so, you should see that it only has one cell. Run it. Then, scroll down to the end of the document, and you'll see the GUI there.

Running the ifacialmocap_puppeteer

First, run iFacialMocap on your iOS device. It should show you the device's IP address. Jot it down. Keep the app open.

IP address in iFacialMocap screen

Then, run the companion desktop application.

iFaciaMocap desktop application

Click "Open Advanced Setting >>". The application should expand.

Click the 'Open Advanced Setting >>' button.

Click the button that says "Maya" on the right side.

Click the 'Maya' button.

Then, click "Blender."

Select 'Blender' mode in the desktop application

Next, replace the IP address on the left side with your iOS device's IP address.

Replace IP address with device's IP address.

Click "Connect to Blender."

Click 'Connect to Blender.'

Open a shell. Activate the environment. Change your working directory to the repository's root directory. Then, run:

> python tha2/app/ifacialmocap_puppeteer.py

If the programs are connected properly, you should see that the many progress bars at the bottom of the ifacialmocap_puppeteer window should move when you move your face in front of the iOS device's front-facing camera.

You should see the progress bars moving.

If all is well, load an character image, and it should follow your facial movement.

Constraints on Input Images

In order for the model to work well, the input image must obey the following constraints:

  • It must be of size 256 x 256.
  • It must be of PNG format.
  • It must have an alpha channel.
  • It must contain only one humanoid anime character.
  • The character must be looking straight ahead.
  • The head of the character should be roughly contained in the middle 128 x 128 box.
  • All pixels that do not belong to the character (i.e., background pixels) should have RGBA = (0,0,0,0).

Image specification

FAQ: I prepared an image just like you said, why is my output so ugly?!?

This is most likely because your image does not obey the "background RGBA = (0,0,0,0)" constraint. In other words, your background pixels are (RRR,GGG,BBB,0) for some RRR, GGG, BBB > 0 rather than (0,0,0,0). This happens when you use Photoshop because it does not clear the RGB channels of transparent pixels.

Let's see an example. When I tried to use the manual_poser with data/illust/waifu_06_buggy.png. Here's what I got.

A failure case

When you look at the image, there seems to be nothing wrong with it.

waifu_06_buggy.png

However, if you inspect it with GIMP, you will see that the RGB channels have what backgrounds, which means that those pixels have non-zero RGB values.

In the buggy image, background pixels have colors in the RGB channels.

What you want, instead, is something like the non-buggy version: data/illust/waifu_06.png, which looks exactly the same as the buggy one to the naked eyes.

waifu_06.png

However, in GIMP, all channels have black backgrounds.

In the good image, background pixels do not have colors in any channels.

Because of this, the output was clean.

A success case

A way to make sure that your image works well with the model is to prepare it with GIMP. When exporting your image to the PNG format, make sure to uncheck "Save color values from transparent pixels" before you hit "Export."

Make sure to uncheck 'Save color values from transparent pixels' before exporting!

Disclaimer

While the author is an employee of Google Japan, this software is not Google's product and is not supported by Google.

The copyright of this software belongs to me as I have requested it using the IARC process. However, Google might claim the rights to the intellectual property of this invention.

The code is released under the MIT license. The model is released under the Creative Commons Attribution 4.0 International License.

Owner
Pramook Khungurn
A software developer from Thailand, interested in computer graphics, machine learning, and algorithms.
Pramook Khungurn
基于“Seq2Seq+前缀树”的知识图谱问答

KgCLUE-bert4keras 基于“Seq2Seq+前缀树”的知识图谱问答 简介 博客:https://kexue.fm/archives/8802 环境 软件:bert4keras=0.10.8 硬件:目前的结果是用一张Titan RTX(24G)跑出来的。 运行 第一次运行的时候,会给知

苏剑林(Jianlin Su) 65 Dec 12, 2022
An easy to use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Welcome to AdaptNLP A high level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models

Novetta 407 Jan 03, 2023
Code and data accompanying Natural Language Processing with PyTorch

Natural Language Processing with PyTorch Build Intelligent Language Applications Using Deep Learning By Delip Rao and Brian McMahan Welcome. This is a

Joostware 1.8k Jan 01, 2023
Pre-training BERT masked language models with custom vocabulary

Pre-training BERT Masked Language Models (MLM) This repository contains the method to pre-train a BERT model using custom vocabulary. It was used to p

Stella Douka 14 Nov 02, 2022
A simple version of DeTR

DeTR-Lite A simple version of DeTR Before you enjoy this DeTR-Lite The purpose of this project is to allow you to learn the basic knowledge of DeTR. P

Jianhua Yang 11 Jun 13, 2022
Kerberoast with ACL abuse capabilities

targetedKerberoast targetedKerberoast is a Python script that can, like many others (e.g. GetUserSPNs.py), print "kerberoast" hashes for user accounts

Shutdown 213 Dec 22, 2022
मराठी भाषा वाचविण्याचा एक प्रयास. इंग्रजी ते मराठीचा शब्दकोश. An attempt to preserve the Marathi language. A lightweight and ad free English to Marathi thesaurus.

For English, scroll down मराठी शब्द मराठी भाषा वाचवण्यासाठी मी हा ओपन सोर्स प्रोजेक्ट सुरू केला आहे. माझ्या मते, आपली भाषा हळूहळू आणि कोणाचाही लक्षात

मुक्त स्त्रोत 20 Oct 11, 2022
Almost State-of-the-art Text Generation library

Ps: we are adding transformer model soon Text Gen 🐐 Almost State-of-the-art Text Generation library Text gen is a python library that allow you build

Emeka boris ama 63 Jun 24, 2022
MMDA - multimodal document analysis

MMDA - multimodal document analysis

AI2 75 Jan 04, 2023
auto_code_complete is a auto word-completetion program which allows you to customize it on your need

auto_code_complete v1.3 purpose and usage auto_code_complete is a auto word-completetion program which allows you to customize it on your needs. the m

RUO 2 Feb 22, 2022
NLP techniques such as named entity recognition, sentiment analysis, topic modeling, text classification with Python to predict sentiment and rating of drug from user reviews.

This file contains the following documents sumbited for Baruch CIS9665 group 9 fall 2021. 1. Dataset: drug_reviews.csv 2. python codes for text classi

Aarif Munwar Jahan 2 Jan 04, 2023
A multi-voice TTS system trained with an emphasis on quality

TorToiSe Tortoise is a text-to-speech program built with the following priorities: Strong multi-voice capabilities. Highly realistic prosody and inton

James Betker 2.1k Jan 01, 2023
🤕 spelling exceptions builder for lazy people

🤕 spelling exceptions builder for lazy people

Vlad Bokov 3 May 12, 2022
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval. CLIP4Clip is a video-text retrieval model based

ArrowLuo 456 Jan 06, 2023
This repository details the steps in creating a Part of Speech tagger using Trigram Hidden Markov Models and the Viterbi Algorithm without using external libraries.

POS-Tagger This repository details the creation of a Part-of-Speech tagger using Trigram Hidden Markov Models to predict word tags in a word sequence.

Raihan Ahmed 1 Dec 09, 2021
Uncomplete archive of files from the European Nopsled Team

European Nopsled CTF Archive This is an archive of collected material from various Capture the Flag competitions that the European Nopsled team played

European Nopsled 4 Nov 24, 2021
Wake: Context-Sensitive Automatic Keyword Extraction Using Word2vec

Wake Wake: Context-Sensitive Automatic Keyword Extraction Using Word2vec Abstract استخراج خودکار کلمات کلیدی متون کوتاه فارسی با استفاده از word2vec ب

Omid Hajipoor 1 Dec 17, 2021
Text-Based zombie apocalyptic decision-making game in Python

Inspiration We shared university first year game coursework.[to gauge previous experience and start brainstorming] Adapted a particular nuclear fallou

Amin Sabbagh 2 Feb 17, 2022
Transformer - A TensorFlow Implementation of the Transformer: Attention Is All You Need

[UPDATED] A TensorFlow Implementation of Attention Is All You Need When I opened this repository in 2017, there was no official code yet. I tried to i

Kyubyong Park 3.8k Dec 26, 2022
【原神】自动演奏风物之诗琴的程序

疯物之诗琴 读取midi并自动演奏原神风物之诗琴。 可以自定义配置文件自动调整音符来适配风物之诗琴。 (原神1.4直播那天就开始做了!到现在才能放出来。。) 如何使用 在Release页面中下载打包好的程序和midi压缩包并解压。 双击运行“疯物之诗琴.exe”。 在原神中打开风物之诗琴,软件内输入

435 Jan 04, 2023