Layered Neural Rendering in PyTorch

This repository contains training code for the examples in the SIGGRAPH Asia 2020 paper "Layered Neural Rendering for Retiming People in Video."

This is not an officially supported Google product.

Prerequisites

  • Linux
  • Python 3.6+
  • NVIDIA GPU + CUDA CuDNN

Installation

This code has been tested with PyTorch 1.4 and Python 3.8.

  • Install PyTorch 1.4 and other dependencies.
  • For pip users, run pip install -r requirements.txt.
    • For Conda users, create a new Conda environment using conda env create -f environment.yml.
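For reference, a complete setup might look like the following (the repository URL here is an assumption; substitute your own clone location):

git clone https://github.com/google/retiming.git
cd retiming
pip install -r requirements.txt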

Data Processing

  • Download the data for a video used in our paper (e.g. "reflection"):
bash ./datasets/download_data.sh reflection
  • Alternatively, download all the data by specifying all.
  • Download the pretrained keypoint-to-UV model weights:
bash ./scripts/download_kp2uv_model.sh

The pretrained model will be saved at ./checkpoints/kp2uv/latest_net_Kp2uv.pth.

  • Generate the UV maps from the keypoints:
bash ./datasets/prepare_iuv.sh ./datasets/reflection

Training

  • To train a model on a video (e.g. "reflection"), run:
python train.py --name reflection --dataroot ./datasets/reflection --gpu_ids 0,1
  • To view training results and loss plots, visit http://localhost:8097. Intermediate results are also saved to ./checkpoints/reflection/web/index.html.

You can find more scripts in the scripts directory, e.g. run_${VIDEO}.sh, which combines data processing, training, and saving layer results for a video; a minimal sketch of such a script is shown below.
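As a rough sketch, such a script might chain the commands from this README for a single video (the video name and GPU ids are illustrative; see the actual scripts for the exact settings used):

#!/bin/bash
# Hypothetical end-to-end pipeline for one video, in the spirit of run_${VIDEO}.sh.
VIDEO=reflection

# Data processing: download the data, fetch the keypoint-to-UV model (one-time),
# and generate UV maps from the keypoints.
bash ./datasets/download_data.sh "$VIDEO"
bash ./scripts/download_kp2uv_model.sh
bash ./datasets/prepare_iuv.sh ./datasets/"$VIDEO"

# Train the model.
python train.py --name "$VIDEO" --dataroot ./datasets/"$VIDEO" --gpu_ids 0,1

# Save the layer results from the trained model.
python test.py --name "$VIDEO" --dataroot ./datasets/"$VIDEO" --do_upsampling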

Note:

  • It is recommended to use >=2 GPUs, each with >=16GB memory.
  • The training script first trains the low-resolution model for --num_epochs epochs at batch size --batch_size, and then trains the upsampling module for --num_epochs_upsample epochs at batch size --batch_size_upsample (see the example invocation after this list). If you do not need the upsampled result, pass --num_epochs_upsample 0.
  • Training the upsampling module requires ~2.5x the memory of the low-resolution model, so set --batch_size_upsample accordingly. The provided scripts set the batch sizes appropriately for 2 GPUs with 16GB memory each.
  • GPU memory scales linearly with the number of layers.
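For example, a single invocation controlling both training stages might look like this (the epoch and batch-size values are illustrative, not the project defaults):

python train.py --name reflection --dataroot ./datasets/reflection --gpu_ids 0,1 \
  --num_epochs 400 --batch_size 8 --num_epochs_upsample 200 --batch_size_upsample 4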

Saving layer results from a trained model

  • Run the trained model:
python test.py --name reflection --dataroot ./datasets/reflection --do_upsampling
  • The results (RGBA layers, videos) will be saved to ./results/reflection/test_latest/.
  • Passing --do_upsampling uses the results of the upsampling module. If the upsampling module has not been trained (i.e., you passed --num_epochs_upsample 0), omit this flag, as in the example below.
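For example, to save results from a model trained without the upsampling module:

python test.py --name reflection --dataroot ./datasets/reflection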

Custom video

To train on your own video, you will have to preprocess the data:

  1. Extract the frames, e.g.
    mkdir ./datasets/my_video && cd ./datasets/my_video 
    mkdir rgb && ffmpeg -i video.mp4 rgb/%04d.png
    
  2. Resize the video to 256x448 and save the frames in my_video/rgb_256, then resize to 512x896 and save in my_video/rgb_512 (see the ffmpeg sketch after this list).
  3. Run AlphaPose and Pose Tracking on the frames, and save the results as my_video/keypoints.json.
  4. Create my_video/metadata.json following these instructions.
  5. If your video has camera motion, either (1) stabilize the video, or (2) maintain the camera motion by computing homographies and saving as my_video/homographies.txt. See scripts/run_cartwheel.sh for a training example with camera motion, and see ./datasets/cartwheel/homographies.txt for formatting.
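For step 2, the resizing can be done with ffmpeg, e.g. as follows (this sketch assumes 256x448 means a height of 256 and a width of 448; swap the scale arguments if your convention differs):

cd ./datasets/my_video
mkdir rgb_256 rgb_512
# Height 256, width 448: input to the low-resolution model.
ffmpeg -i video.mp4 -vf scale=448:256 rgb_256/%04d.png
# Height 512, width 896: input to the upsampling module.
ffmpeg -i video.mp4 -vf scale=896:512 rgb_512/%04d.png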

Note: Videos that are suitable for our method have the following attributes:

  • Static camera or limited camera motion that can be represented with a homography.
  • Limited number of people, due to GPU memory limitations. We tested up to 7 people and 7 layers. Multiple people can be grouped onto the same layer, though they cannot be individually retimed.
  • People that move relative to the background (static people will be absorbed into the background layer).
  • We tested a video length of up to 200 frames (~7 seconds).

Citation

If you use this code for your research, please cite the following paper:

@inproceedings{lu2020,
  title={Layered Neural Rendering for Retiming People in Video},
  author={Lu, Erika and Cole, Forrester and Dekel, Tali and Xie, Weidi and Zisserman, Andrew and Salesin, David and Freeman, William T and Rubinstein, Michael},
  booktitle={SIGGRAPH Asia},
  year={2020}
}

Acknowledgments

This code is based on pytorch-CycleGAN-and-pix2pix.
