Layered Neural Rendering in PyTorch

This repository contains training code for the examples in the SIGGRAPH Asia 2020 paper "Layered Neural Rendering for Retiming People in Video."

This is not an officially supported Google product.

Prerequisites

  • Linux
  • Python 3.6+
  • NVIDIA GPU + CUDA CuDNN

Installation

This code has been tested with PyTorch 1.4 and Python 3.8.

  • Install PyTorch 1.4 and other dependencies.
  • For pip users, run pip install -r requirements.txt.
    • For Conda users, create a new Conda environment using conda env create -f environment.yml.
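For reference, a complete setup might look like the following (the repository URL here is an assumption; substitute your own clone location):

git clone https://github.com/google/retiming.git
cd retiming
pip install -r requirements.txt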

Data Processing

  • Download the data for a video used in our paper (e.g. "reflection"):
bash ./datasets/download_data.sh reflection
  • Alternatively, download all the data by specifying all.
  • Download the pretrained keypoint-to-UV model weights:
bash ./scripts/download_kp2uv_model.sh

The pretrained model will be saved at ./checkpoints/kp2uv/latest_net_Kp2uv.pth.

  • Generate the UV maps from the keypoints:
bash ./datasets/prepare_iuv.sh ./datasets/reflection

Training

  • To train a model on a video (e.g. "reflection"), run:
python train.py --name reflection --dataroot ./datasets/reflection --gpu_ids 0,1
  • To view training results and loss plots, visit http://localhost:8097. Intermediate results are also saved to ./checkpoints/reflection/web/index.html.

You can find more scripts in the scripts directory, e.g. run_${VIDEO}.sh, which combines data processing, training, and saving layer results for a video; a minimal sketch of such a script is shown below.
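As a rough sketch, such a script might chain the commands from this README for a single video (the video name and GPU ids are illustrative; see the actual scripts for the exact settings used):

#!/bin/bash
# Hypothetical end-to-end pipeline for one video, in the spirit of run_${VIDEO}.sh.
VIDEO=reflection

# Data processing: download the data, fetch the keypoint-to-UV model (one-time),
# and generate UV maps from the keypoints.
bash ./datasets/download_data.sh "$VIDEO"
bash ./scripts/download_kp2uv_model.sh
bash ./datasets/prepare_iuv.sh ./datasets/"$VIDEO"

# Train the model.
python train.py --name "$VIDEO" --dataroot ./datasets/"$VIDEO" --gpu_ids 0,1

# Save the layer results from the trained model.
python test.py --name "$VIDEO" --dataroot ./datasets/"$VIDEO" --do_upsampling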

Note:

  • It is recommended to use >=2 GPUs, each with >=16GB memory.
  • The training script first trains the low-resolution model for --num_epochs epochs at batch size --batch_size, and then trains the upsampling module for --num_epochs_upsample epochs at batch size --batch_size_upsample (see the example invocation after this list). If you do not need the upsampled result, pass --num_epochs_upsample 0.
  • Training the upsampling module requires ~2.5x the memory of the low-resolution model, so set --batch_size_upsample accordingly. The provided scripts set the batch sizes appropriately for 2 GPUs with 16GB memory each.
  • GPU memory scales linearly with the number of layers.
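For example, a single invocation controlling both training stages might look like this (the epoch and batch-size values are illustrative, not the project defaults):

python train.py --name reflection --dataroot ./datasets/reflection --gpu_ids 0,1 \
  --num_epochs 400 --batch_size 8 --num_epochs_upsample 200 --batch_size_upsample 4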

Saving layer results from a trained model

  • Run the trained model:
python test.py --name reflection --dataroot ./datasets/reflection --do_upsampling
  • The results (RGBA layers, videos) will be saved to ./results/reflection/test_latest/.
  • Passing --do_upsampling uses the results of the upsampling module. If the upsampling module has not been trained (i.e., you passed --num_epochs_upsample 0), omit this flag, as in the example below.
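For example, to save results from a model trained without the upsampling module:

python test.py --name reflection --dataroot ./datasets/reflection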

Custom video

To train on your own video, you will have to preprocess the data:

  1. Extract the frames, e.g.
    mkdir ./datasets/my_video && cd ./datasets/my_video 
    mkdir rgb && ffmpeg -i video.mp4 rgb/%04d.png
    
  2. Resize the video to 256x448 and save the frames in my_video/rgb_256, then resize to 512x896 and save in my_video/rgb_512 (see the ffmpeg sketch after this list).
  3. Run AlphaPose and Pose Tracking on the frames, and save the results as my_video/keypoints.json.
  4. Create my_video/metadata.json following these instructions.
  5. If your video has camera motion, either (1) stabilize the video, or (2) maintain the camera motion by computing homographies and saving as my_video/homographies.txt. See scripts/run_cartwheel.sh for a training example with camera motion, and see ./datasets/cartwheel/homographies.txt for formatting.
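For step 2, the resizing can be done with ffmpeg, e.g. as follows (this sketch assumes 256x448 means a height of 256 and a width of 448; swap the scale arguments if your convention differs):

cd ./datasets/my_video
mkdir rgb_256 rgb_512
# Height 256, width 448: input to the low-resolution model.
ffmpeg -i video.mp4 -vf scale=448:256 rgb_256/%04d.png
# Height 512, width 896: input to the upsampling module.
ffmpeg -i video.mp4 -vf scale=896:512 rgb_512/%04d.png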

Note: Videos that are suitable for our method have the following attributes:

  • Static camera or limited camera motion that can be represented with a homography.
  • Limited number of people, due to GPU memory limitations. We tested up to 7 people and 7 layers. Multiple people can be grouped onto the same layer, though they cannot be individually retimed.
  • People that move relative to the background (static people will be absorbed into the background layer).
  • We tested a video length of up to 200 frames (~7 seconds).

Citation

If you use this code for your research, please cite the following paper:

@inproceedings{lu2020,
  title={Layered Neural Rendering for Retiming People in Video},
  author={Lu, Erika and Cole, Forrester and Dekel, Tali and Xie, Weidi and Zisserman, Andrew and Salesin, David and Freeman, William T and Rubinstein, Michael},
  booktitle={SIGGRAPH Asia},
  year={2020}
}

Acknowledgments

This code is based on pytorch-CycleGAN-and-pix2pix.
