an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

Overview

3d-ken-burns

This is a reference implementation of 3D Ken Burns Effect from a Single Image [1] using PyTorch. Given a single input image, it animates this still image with a virtual camera scan and zoom subject to motion parallax. Should you be making use of our work, please cite our paper [1].

Paper

setup

Several functions are implemented in CUDA using CuPy, which is why CuPy is a required dependency. It can be installed using pip install cupy or alternatively using one of the provided binary packages as outlined in the CuPy repository. Please also make sure to have the CUDA_HOME environment variable configured.

In order to generate the video results, please also make sure to have pip install moviepy installed.

usage

To run it on an image and generate the 3D Ken Burns effect fully automatically, use the following command.

python autozoom.py --in ./images/doublestrike.jpg --out ./autozoom.mp4

To start the interface that allows you to manually adjust the camera path, use the following command. You can then navigate to http://localhost:8080/ and load an image using the button on the bottom right corner. Please be patient when loading an image and saving the result, there is a bit of background processing going on.

python interface.py

To run the depth estimation to obtain the raw depth estimate, use the following command. Please note that this script does not perform the depth adjustment, see #22 for information on how to add it.

python depthestim.py --in ./images/doublestrike.jpg --out ./depthestim.npy

To benchmark the depth estimation, run python benchmark-ibims.py or python benchmark-nyu.py. You can use it to easily verify that the provided implementation runs as expected.

colab

If you do not have a suitable environment to run this projects then you could give Colab a try. It allows you to run the project in the cloud, free of charge. There are several people who provide Colab notebooks that should get you started. A few that I am aware of include one from Arnaldo Gabriel, one from Vlad Alex, and one from Ahmed Harmouche.

dataset

This dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0) and may only be used for non-commercial purposes. Please see the LICENSE file for more information.

scene mode color depth normal
asdf flying 3.7 GB 1.0 GB 2.9 GB
asdf walking 3.6 GB 0.9 GB 2.7 GB
blank flying 3.2 GB 1.0 GB 2.8 GB
blank walking 3.0 GB 0.9 GB 2.7 GB
chill flying 5.4 GB 1.1 GB 10.8 GB
chill walking 5.2 GB 1.0 GB 10.5 GB
city flying 0.8 GB 0.2 GB 0.9 GB
city walking 0.7 GB 0.2 GB 0.8 GB
environment flying 1.9 GB 0.5 GB 3.5 GB
environment walking 1.8 GB 0.5 GB 3.3 GB
fort flying 5.0 GB 1.1 GB 9.2 GB
fort walking 4.9 GB 1.1 GB 9.3 GB
grass flying 1.1 GB 0.2 GB 1.9 GB
grass walking 1.1 GB 0.2 GB 1.6 GB
ice flying 1.2 GB 0.2 GB 2.1 GB
ice walking 1.2 GB 0.2 GB 2.0 GB
knights flying 0.8 GB 0.2 GB 1.0 GB
knights walking 0.8 GB 0.2 GB 0.9 GB
outpost flying 4.8 GB 1.1 GB 7.9 GB
outpost walking 4.6 GB 1.0 GB 7.4 GB
pirates flying 0.8 GB 0.2 GB 0.8 GB
pirates walking 0.7 GB 0.2 GB 0.8 GB
shooter flying 0.9 GB 0.2 GB 1.1 GB
shooter walking 0.9 GB 0.2 GB 1.0 GB
shops flying 0.2 GB 0.1 GB 0.2 GB
shops walking 0.2 GB 0.1 GB 0.2 GB
slums flying 0.5 GB 0.1 GB 0.8 GB
slums walking 0.5 GB 0.1 GB 0.7 GB
subway flying 0.5 GB 0.1 GB 0.9 GB
subway walking 0.5 GB 0.1 GB 0.9 GB
temple flying 1.7 GB 0.4 GB 3.1 GB
temple walking 1.7 GB 0.3 GB 2.8 GB
titan flying 6.2 GB 1.1 GB 11.5 GB
titan walking 6.0 GB 1.1 GB 11.3 GB
town flying 1.7 GB 0.3 GB 3.0 GB
town walking 1.8 GB 0.3 GB 3.0 GB
underland flying 5.4 GB 1.2 GB 12.1 GB
underland walking 5.1 GB 1.2 GB 11.4 GB
victorian flying 0.5 GB 0.1 GB 0.8 GB
victorian walking 0.4 GB 0.1 GB 0.7 GB
village flying 1.6 GB 0.3 GB 2.8 GB
village walking 1.6 GB 0.3 GB 2.7 GB
warehouse flying 0.9 GB 0.2 GB 1.5 GB
warehouse walking 0.8 GB 0.2 GB 1.4 GB
western flying 0.8 GB 0.2 GB 0.9 GB
western walking 0.7 GB 0.2 GB 0.8 GB

Please note that this is an updated version of the dataset that we have used in our paper. So while it has fewer scenes in total, each sample capture now has a varying focal length which should help with generalizability. Furthermore, some examples are either over- or under-exposed and it would be a good idea to remove these outliers. Please see #37, #39, and #40 for supplementary discussions.

video

Video

license

This is a project by Adobe Research. It is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License (CC BY-NC-SA 4.0) and may only be used for non-commercial purposes. Please see the LICENSE file for more information.

references

[1]  @article{Niklaus_TOG_2019,
         author = {Simon Niklaus and Long Mai and Jimei Yang and Feng Liu},
         title = {3D Ken Burns Effect from a Single Image},
         journal = {ACM Transactions on Graphics},
         volume = {38},
         number = {6},
         pages = {184:1--184:15},
         year = {2019}
     }

acknowledgment

The video above uses materials under a Creative Common license or with the owner's permission, as detailed at the end.

Owner
Simon Niklaus
Research Scientist at Adobe
Simon Niklaus
Mask-invariant Face Recognition through Template-level Knowledge Distillation

Mask-invariant Face Recognition through Template-level Knowledge Distillation This is the official repository of "Mask-invariant Face Recognition thro

Fadi Boutros 35 Dec 06, 2022
Codes for "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation"

CSDI This is the github repository for the NeurIPS 2021 paper "CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation

106 Jan 04, 2023
A collection of random and hastily hacked together scripts for investigating EU-DCC

A collection of random and hastily hacked together scripts for investigating EU-DCC

Ryan Barrett 8 Mar 01, 2022
Credo AI Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data assessment, and acts as a central gateway to assessments created in the open source community.

Lens by Credo AI - Responsible AI Assessment Framework Lens is a comprehensive assessment framework for AI systems. Lens standardizes model and data a

Credo AI 27 Dec 14, 2022
TLXZoo - Pre-trained models based on TensorLayerX

Pre-trained models based on TensorLayerX. TensorLayerX is a multi-backend AI fra

TensorLayer Community 13 Dec 07, 2022
The repository contains reproducible PyTorch source code of our paper Generative Modeling with Optimal Transport Maps, ICLR 2022.

Generative Modeling with Optimal Transport Maps The repository contains reproducible PyTorch source code of our paper Generative Modeling with Optimal

Litu Rout 30 Dec 22, 2022
Cortex-compatible model server for Python and TensorFlow

Nucleus model server Nucleus is a model server for TensorFlow and generic Python models. It is compatible with Cortex clusters, Kubernetes clusters, a

Cortex Labs 14 Nov 27, 2022
Code for paper "Learning to Reweight Examples for Robust Deep Learning"

learning-to-reweight-examples Code for paper Learning to Reweight Examples for Robust Deep Learning. [arxiv] Environment We tested the code on tensorf

Uber Research 261 Jan 01, 2023
Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

Johannes G. 24 Dec 07, 2022
Sequence Modeling with Structured State Spaces

Structured State Spaces for Sequence Modeling This repository provides implementations and experiments for the following papers. S4 Efficiently Modeli

HazyResearch 896 Jan 01, 2023
SWA Object Detection

SWA Object Detection This project hosts the scripts for training SWA object detectors, as presented in our paper: @article{zhang2020swa, title={SWA

237 Nov 28, 2022
Civsim is a basic civilisation simulation and modelling system built in Python 3.8.

Civsim Introduction Civsim is a basic civilisation simulation and modelling system built in Python 3.8. It requires the following packages: perlin_noi

17 Aug 08, 2022
1st Solution For NeurIPS 2021 Competition on ML4CO Dual Task

KIDA: Knowledge Inheritance in Data Aggregation This project releases our 1st place solution on NeurIPS2021 ML4CO Dual Task. Slide and model weights a

MEGVII Research 24 Sep 08, 2022
CSPML (crystal structure prediction with machine learning-based element substitution)

CSPML (crystal structure prediction with machine learning-based element substitution) CSPML is a unique methodology for the crystal structure predicti

8 Dec 20, 2022
A-ESRGAN aims to provide better super-resolution images by using multi-scale attention U-net discriminators.

A-ESRGAN: Training Real-World Blind Super-Resolution with Attention-based U-net Discriminators The authors are hidden for the purpose of double blind

77 Dec 16, 2022
Open AI's Python library

OpenAI Python Library The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language. It incl

Pavan Ananth Sharma 3 Jul 10, 2022
Distance-Ratio-Based Formulation for Metric Learning

Distance-Ratio-Based Formulation for Metric Learning Environment Python3 Pytorch (http://pytorch.org/) (version 1.6.0+cu101) json tqdm Preparing datas

Hyeongji Kim 1 Dec 07, 2022
A MNIST-like fashion product database. Benchmark

Fashion-MNIST Table of Contents Why we made Fashion-MNIST Get the Data Usage Benchmark Visualization Contributing Contact Citing Fashion-MNIST License

Zalando Research 10.5k Jan 08, 2023
Polynomial-time Meta-Interpretive Learning

Louise - polynomial-time Program Learning Getting help with Louise Louise's author can be reached by email at Stassa Patsantzis 64 Dec 26, 2022

AOT (Associating Objects with Transformers) in PyTorch

An efficient modular implementation of Associating Objects with Transformers for Video Object Segmentation in PyTorch

162 Dec 14, 2022