Playable Video Generation


Willi Menapace, Stéphane Lathuilière, Sergey Tulyakov, Aliaksandr Siarohin, Elisa Ricci

Paper: ArXiv
Supplementary: Website
Demo: Try it Live

Abstract: This paper introduces the unsupervised learning problem of playable video generation (PVG). In PVG, we aim at allowing a user to control the generated video by selecting a discrete action at every time step as when playing a video game. The difficulty of the task lies both in learning semantically consistent actions and in generating realistic videos conditioned on the user input. We propose a novel framework for PVG that is trained in a self-supervised manner on a large dataset of unlabelled videos. We employ an encoder-decoder architecture where the predicted action labels act as bottleneck. The network is constrained to learn a rich action space using, as main driving loss, a reconstruction loss on the generated video. We demonstrate the effectiveness of the proposed approach on several datasets with wide environment variety.


Figure 1. Illustration of the proposed CADDY model for playable video generation.

Given a set of completely unlabeled videos, we jointly learn a set of discrete actions and a video generation model conditioned on the learned actions. At test time, the user can control the generated video on the fly by providing action labels, as if playing a video game. We name our method CADDY. Our architecture for unsupervised playable video generation is composed of several components. An encoder E extracts frame representations from the input sequence. A temporal model estimates the successive states using a recurrent dynamics network R and an action network A, which predicts the action label corresponding to the action performed in the input sequence. Finally, a decoder D reconstructs the input frames. The model is trained using reconstruction as the main driving loss.
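The data flow described above can be illustrated with a toy sketch in plain Python. Nothing here is taken from the CADDY code: the functions encode, predict_action, step_dynamics and decode, the action prototypes, and the averaging arithmetic are all hypothetical stand-ins chosen only to show how a discrete action label sits between the encoder and the decoder and how a reconstruction error is measured at the end.

```python
# Toy stand-ins for E, A, R and D. All names and formulas are hypothetical
# and only mirror the data flow, not the real networks.

def encode(frame):
    """E: map a frame (a list of pixel values) to a two-value feature vector."""
    half = len(frame) // 2
    return [sum(frame[:half]) / half, sum(frame[half:]) / (len(frame) - half)]

def predict_action(prev_feat, next_feat, prototypes):
    """A: pick the discrete action whose prototype best matches the feature change."""
    delta = [n - p for p, n in zip(prev_feat, next_feat)]
    scores = [sum(d * w for d, w in zip(delta, proto)) for proto in prototypes]
    return max(range(len(scores)), key=scores.__getitem__)

def step_dynamics(state, feat, action, prototypes):
    """R: update the state from the current features and the chosen action only."""
    return [0.5 * s + 0.5 * f + a
            for s, f, a in zip(state, feat, prototypes[action])]

def decode(state, frame_size):
    """D: expand the state back into a frame-sized list."""
    half = frame_size // 2
    return [state[0]] * half + [state[1]] * (frame_size - half)

def reconstruct(frames, prototypes):
    """Run the pipeline over a sequence and return (action labels, reconstructions)."""
    state = encode(frames[0])
    actions, outputs = [], []
    for prev, nxt in zip(frames, frames[1:]):
        action = predict_action(encode(prev), encode(nxt), prototypes)
        state = step_dynamics(state, encode(prev), action, prototypes)
        actions.append(action)
        outputs.append(decode(state, len(nxt)))
    return actions, outputs

def reconstruction_loss(targets, outputs):
    """Mean squared error between target frames and reconstructed frames."""
    errors = [(t - o) ** 2
              for target, output in zip(targets, outputs)
              for t, o in zip(target, output)]
    return sum(errors) / len(errors)

frames = [[0, 0, 0, 0], [2, 2, 0, 0], [2, 2, 4, 4]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical actions
actions, outputs = reconstruct(frames, prototypes)
loss = reconstruction_loss(frames[1:], outputs)
```

In this sketch the decoder sees the next frame only through the action label, which is the bottleneck idea: the label has to carry whatever information about the transition the reconstruction needs.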


Installation

We recommend Linux and one or more CUDA-compatible GPUs. We provide both a Conda environment and a Dockerfile to configure the required libraries.


Conda

The Conda environment can be installed and activated with:

conda env create -f env.yml

conda activate video-generation


Docker

Use the Dockerfile to build the Docker image:

docker build -t video-generation:1.0 .

Run the Docker image, mounting the repository root directory at /video-generation in the container:

docker run -it --gpus all --ipc=host -v /path/to/directory/video-generation:/video-generation video-generation:1.0 /bin/bash

Preparing Datasets


BAIR

Coming soon

Atari Breakout

Download the breakout_160_ours.tar.gz archive from Google Drive and extract it under the data folder.


Tennis

The Tennis dataset is automatically acquired from YouTube by running


This requires an installation of youtube-dl. Run youtube-dl -U to update the utility to the latest version. The dataset will be created at data/tennis_v4_256_ours.

Custom Datasets

Custom datasets can be created from a user-provided folder containing plain videos. Acquired video frames are sampled at the specified resolution and framerate. ffmpeg is used for the extraction and supports multiple input formats. By default only mp4 files are acquired.

python -m dataset.acquisition.convert_video_directory --video_directory <video_directory> --output_directory <output_directory> --target_size <target_size> [--fps <fps> --video_extension <video_extension> --processes <processes>]

As an example, the following command transforms all mp4 videos in the tmp/my_videos directory into a 256x256px dataset sampled at 10fps and saves it in the data/my_videos folder:

python -m dataset.acquisition.convert_video_directory --video_directory tmp/my_videos --output_directory data/my_videos --target_size 256 256 --fps 10
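To make the frame extraction step more concrete, here is a minimal sketch of how an ffmpeg call for one video could be assembled. It is not the repository's implementation: the helper names build_ffmpeg_command and list_videos, the per-video output folder, and the %05d.png frame naming are assumptions made for illustration. Only the ffmpeg options used (-i, -vf with the fps and scale filters) are standard ffmpeg features.

```python
from pathlib import Path

def list_videos(video_directory, extension="mp4"):
    """Return the videos with the given extension, sorted by name."""
    return sorted(Path(video_directory).glob(f"*.{extension}"))

def build_ffmpeg_command(video_path, output_dir, width, height, fps):
    """Build an ffmpeg argument list that resamples and resizes one video into frames.

    The output layout (one folder per video, frames named %05d.png) is a
    hypothetical choice, not the layout used by the repository.
    """
    frame_pattern = Path(output_dir) / Path(video_path).stem / "%05d.png"
    return [
        "ffmpeg",
        "-i", str(video_path),
        # fps resamples the frame rate, scale resizes each frame.
        "-vf", f"fps={fps},scale={width}:{height}",
        str(frame_pattern),
    ]

command = build_ffmpeg_command("tmp/my_videos/clip.mp4", "data/my_videos", 256, 256, 10)
```

The list can be passed to subprocess.run once the per-video output folder exists; building it as a list rather than a single string avoids shell quoting problems with file names that contain spaces.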

Using Pretrained Models

Pretrained models in .pth.tar format are available for all the datasets and can be downloaded at the following link: Google Drive

Please place each directory under the checkpoints folder. Training and inference scripts automatically make use of the latest.pth.tar checkpoint when present in the checkpoints subfolder corresponding to the configuration in use.
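The checkpoint lookup described above can be sketched as follows. This is an illustration under assumptions, not the scripts' actual logic: the function name find_latest_checkpoint and the run name 01_bair used as a subfolder are hypothetical, and the real scripts derive the subfolder from the configuration file.

```python
import tempfile
from pathlib import Path

def find_latest_checkpoint(checkpoints_root, run_name):
    """Return the path to latest.pth.tar for a run, or None when it is absent."""
    candidate = Path(checkpoints_root) / run_name / "latest.pth.tar"
    return candidate if candidate.is_file() else None

# Demonstration in a temporary directory standing in for the checkpoints folder.
with tempfile.TemporaryDirectory() as root:
    run_dir = Path(root) / "01_bair"  # hypothetical run name
    run_dir.mkdir()
    missing = find_latest_checkpoint(root, "01_bair")
    (run_dir / "latest.pth.tar").touch()
    found_name = find_latest_checkpoint(root, "01_bair").name
```

Returning None instead of raising lets a caller fall back to training from scratch when no checkpoint has been placed yet.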


When a latest.pth.tar checkpoint is present under the checkpoints folder corresponding to the current configuration, the model can be used interactively to generate videos with the following commands:

  • BAIR: python --config configs/01_bair.yaml

  • Breakout: python --config configs/breakout/02_breakout.yaml

  • Tennis: python --config configs/03_tennis.yaml

A full screen window will appear and actions can be provided using number keys in the range [1, actions_count]. Number key 0 resets the generation process.
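The key handling described above can be sketched as a small mapping function. The function name interpret_key, the returned tuples, and the zero-based action index are assumptions for illustration; the actual player may represent actions differently.

```python
def interpret_key(key, actions_count):
    """Map a pressed key to ("reset", None), ("action", index) or ("ignore", None).

    Keys 1..actions_count select an action (returned as a zero-based index,
    which is an assumption), key 0 resets generation, anything else is ignored.
    """
    if len(key) != 1 or key not in "0123456789":
        return ("ignore", None)
    number = int(key)
    if number == 0:
        return ("reset", None)
    if number <= actions_count:
        return ("action", number - 1)
    return ("ignore", None)

events = [interpret_key(k, 7) for k in ["0", "3", "8", "a"]]
```

Because only single digit keys are available, this scheme supports at most nine actions.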

The inference process is lightweight and can even be executed in a browser, as in our Live Demo.


Training

The models can be trained with the following command:

python --config configs/

The training process generates multiple files under the results and checkpoints directories, each in a subdirectory with the name specified in the configuration file. In particular, the folder under the results directory will contain an images folder showing qualitative results obtained during training. The checkpoints subfolder will contain regularly saved checkpoints and the latest.pth.tar checkpoint representing the latest model parameters.

Training can be monitored through Weights and Biases by running the following before the training command: wandb init

Training the model in full resolution on our datasets required the following GPU resources:

  • BAIR: 4x2080Ti 44GB
  • Breakout: 1x2080Ti 11GB
  • Tennis: 2x2080 16GB

Lower resolution versions of the model can be trained with a single 8GB GPU.


Evaluation

Evaluation requires two steps. First, an evaluation dataset must be built. Second, evaluation is carried out on the evaluation dataset. To build the evaluation dataset, issue:

python --config configs/

The command creates a reconstruction of the test portion of the dataset under the results//evaluation_dataset directory. To run evaluation issue:

python --config configs/evaluation/configs/

Evaluation results are saved under the evaluation_results directory, in the folder specified in the configuration file, with the name data.yml.
