A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.

Overview

Adversarial Video Generation

This project implements a generative adversarial network to predict future frames of video, as detailed in "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun. Their official code (using Torch) can be found here.

Adversarial generation uses two networks – a generator and a discriminator – to improve the sharpness of generated images. Given the past four frames of video, the generator learns to generate accurate predictions for the next frame. Given either a generated or a real-world image, the discriminator learns to correctly classify between generated and real. The two networks "compete," with the generator attempting to fool the discriminator into classifying its output as real. This forces the generator to create frames that are very similar to what real frames in the domain might look like.

Results and Comparison

I trained and tested my network on a dataset of frame sequences from Ms. Pac-Man. To compare adversarial training vs. non-adversarial, I trained an adversarial network for 500,000 steps on both the generator and discriminator, and I trained a non-adversarial network for 1,000,000 steps (as the non-adversarial network runs about twice as fast). Training took around 24 hours for each network, using a GTX 980TI GPU.

In the following examples, I ran the networks recursively for 64 frames. (i.e. The input to generate the first frame was [input1, input2, input3, input4], the input to generate the second frame was [input2, input3, input4, generated1], etc.). As the networks are not fed actions from the original game, they cannot predict much of the true motion (such as in which direction Ms. Pac-Man will turn). Thus, the goal is not to line up perfectly with the ground truth images, but to maintain a crisp and likely representation of the world.

The following example exhibits how quickly the non-adversarial network becomes fuzzy and loses definition of the sprites. The adversarial network exhibits this behavior to an extent, but is much better at maintaining sharp representations of at least some sprites throughout the sequence:

This example shows how the adversarial network is able to keep a sharp representation of Ms. Pac-Man around multiple turns, while the non-adversarial network fails to do so:

While the adversarial network is clearly superior in terms of sharpness and consistency over time, the non-adversarial network does generate some fun/spectacular failures:

Using the error measurements outlined in the paper (Peak Signal to Noise Ratio and Sharp Difference) did not show significant difference between adversarial and non-adversarial training. I believe this is because sequential frames from the Ms. Pac-Man dataset have no motion in the majority of pixels, while the original paper was trained on real-world video where there is motion in much of the frame. Despite this, it is clear that adversarial training produces a qualitative improvement in the sharpness of the generated frames, especially over long time spans. You can view the loss and error statistics by running tensorboard --logdir=./Results/Summaries/ from the root of this project.

Usage

  1. Clone or download this repository.
  2. Prepare your data:
  • If you want to replicate my results, you can download the Ms. Pac-Man dataset here. Put this in a directory named Data/ in the root of this project for default behavior. Otherwise, you will need to specify your data location using the options outlined in parts 3 and 4.
  • If you would like to train on your own videos, preprocess them so that they are directories of frame sequences as structured below. (Neither the names nor the image extensions matter, only the structure):
  - Test
    - Video 1
      - frame1.png
      - frame2.png
      - frame ...
      - frameN.png
    - Video ...
    - Video N
      - ...
  - Train
    - Video 1
      - frame ...
    - Video ...
    - Video N
      - frame ...
  1. Process training data:
  • The network trains on random 32x32 pixel crops of the input images, filtered to make sure that most clips have some movement in them. To process your input data into this form, run the script python process_data from the Code/ directory with the following options:
-n/--num_clips= <# clips to process for training> (Default = 5000000)
-t/--train_dir= <Directory of full training frames>
-c/--clips_dir= <Save directory for processed clips>
                (I suggest making this a hidden dir so the filesystem doesn't freeze
                 with so many files. DON'T `ls` THIS DIRECTORY!)
-o/--overwrite  (Overwrites the previous data in clips_dir)
-H/--help       (prints usage)
  • This can take a few hours to complete, depending on the number of clips you want.
  1. Train/Test:
  • If you want to plug-and-play with the Ms. Pac-Man dataset, you can download my trained models here. Load them using the -l option. (e.g. python avg_runner.py -l ./Models/Adversarial/model.ckpt-500000).
  • Train and test your network by running python avg_runner.py from the Code/ directory with the following options:
-l/--load_path=    <Relative/path/to/saved/model>
-t/--test_dir=     <Directory of test images>
-r--recursions=    <# recursive predictions to make on test>
-a/--adversarial=  <{t/f}> (Whether to use adversarial training. Default=True)
-n/--name=         <Subdirectory of ../Data/Save/*/ in which to save output of this run>
-O/--overwrite     (Overwrites all previous data for the model with this save name)
-T/--test_only     (Only runs a test step -- no training)
-H/--help          (Prints usage)
--stats_freq=      <How often to print loss/train error stats, in # steps>
--summary_freq=    <How often to save loss/error summaries, in # steps>
--img_save_freq=   <How often to save generated images, in # steps>
--test_freq=       <How often to test the model on test data, in # steps>
--model_save_freq= <How often to save the model, in # steps>

FAQs

Why don't you train on patches larger then 32x32? Why not train on the whole image?

Memory usage. Since the discriminator has fully-connected layers after the convolutions, the output of the last convolution must be flattened to connect to the first fully-connected layer. The size of this output is dependent on the input image size, and blows up really quickly (e.g. For an input size of 64x64, going from 128 feature maps to a fully connected layer with 512 nodes, you need a connection with 64 * 64 * 128 * 512 = 268,435,456 weights). Because of this, training on patches larger than 32x32 causes an out-of-memory error (at least on my machine).

Luckily, you only need the discriminator for training, and the generator network is fully convolutional, so you can test the weights you trained on 32x32 images over images of any size (which is why I'm able to do generations for the entire Ms. Pac-Man board).

Owner
Matt Cooper
I'm an entrepreneur who loves music, coding and great design. Passionate about advancing AI and creating products that bring those advancements to the world.
Matt Cooper
This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" ([email protected])

GP-VAE This repository provides datasets and code for preprocessing, training and testing models for the paper: Diverse Text Generation via Variationa

Wanyu Du 18 Dec 29, 2022
A machine learning package for streaming data in Python. The other ancestor of River.

scikit-multiflow is a machine learning package for streaming data in Python. creme and scikit-multiflow are merging into a new project called River. W

670 Dec 30, 2022
We envision models that are pre-trained on a vast range of domain-relevant tasks to become key for molecule property prediction

We envision models that are pre-trained on a vast range of domain-relevant tasks to become key for molecule property prediction. This repository aims to give easy access to state-of-the-art pre-train

GMUM 90 Jan 08, 2023
Effect of Deep Transfer and Multi task Learning on Sperm Abnormality Detection

Effect of Deep Transfer and Multi task Learning on Sperm Abnormality Detection Introduction This repository includes codes and models of "Effect of De

Amir Abbasi 5 Sep 05, 2022
Keep CALM and Improve Visual Feature Attribution

Keep CALM and Improve Visual Feature Attribution Jae Myung Kim1*, Junsuk Choe1*, Zeynep Akata2, Seong Joon Oh1† * Equal contribution † Corresponding a

NAVER AI 90 Dec 07, 2022
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
Multi-Joint dynamics with Contact. A general purpose physics simulator.

MuJoCo Physics MuJoCo stands for Multi-Joint dynamics with Contact. It is a general purpose physics engine that aims to facilitate research and develo

DeepMind 5.2k Jan 02, 2023
PyTorch implementation of the end-to-end coreference resolution model with different higher-order inference methods.

End-to-End Coreference Resolution with Different Higher-Order Inference Methods This repository contains the implementation of the paper: Revealing th

Liyan 52 Jan 04, 2023
Semantically Contrastive Learning for Low-light Image Enhancement

Semantically Contrastive Learning for Low-light Image Enhancement Here, we propose an effective semantically contrastive learning paradigm for Low-lig

48 Dec 16, 2022
Pytorch implementation of TailCalibX : Feature Generation for Long-tail Classification

TailCalibX : Feature Generation for Long-tail Classification by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi [arXiv] [

Rahul Vigneswaran 34 Jan 02, 2023
3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks

3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks Introduction This repository contains the code and models for the follo

124 Jan 06, 2023
Pytorch code for "Text-Independent Speaker Verification Using 3D Convolutional Neural Networks".

:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

Amirsina Torfi 114 Dec 18, 2022
[IROS'21] SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning

SurRoL IROS 2021 SurRoL: An Open-source Reinforcement Learning Centered and dVRK Compatible Platform for Surgical Robot Learning Features dVRK compati

<a href=[email protected]"> 55 Jan 03, 2023
Semantic Segmentation for Aerial Imagery using Convolutional Neural Network

This repo has been deprecated because whole things are re-implemented by using Chainer and I did refactoring for many codes. So please check this newe

Shunta Saito 27 Sep 23, 2022
Where2Act: From Pixels to Actions for Articulated 3D Objects

Where2Act: From Pixels to Actions for Articulated 3D Objects The Proposed Where2Act Task. Given as input an articulated 3D object, we learn to propose

Kaichun Mo 69 Nov 28, 2022
Implementation of neural class expression synthesizers

NCES Implementation of neural class expression synthesizers (NCES) Installation Clone this repository: https://github.com/ConceptLengthLearner/NCES.gi

NeuralConceptSynthesis 0 Jan 06, 2022
Official Implementation of Few-shot Visual Relationship Co-localization

VRC Official implementation of the Few-shot Visual Relationship Co-localization (ICCV 2021) paper project page | paper Requirements Use python = 3.8.

22 Oct 13, 2022
Personalized Federated Learning using Pytorch (pFedMe)

Personalized Federated Learning with Moreau Envelopes (NeurIPS 2020) This repository implements all experiments in the paper Personalized Federated Le

Charlie Dinh 226 Dec 30, 2022
ML-Decoder: Scalable and Versatile Classification Head

ML-Decoder: Scalable and Versatile Classification Head Paper Official PyTorch Implementation Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baru

189 Jan 04, 2023
Comp445 project - Data Communications & Computer Networks

COMP-445 Data Communications & Computer Networks Change Python version in Conda

Peng Zhao 2 Oct 03, 2022