subpixel: A subpixel convnet for super resolution with Tensorflow

Related tags

Deep Learningsubpixel
Overview

subpixel: A subpixel convolutional neural network implementation with Tensorflow

Left: input images / Right: output images with 4x super-resolution after 6 epochs:

See more examples inside the images folder.

In CVPR 2016 Shi et. al. from Twitter VX (previously Magic Pony) published a paper called Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network [1]. Here we propose a reimplementation of their method and discuss future applications of the technology.

But first let us discuss some background.

Convolutions, transposed convolutions and subpixel convolutions

Convolutional neural networks (CNN) are now standard neural network layers for computer vision. Transposed convolutions (sometimes referred to as deconvolution) are the GRADIENTS of a convolutional layer. Transposed convolutions were, as far as we know first used by Zeiler and Fergus [2] for visualization purposes while improving their AlexNet model.

For visualization purposes let us check out that convolutions in the present subject are a sequence of inner product of a given filter (or kernel) with pieces of a larger image. This operation is highly parallelizable, since the kernel is the same throughout the image. People used to refer to convolutions as locally connected layers with shared parameters. Checkout the figure bellow by Dumoulin and Visin [3]:

source

Note though that convolutional neural networks can be defined with strides or we can follow the convolution with maxpooling to downsample the input image. The equivalent backward operation of a convolution with strides, in other words its gradient, is an upsampling operation, where zeros a filled in between non-zeros pixels followed by a convolution with the kernel rotated 180 degrees. See representation copied from Dumoulin and Visin again:

source

For classification purposes, all that we need is the feedforward pass of a convolutional neural network to extract features at different scales. But for applications such as image super resolution and autoencoders, both downsampling and upsampling operations are necessary in a feedforward pass. The community took inspiration on how the gradients are implemented in CNNs and applied them as a feedforward layer instead.

But as one may have observed the upsampling operation as implemented above with strided convolution gradients adds zero values to the upscale the image, that have to be later filled in with meaningful values. Maybe even worse, these zero values have no gradient information that can be backpropagated through.

To cope with that problem, Shi et. al [1] proposed what we argue to be one the most useful recent convnet tricks (at least in my opinion as a generative model researcher!) They proposed a subpixel convolutional neural network layer for upscaling. This layer essentially uses regular convolutional layers followed by a specific type of image reshaping called a phase shift. In other words, instead of putting zeros in between pixels and having to do extra computation, they calculate more convolutions in lower resolution and resize the resulting map into an upscaled image. This way, no meaningless zeros are necessary. Checkout the figure below from their paper. Follow the colors to have an intuition about how they do the image resizing. Check this paper for further understanding.

source

Next we will discuss our implementation of this method and later what we foresee to be the implications of it everywhere where upscaling in convolutional neural networks was necessary.

Subpixel CNN layer

Following Shi et. al. the equation for implementing the phase shift for CNNs is:

source

In numpy, we can write this as

def PS(I, r):
  assert len(I.shape) == 3
  assert r>0
  r = int(r)
  O = np.zeros((I.shape[0]*r, I.shape[1]*r, I.shape[2]/(r*2)))
  for x in range(O.shape[0]):
    for y in range(O.shape[1]):
      for c in range(O.shape[2]):
        c += 1
        a = np.floor(x/r).astype("int")
        b = np.floor(y/r).astype("int")
        d = c*r*(y%r) + c*(x%r)
        print a, b, d
        O[x, y, c-1] = I[a, b, d]
  return O

To implement this in Tensorflow we would have to create a custom operator and its equivalent gradient. But after staring for a few minutes in the image depiction of the resulting operation we noticed how to write that using just regular reshape, split and concatenate operations. To understand that note that phase shift simply goes through different channels of the output convolutional map and builds up neighborhoods of r x r pixels. And we can do the same with a few lines of Tensorflow code as:

def _phase_shift(I, r):
    # Helper function with main phase shift operation
    bsize, a, b, c = I.get_shape().as_list()
    X = tf.reshape(I, (bsize, a, b, r, r))
    X = tf.transpose(X, (0, 1, 2, 4, 3))  # bsize, a, b, 1, 1
    X = tf.split(1, a, X)  # a, [bsize, b, r, r]
    X = tf.concat(2, [tf.squeeze(x) for x in X])  # bsize, b, a*r, r
    X = tf.split(1, b, X)  # b, [bsize, a*r, r]
    X = tf.concat(2, [tf.squeeze(x) for x in X])  #
    bsize, a*r, b*r
    return tf.reshape(X, (bsize, a*r, b*r, 1))

def PS(X, r, color=False):
  # Main OP that you can arbitrarily use in you tensorflow code
  if color:
    Xc = tf.split(3, 3, X)
    X = tf.concat(3, [_phase_shift(x, r) for x in Xc])
  else:
    X = _phase_shift(X, r)
  return X

The reminder of this library is an implementation of a subpixel CNN using the proposed PS implementation for super resolution of celeb-A image faces. The code was written on top of carpedm20/DCGAN-tensorflow, as so, follow the same instructions to use it:

$ python download.py --dataset celebA  # if this doesn't work, you will have to download the dataset by hand somewhere else
$ python main.py --dataset celebA --is_train True --is_crop True

Subpixel CNN future is bright

Here we want to forecast that subpixel CNNs are going to ultimately replace transposed convolutions (deconv, conv grad, or whatever you call it) in feedforward neural networks. Phase shift's gradient is much more meaningful and resizing operations are virtually free computationally. Our implementation is a high level one, using default Tensorflow OPs. But next we will rewrite everything with Keras so that an even larger community can use it. Plus, a cuda backend level implementation would be even more appreciated.

But for now we want to encourage the community to experiment replacing deconv layers with subpixel operatinos everywhere. By everywhere we mean:

  • Conv-deconv autoencoders
    Similar to super-resolution, include subpixel in other autoencoder implementations, replace deconv layers
  • Style transfer networks
    This didn't work in a lazy plug and play in our experiments. We have to look more carefully
  • Deep Convolutional Autoencoders (DCGAN)
    We started doing this, but as predicted we have to change hyperparameters. The network power is totally different from deconv layers.
  • Segmentation Networks (SegNets)
    ULTRA LOW hanging fruit! This one will be the easiest. Free paper, you're welcome!
  • wherever upscaling is done with zero padding

Join us in the revolution to get rid of meaningless zeros in feedfoward convnets, give suggestions here, try our code!

Sample results

The top row is the input, the middle row is the output, and the bottom row is the ground truth.

by @dribnet

References

[1] Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. By Shi et. al.
[2] Visualizing and Understanding Convolutional Networks. By Zeiler and Fergus.
[3] A guide to convolution arithmetic for deep learning. By Dumoulin and Visin.

Further reading

Alex J. Champandard made a really interesting analysis of this topic in this thread.
For discussions about differences between phase shift and straight up resize please see the companion notebook and this thread.

Owner
Atrium LTS
Atrium LTS
A simple software for capturing human body movements using the Kinect camera.

KinectMotionCapture A simple software for capturing human body movements using the Kinect camera. The software can seamlessly save joints and bones po

Aleksander Palkowski 5 Aug 13, 2022
Framework for estimating the structures and parameters of Bayesian networks (DAGs) at per-sample resolution

Sample-specific Bayesian Networks A framework for estimating the structures and parameters of Bayesian networks (DAGs) at per-sample or per-patient re

Caleb Ellington 1 Sep 23, 2022
nn_builder lets you build neural networks with less boilerplate code

nn_builder lets you build neural networks with less boilerplate code. You specify the type of network you want and it builds it. Install pip install n

Petros Christodoulou 157 Nov 20, 2022
[ICCV 2021 Oral] Just Ask: Learning to Answer Questions from Millions of Narrated Videos

Just Ask: Learning to Answer Questions from Millions of Narrated Videos Webpage • Demo • Paper This repository provides the code for our paper, includ

Antoine Yang 87 Jan 05, 2023
Training DALL-E with volunteers from all over the Internet using hivemind and dalle-pytorch (NeurIPS 2021 demo)

Training DALL-E with volunteers from all over the Internet This repository is a part of the NeurIPS 2021 demonstration "Training Transformers Together

<a href=[email protected]"> 19 Dec 13, 2022
A light-weight image labelling tool for Python designed for creating segmentation data sets.

An image labelling tool for creating segmentation data sets, for Django and Flask.

117 Nov 21, 2022
PyTorch implementation for the paper Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime

Visual Representation Learning with Self-Supervised Attention for Low-Label High-Data Regime Created by Prarthana Bhattacharyya. Disclaimer: This is n

Prarthana Bhattacharyya 5 Nov 08, 2022
1st ranked 'driver careless behavior detection' for AI Online Competition 2021, hosted by MSIT Korea.

2021AICompetition-03 본 repo 는 mAy-I Inc. 팀으로 참가한 2021 인공지능 온라인 경진대회 중 [이미지] 운전 사고 예방을 위한 운전자 부주의 행동 검출 모델] 태스크 수행을 위한 레포지토리입니다. mAy-I 는 과학기술정보통신부가 주최하

Junhyuk Park 9 Dec 01, 2022
Code for our CVPR 2021 paper "MetaCam+DSCE"

Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification (CVPR'21) Introduction Code for our CVPR 2021

FlyingRoastDuck 59 Oct 31, 2022
A Simple Example for Imitation Learning with Dataset Aggregation (DAGGER) on Torcs Env

Imitation Learning with Dataset Aggregation (DAGGER) on Torcs Env This repository implements a simple algorithm for imitation learning: DAGGER. In thi

Hao 66 Nov 23, 2022
PyTorch implementation of TSception V2 using DEAP dataset

TSception This is the PyTorch implementation of TSception V2 using DEAP dataset in our paper: Yi Ding, Neethu Robinson, Su Zhang, Qiuhao Zeng, Cuntai

Yi Ding 27 Dec 15, 2022
A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow.

ConvNeXt A Next Generation ConvNet by FaceBookResearch Implementation in PyTorch(Original) and TensorFlow. A FacebookResearch Implementation on A Conv

Raghvender 2 Feb 14, 2022
GNN-based Recommendation Benchmark

GRecX A Fair Benchmark for GNN-based Recommendation Homepage and Documentation Homepage: Documentation: Paper: GRecX: An Efficient and Unified Benchma

73 Oct 17, 2022
Docker containers of baseline agents for the Crafter environment

Crafter Baselines This repository contains Docker containers for running various baselines on the Crafter environment. Reward Agents DreamerV2 based o

Danijar Hafner 17 Sep 25, 2022
Code to reproduce the experiments from our NeurIPS 2021 paper " The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective"

Code To run: python runner.py new --save SAVE_NAME --data PATH_TO_DATA_DIR --dataset DATASET --model model_name [options] --n 1000 - train - t

Geoff Pleiss 5 Dec 12, 2022
A Gura parser implementation for Python

Gura Python parser This repository contains the implementation of a Gura (compliant with version 1.0.0) format parser in Python. Installation pip inst

Gura Config Lang 19 Jan 25, 2022
✨✨✨An awesome open source toolbox for stereo matching.

OpenStereo This is an awesome open source toolbox for stereo matching. Supported Methods: BM SGM(T-PAMI'07) GCNet(ICCV'17) PSMNet(CVPR'18) StereoNet(E

Wang Qingyu 6 Nov 04, 2022
A transformer model to predict pathogenic mutations

MutFormer MutFormer is an application of the BERT (Bidirectional Encoder Representations from Transformers) NLP (Natural Language Processing) model wi

Wang Genomics Lab 2 Nov 29, 2022
Implementation of Sequence Generative Adversarial Nets with Policy Gradient

SeqGAN Requirements: Tensorflow r1.0.1 Python 2.7 CUDA 7.5+ (For GPU) Introduction Apply Generative Adversarial Nets to generating sequences of discre

Lantao Yu 2k Dec 29, 2022
A clear, concise, simple yet powerful and efficient API for deep learning.

The Gluon API Specification The Gluon API specification is an effort to improve speed, flexibility, and accessibility of deep learning technology for

Gluon API 2.3k Dec 17, 2022