Custom implementation of Corrleation Module

Overview

PyPI

Pytorch Correlation module

this is a custom C++/Cuda implementation of Correlation module, used e.g. in FlowNetC

This tutorial was used as a basis for implementation, as well as NVIDIA's cuda code

  • Build and Install C++ and CUDA extensions by executing python setup.py install,
  • Benchmark C++ vs. CUDA by running python benchmark.py {cpu, cuda},
  • Run gradient checks on the code by running python grad_check.py --backend {cpu, cuda}.

Requirements

This module is expected to compile for Pytorch 1.6.

Installation

this module is available on pip

pip install spatial-correlation-sampler

For a cpu-only version, you can install from source with

python setup_cpu.py install

Known Problems

This module needs compatible gcc version and CUDA to be compiled. Namely, CUDA 9.1 and below will need gcc5, while CUDA 9.2 and 10.0 will need gcc7 See this issue for more information

Usage

API has a few difference with NVIDIA's module

  • output is now a 5D tensor, which reflects the shifts horizontal and vertical.
input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW)
  • Output sizes oH and oW are no longer dependant of patch size, but only of kernel size and padding
  • Patch size patch_size is now the whole patch, and not only the radii.
  • stride1 is now stride andstride2 is dilation_patch, which behave like dilated convolutions
  • equivalent max_displacement is then dilation_patch * (patch_size - 1) / 2.
  • dilation is a new parameter, it acts the same way as dilated convolution regarding the correlation kernel
  • to get the right parameters for FlowNetC, you would have
kernel_size=1
patch_size=21,
stride=1,
padding=0,
dilation=1
dilation_patch=2

Example

import torch
from spatial_correlation_sampler import SpatialCorrelationSampler, 

device = "cuda"
batch_size = 1
channel = 1
H = 10
W = 10
dtype = torch.float32

input1 = torch.randint(1, 4, (batch_size, channel, H, W), dtype=dtype, device=device, requires_grad=True)
input2 = torch.randint_like(input1, 1, 4).requires_grad_(True)

#You can either use the function or the module. Note that the module doesn't contain any parameter tensor.

#function

out = spatial_correlation_sample(input1,
	                         input2,
                                 kernel_size=3,
                                 patch_size=1,
                                 stride=2,
                                 padding=0,
                                 dilation=2,
                                 dilation_patch=1)

#module

correlation_sampler = SpatialCorrelationSampler(
    kernel_size=3,
    patch_size=1,
    stride=2,
    padding=0,
    dilation=2,
    dilation_patch=1)
out = correlation_sampler(input1, input2)

Benchmark

  • default parameters are from benchmark.py, FlowNetC parameters are same as use in FlowNetC with a batch size of 4, described in this paper, implemented here and here.
  • Feel free to file an issue to add entries to this with your hardware !

CUDA Benchmark

  • See here for a benchmark script working with NVIDIA's code, and Pytorch.
  • Benchmark are launched with environment variable CUDA_LAUNCH_BLOCKING set to 1.
  • Only float32 is benchmarked.
  • FlowNetC correlation parameters where launched with the following command:
CUDA_LAUNCH_BLOCKING=1 python benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda -d float

CUDA_LAUNCH_BLOCKING=1 python NV_correlation_benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256
implementation Correlation parameters device pass min time avg time
ours default 980 GTX forward 5.745 ms 5.851 ms
ours default 980 GTX backward 77.694 ms 77.957 ms
NVIDIA default 980 GTX forward 13.779 ms 13.853 ms
NVIDIA default 980 GTX backward 73.383 ms 73.708 ms
ours FlowNetC 980 GTX forward 26.102 ms 26.179 ms
ours FlowNetC 980 GTX backward 208.091 ms 208.510 ms
NVIDIA FlowNetC 980 GTX forward 35.363 ms 35.550 ms
NVIDIA FlowNetC 980 GTX backward 283.748 ms 284.346 ms

Notes

  • The overhead of our implementation regarding kernel_size > 1 during backward needs some investigation, feel free to dive in the code to improve it !
  • The backward pass of NVIDIA is not entirely correct when stride1 > 1 and kernel_size > 1, because not everything is computed, see here.

CPU Benchmark

  • No other implementation is avalaible on CPU.
  • It is obviously not recommended to run it on CPU if you have a GPU.
Correlation parameters device pass min time avg time
default E5-2630 v3 @ 2.40GHz forward 159.616 ms 188.727 ms
default E5-2630 v3 @ 2.40GHz backward 282.641 ms 294.194 ms
FlowNetC E5-2630 v3 @ 2.40GHz forward 2.138 s 2.144 s
FlowNetC E5-2630 v3 @ 2.40GHz backward 7.006 s 7.075 s
Owner
Clément Pinard
PhD ENSTA Paris, Deep Learning Engineer @ ContentSquare
Clément Pinard
Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation.

DuoRec Code for WSDM 2022 paper, Contrastive Learning for Representation Degeneration Problem in Sequential Recommendation. Usage Download datasets fr

Qrh 46 Dec 19, 2022
Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces

This repository contains source code for the paper Combining Latent Space and Structured Kernels for Bayesian Optimization over Combinatorial Spaces a

9 Nov 21, 2022
DenseNet Implementation in Keras with ImageNet Pretrained Models

DenseNet-Keras with ImageNet Pretrained Models This is an Keras implementation of DenseNet with ImageNet pretrained weights. The weights are converted

Felix Yu 568 Oct 31, 2022
DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks

English | 简体中文 Introduction DeepHawkeye is a library to detect unusual patterns in images using features from pretrained neural networks Reference Pat

CV Newbie 28 Dec 13, 2022
Neural style in TensorFlow! 🎨

neural-style An implementation of neural style in TensorFlow. This implementation is a lot simpler than a lot of the other ones out there, thanks to T

Anish Athalye 5.5k Dec 29, 2022
Pytorch-diffusion - A basic PyTorch implementation of 'Denoising Diffusion Probabilistic Models'

PyTorch implementation of 'Denoising Diffusion Probabilistic Models' This reposi

Arthur Juliani 76 Jan 07, 2023
Hummingbird compiles trained ML models into tensor computation for faster inference.

Hummingbird Introduction Hummingbird is a library for compiling trained traditional ML models into tensor computations. Hummingbird allows users to se

Microsoft 3.1k Dec 30, 2022
KAPAO is an efficient multi-person human pose estimation model that detects keypoints and poses as objects and fuses the detections to predict human poses.

KAPAO (Keypoints and Poses as Objects) KAPAO is an efficient single-stage multi-person human pose estimation model that models keypoints and poses as

Will McNally 664 Dec 30, 2022
Official Keras Implementation for UNet++ in IEEE Transactions on Medical Imaging and DLMIA 2018

UNet++: A Nested U-Net Architecture for Medical Image Segmentation UNet++ is a new general purpose image segmentation architecture for more accurate i

Zongwei Zhou 1.8k Dec 27, 2022
Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021)

Discretized Integrated Gradients for Explaining Language Models (EMNLP 2021) Overview of paths used in DIG and IG. w is the word being attributed. The

INK Lab @ USC 17 Oct 27, 2022
Facial Expression Detection In The Realtime

The human's facial expressions is very important to detect thier emotions and sentiment. It can be very efficient to use to make our computers make interviews. Furthermore, we have robots now can det

Adel El-Nabarawy 4 Mar 01, 2022
Axel - 3D printed robotic hands and they controll with Raspberry Pi and Arduino combo

Axel It's our graduation project about 3D printed robotic hands and they control

0 Feb 14, 2022
A large-scale video dataset for the training and evaluation of 3D human pose estimation models

ASPset-510 (Australian Sports Pose Dataset) is a large-scale video dataset for the training and evaluation of 3D human pose estimation models. It contains 17 different amateur subjects performing 30

Aiden Nibali 25 Jun 20, 2021
Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks

Spontaneous Facial Micro Expression Recognition using 3D Spatio-Temporal Convolutional Neural Networks Abstract Facial expression recognition in video

Bogireddy Sai Prasanna Teja Reddy 103 Dec 29, 2022
This repo is customed for VisDrone.

Object Detection for VisDrone(无人机航拍图像目标检测) My environment 1、Windows10 (Linux available) 2、tensorflow = 1.12.0 3、python3.6 (anaconda) 4、cv2 5、ensemble

53 Jul 17, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

Shihua Huang 23 Jul 22, 2022
Jarvis Project is a basic virtual assistant that uses TensorFlow for learning.

Jarvis_proyect Jarvis Project is a basic virtual assistant that uses TensorFlow for learning. Latest version 0.1 Features: Good morning protocol Tell

Anze Kovac 3 Aug 31, 2022
Source code for "Progressive Transformers for End-to-End Sign Language Production" (ECCV 2020)

Progressive Transformers for End-to-End Sign Language Production Source code for "Progressive Transformers for End-to-End Sign Language Production" (B

58 Dec 21, 2022
Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

Belinda Li 20 May 17, 2022