🔥 Real-time Super Resolution enhancement (4x) with content loss and relativistic adversarial optimization 🔥

Overview

Super Resolution for Real Time Image Enhancement

Final Results from Validation data

Introduction

It is no suprising that adversarial training is indeed possible for super resolution tasks. The problem with pretty much all the state of the art models is that they are just not usable for native deployment, because of 100s of MBs of model size.

Also, most of the known super resolution frameworks only works on single compression method, such as bicubic, nearest, bilinear etc. Which helps the model a lot to beat the previous state of the art scores but they perform poorly on real life low resolution images because they might not belong to the same compression method for which the model was trained for.

So for me the main goal with this project wasn't just to create yet another super resolution model, but to develop a lightest model as possible which works well on any random compression methods.

With this goal in mind, I tried adopting the modern super resolution frameworks such as relativistic adversarial training, and content loss optimization (I've mainly followed the ESRGAN, with few changes in the objective function), and finally was able to create a model of size 5MB!!!

API Usage

from inference import enhance_image

enhance_image(
    lr_image, # 
   
    , # or lr_path = 
    
     ,
    
   
    sr_path, # ,
    visualize, # 
   
    size, # 
   
    ,
   
    )

CLI Usage

usage: inference.py [-h] [--lr-path LR_PATH] [--sr-path SR_PATH]

Super Resolution for Real Time Image Enhancement

optional arguments:
  -h, --help         show this help message and exit
  --lr-path LR_PATH  Path to the low resolution image.
  --sr-path SR_PATH  Output path where the enhanced image would be saved.

Model architectures

The main building block of the generator is the Residual in Residual Dense Block (RRDB), which consists of classic DenseNet module but coupled with a residual connection

Now in the original paper the authors mentioned to remove the batch normalization layer in order to remove the checkboard artifacts, but due to the extreme small size of my model, I found utilizing the batch normalization layer quite effective for both speeding up the training and better quality results.

Another change I made in the original architecture is replacing the nearest upsampling proceedure with the pixel shuffle, which helped a lot to produce highly detailed outputs given the size of the model.

The discriminator is made up of blocks of classifc convolution layer followed by batch normalization followed by leaky relu non linearity.

Relativistic Discriminator

A relativistic discriminator tries to predict the probability that a real image is relatively more realistic than a fake one.

So the discriminator and the generator are optimized to minizize these corresponding losses:

Discriminator's adversarial loss:

Generator's adversarial loss:

Perceptual Loss (Final objective for the Generator)

Original perceptual loss introduced in SRGAN paper combines the adversarial loss and the content loss obtained from the features of final convolution layers of the VGG Net.

Effectiveness of perceptual loss if found increased by constraining on features before activation rather than after activation as practiced in SRGAN.

To make the Perceptual loss more effective, I additionally added the preactivation features disparity from both shallow and deep layers, making the generator produce better results.

In addition to content loss and relativistic adversarial optimization, a simple pixel loss is also added to the generator's final objective as per the paper.

Now based on my experiments I found it really hard for the generator to produce highly detailed outputs when its also minimizing the pixel loss (I'm imputing this observation to the fact that my model is very small).

This is a bit surprising because optimizing an additional objective function which has same optima should help speeding up the training. My interpretation is since super resolution is not a one to one matching, as multiple results are there for a single low resolution patch (more on patch size below), so forcing the generator to converge to a single output would cause the generator to not produce detailed but instead the average of all those possible outputs.

So I tried reducing the pixel loss weight coefficient down to 1e-2 to 1e-4 as described in the paper, and then compared the results with the generator trained without any pixel loss, and found that pixel loss has no significant visual improvements. So given my constrained training environment (Google Colab), I decided not to utilize the pixel loss as one of the part of generator's loss.

So here's the generator's final loss:

Patch size affect

Ideally larger the patch size better the adversarial training hence better the results, since an enlarged receptive field helps both the models to capture more semantic information. Therefore the paper uses 96x96 to 192x192 as the patch size resolution, but since I was constrained to utilize Google Colab, my patch size was only 32x32 😶 , and that too with batch size of 8.

Multiple compression methods

The idea is to make the generator independent of the compression that is applied to the training dataset, so that its much more robust in real life samples.

For this I randomly applied the nearest, bilinear and bicubic compressions on all the data points in the dataset every time a batch is processed.

Validation Results after ~500 epochs

Loss type

Value
Content Loss (L1) [5th, 10th, 20th preactivation features from VGGNet] ~38.582
Style Loss (L1) [320th preactivation features from EfficientNetB4] ~1.1752
Adversarial Loss ~1.550

Visual Comparisons

Below are some of the common outputs that most of the super resolution papers compare with (not used in the training data).

Author - Rishik Mourya

Owner
Rishik Mourya
3rd year Undergrad | ML Engineer @Nevronas.ai | ML Engineer @Skylarklabs.ai | And part-time Web Developer.
Rishik Mourya
Multi-View Radar Semantic Segmentation

Multi-View Radar Semantic Segmentation Paper Multi-View Radar Semantic Segmentation, ICCV 2021. Arthur Ouaknine, Alasdair Newson, Patrick Pérez, Flore

valeo.ai 37 Oct 25, 2022
这是一个利用facenet和retinaface实现人脸识别的库,可以进行在线的人脸识别。

Facenet+Retinaface:人脸识别模型在Pytorch当中的实现 目录 注意事项 Attention 所需环境 Environment 文件下载 Download 预测步骤 How2predict 参考资料 Reference 注意事项 该库中包含了两个网络,分别是retinaface和

Bubbliiiing 102 Dec 30, 2022
An open-source Kazakh named entity recognition dataset (KazNERD), annotation guidelines, and baseline NER models.

Kazakh Named Entity Recognition This repository contains an open-source Kazakh named entity recognition dataset (KazNERD), named entity annotation gui

ISSAI 9 Dec 23, 2022
Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide.

SARS-CoV-2 processing requests Request execution of Galaxy SARS-CoV-2 variation analysis workflows on input data you provide. Prerequisites This autom

useGalaxy.eu 17 Aug 13, 2022
Mmrotate - OpenMMLab Rotated Object Detection Benchmark

OpenMMLab website HOT OpenMMLab platform TRY IT OUT 📘 Documentation | 🛠️ Insta

OpenMMLab 1.2k Jan 04, 2023
A collection of educational notebooks on multi-view geometry and computer vision.

Multiview notebooks This is a collection of educational notebooks on multi-view geometry and computer vision. Subjects covered in these notebooks incl

Max 65 Dec 09, 2022
LAMDA: Label Matching Deep Domain Adaptation

LAMDA: Label Matching Deep Domain Adaptation This is the implementation of the paper LAMDA: Label Matching Deep Domain Adaptation which has been accep

Tuan Nguyen 9 Sep 06, 2022
Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression

Regression Transformer Codebase to experiment with a hybrid Transformer that combines conditional sequence generation with regression . Development se

International Business Machines 27 Jan 05, 2023
[3DV 2021] A Dataset-Dispersion Perspective on Reconstruction Versus Recognition in Single-View 3D Reconstruction Networks

dispersion-score Official implementation of 3DV 2021 Paper A Dataset-dispersion Perspective on Reconstruction versus Recognition in Single-view 3D Rec

Yefan 7 May 28, 2022
Neural Koopman Lyapunov Control

Neural-Koopman-Lyapunov-Control Code for our paper: Neural Koopman Lyapunov Control Requirements dReal4: v4.19.02.1 PyTorch: 1.2.0 The learning framew

Vrushabh Zinage 6 Dec 24, 2022
Learning to Prompt for Continual Learning

Learning to Prompt for Continual Learning (L2P) Official Jax Implementation L2P is a novel continual learning technique which learns to dynamically pr

Google Research 207 Jan 06, 2023
Modification of convolutional neural net "UNET" for image segmentation in Keras framework

ZF_UNET_224 Pretrained Model Modification of convolutional neural net "UNET" for image segmentation in Keras framework Requirements Python 3.*, Keras

209 Nov 02, 2022
TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network (SIGGRAPH 2020)

TilinGNN: Learning to Tile with Self-Supervised Graph Neural Network (SIGGRAPH 2020) About The goal of our research problem is illustrated below: give

59 Dec 09, 2022
codes for Image Inpainting with External-internal Learning and Monochromic Bottleneck

Image Inpainting with External-internal Learning and Monochromic Bottleneck This repository is for the CVPR 2021 paper: 'Image Inpainting with Externa

97 Nov 29, 2022
3D mesh stylization driven by a text input in PyTorch

Text2Mesh [Project Page] Text2Mesh is a method for text-driven stylization of a 3D mesh, as described in "Text2Mesh: Text-Driven Neural Stylization fo

Threedle (University of Chicago) 649 Dec 27, 2022
Repo for "Physion: Evaluating Physical Prediction from Vision in Humans and Machines" submission to NeurIPS 2021 (Datasets & Benchmarks track)

Physion: Evaluating Physical Prediction from Vision in Humans and Machines This repo contains code and data to reproduce the results in our paper, Phy

Cognitive Tools Lab 38 Jan 06, 2023
This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

Multimodal Deep Learning 🎆 🎆 🎆 Announcing the multimodal deep learning repository that contains implementation of various deep learning-based model

Deep Cognition and Language Research (DeCLaRe) Lab 398 Dec 30, 2022
This is the official repository for our paper: ''Pruning Self-attentions into Convolutional Layers in Single Path''.

Pruning Self-attentions into Convolutional Layers in Single Path This is the official repository for our paper: Pruning Self-attentions into Convoluti

Zhuang AI Group 77 Dec 26, 2022
Point Cloud Registration using Representative Overlapping Points.

Point Cloud Registration using Representative Overlapping Points (ROPNet) Abstract 3D point cloud registration is a fundamental task in robotics and c

ZhuLifa 36 Dec 16, 2022
Retinal vessel segmentation based on GT-UNet

Retinal vessel segmentation based on GT-UNet Introduction This project is a retinal blood vessel segmentation code based on UNet-like Group Transforme

Kent0n 27 Dec 18, 2022