An official repository for Paper "Uformer: A General U-Shaped Transformer for Image Restoration".

Related tags

Deep LearningUformer
Overview

Uformer: A General U-Shaped Transformer for Image Restoration

Zhendong Wang, Xiaodong Cun, Jianmin Bao and Jianzhuang Liu

PWC PWC

Paper: https://arxiv.org/abs/2106.03106

Update:

  • 2021.08.19 Release a pre-trained model(Uformer32)! Add a script for FLOP/GMAC calculation.
  • 2021.07.29 Add a script for testing the pre-trained model on the arbitrary-resolution images.

In this paper, we present Uformer, an effective and efficient Transformer-based architecture, in which we build a hierarchical encoder-decoder network using the Transformer block for image restoration. Uformer has two core designs to make it suitable for this task. The first key element is a local-enhanced window Transformer block, where we use non-overlapping window-based self-attention to reduce the computational requirement and employ the depth-wise convolution in the feed-forward network to further improve its potential for capturing local context. The second key element is that we explore three skip-connection schemes to effectively deliver information from the encoder to the decoder. Powered by these two designs, Uformer enjoys a high capability for capturing useful dependencies for image restoration. Extensive experiments on several image restoration tasks demonstrate the superiority of Uformer, including image denoising, deraining, deblurring and demoireing. We expect that our work will encourage further research to explore Transformer-based architectures for low-level vision tasks.

Uformer

Details

Package dependencies

The project is built with PyTorch 1.7.1, Python3.7, CUDA10.1. For package dependencies, you can install them by:

pip3 install -r requirements.txt

Pretrained model

Data preparation

Denoising

For training data of SIDD, you can download the SIDD-Medium dataset from the official url. Then generate training patches for training by:

python3 generate_patches_SIDD.py --src_dir ../SIDD_Medium_Srgb/Data --tar_dir ../datasets/denoising/sidd/train

For evaluation, we use the same evaluation data as here, and put it into the dir ../datasets/denoising/sidd/val.

Training

Denoising

To train Uformer32(embed_dim=32) on SIDD, we use 2 V100 GPUs and run for 250 epochs:

python3 ./train.py --arch Uformer --batch_size 32 --gpu '0,1' \
    --train_ps 128 --train_dir ../datasets/denoising/sidd/train --env 32_0705_1 \
    --val_dir ../datasets/denoising/sidd/val --embed_dim 32 --warmup

More configuration can be founded in train.sh.

Evaluation

Denoising

To evaluate Uformer32 on SIDD, you can run:

python3 ./test.py --arch Uformer --batch_size 1 --gpu '0' \
    --input_dir ../datasets/denoising/sidd/val --result_dir YOUR_RESULT_DIR \
    --weights YOUR_PRETRAINED_MODEL_PATH --embed_dim 32 

Computational Cost

We provide a simple script to calculate the flops by ourselves, a simple script has been added in model.py. You can change the configuration and run it via:

python3 model.py

The manual calculation of GMacs in this repo differs slightly from the main paper, but they do not influence the conclusion. We will correct the paper later.

Citation

If you find this project useful in your research, please consider citing:

@article{wang2021uformer,
	title={Uformer: A General U-Shaped Transformer for Image Restoration},
	author={Wang, Zhendong and Cun, Xiaodong and Bao, Jianmin and Liu, Jianzhuang},
	journal={arXiv preprint 2106.03106},
	year={2021}
}

Acknowledgement

This code borrows heavily from MIRNet and SwinTransformer.

Contact

Please contact us if there is any question or suggestion(Zhendong Wang [email protected], Xiaodong Cun [email protected]).

Owner
Zhendong Wang
Deep learning, Computer Vision, Low-level Vision, Image Generation.
Zhendong Wang
ManipNet: Neural Manipulation Synthesis with a Hand-Object Spatial Representation - SIGGRAPH 2021

ManipNet: Neural Manipulation Synthesis with a Hand-Object Spatial Representation - SIGGRAPH 2021 Dataset Code Demos Authors: He Zhang, Yuting Ye, Tak

HE ZHANG 194 Dec 06, 2022
Source code of our BMVC 2021 paper: AniFormer: Data-driven 3D Animation with Transformer

AniFormer This is the PyTorch implementation of our BMVC 2021 paper AniFormer: Data-driven 3D Animation with Transformer. Haoyu Chen, Hao Tang, Nicu S

24 Nov 02, 2022
Powerful and efficient Computer Vision Annotation Tool (CVAT)

Computer Vision Annotation Tool (CVAT) CVAT is free, online, interactive video and image annotation tool for computer vision. It is being used by our

OpenVINO Toolkit 8.6k Jan 01, 2023
Evaluation toolkit of the informative tracking benchmark comprising 9 scenarios, 180 diverse videos, and new challenges.

Informative-tracking-benchmark Informative tracking benchmark (ITB) higher diversity. It contains 9 representative scenarios and 180 diverse videos. m

Xin Li 15 Nov 26, 2022
Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging This repository contains an implementation

Computational Photography Lab @ SFU 1.1k Jan 02, 2023
GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️

GAT - Graph Attention Network (PyTorch) 💻 + graphs + 📣 = ❤️ This repo contains a PyTorch implementation of the original GAT paper ( 🔗 Veličković et

Aleksa Gordić 1.9k Jan 09, 2023
Using contrastive learning and OpenAI's CLIP to find good embeddings for images with lossy transformations

Creating Robust Representations from Pre-Trained Image Encoders using Contrastive Learning Sriram Ravula, Georgios Smyrnis This is the code for our pr

Sriram Ravula 26 Dec 10, 2022
This repository contains a CBIR system that uses swin transformer to extract image's feature.

Swin-transformer based CBIR This repository contains a CBIR(content-based image retrieval) system. Here we use Swin-transformer to extract query image

JsHou 12 Nov 17, 2022
A PyTorch Library for Accelerating 3D Deep Learning Research

Kaolin: A Pytorch Library for Accelerating 3D Deep Learning Research Overview NVIDIA Kaolin library provides a PyTorch API for working with a variety

NVIDIA GameWorks 3.5k Jan 07, 2023
Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020

Official respository for "Modeling Defocus-Disparity in Dual-Pixel Sensors", ICCP 2020 BibTeX @INPROCEEDINGS{punnappurath2020modeling, author={Abhi

Abhijith Punnappurath 22 Oct 01, 2022
The 2nd place solution of 2021 google landmark retrieval on kaggle.

Leaderboard, taxonomy, and curated list of few-shot object detection papers.

229 Dec 13, 2022
Advanced Deep Learning with TensorFlow 2 and Keras (Updated for 2nd Edition)

Advanced Deep Learning with TensorFlow 2 and Keras (Updated for 2nd Edition)

Packt 1.5k Jan 03, 2023
Detecting Potentially Harmful and Protective Suicide-related Content on Twitter

TwitterSuicideML Scripts for reproducing the Machine Learning analysis of the paper: Detecting Potentially Harmful and Protective Suicide-related Cont

3 Oct 17, 2022
Pytorch implementation AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

AttnGAN Pytorch implementation for reproducing AttnGAN results in the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative

Tao Xu 1.2k Dec 26, 2022
Stacs-ci - A set of modules to enable integration of STACS with commonly used CI / CD systems

Static Token And Credential Scanner CI Integrations What is it? STACS is a YARA

STACS 18 Aug 04, 2022
Image to Image translation, image generataton, few shot learning

Semi-supervised Learning for Few-shot Image-to-Image Translation [paper] Abstract: In the last few years, unpaired image-to-image translation has witn

yaxingwang 49 Nov 18, 2022
SANet: A Slice-Aware Network for Pulmonary Nodule Detection

SANet: A Slice-Aware Network for Pulmonary Nodule Detection This paper (SANet) has been accepted and early accessed in IEEE TPAMI 2021. This code and

Jie Mei 39 Dec 17, 2022
SAT: 2D Semantics Assisted Training for 3D Visual Grounding, ICCV 2021 (Oral)

SAT: 2D Semantics Assisted Training for 3D Visual Grounding SAT: 2D Semantics Assisted Training for 3D Visual Grounding by Zhengyuan Yang, Songyang Zh

Zhengyuan Yang 22 Nov 30, 2022
Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)

Distributed Deep Learning in Open Collaborations This repository contains the code for the NeurIPS 2021 paper "Distributed Deep Learning in Open Colla

Yandex Research 96 Sep 15, 2022
Code for the TPAMI paper: "Syntax Customized Video Captioning by Imitating Exemplar Sentences"

Syntax-Customized-Video-Captioning Code for the TPAMI paper: "Syntax Customized Video Captioning by Imitating Exemplar Sentences". This is my second w

3 Dec 05, 2022