Official implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform", ICCV 2021

Overview

Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform

Figure 2 This repository is the implementation of "Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform" (ICCV 2021). Our code is based on CompressAI.

Abstract: We propose a versatile deep image compression network based on Spatial Feature Transform (SFT), which takes a source image and a corresponding quality map as inputs and produce a compressed image with variable rates. Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps. In addition, the proposed framework allows us to perform task-aware image compressions for various tasks, e.g., classification, by efficiently estimating optimized quality maps specific to target tasks for our encoding network. This is even possible with a pretrained network without learning separate models for individual tasks. Our algorithm achieves outstanding rate-distortion trade-off compared to the approaches based on multiple models that are optimized separately for several different target rates. At the same level of compression, the proposed approach successfully improves performance on image classification and text region quality preservation via task-aware quality map estimation without additional model training.

Installation

We tested our code in ubuntu 16.04, g++ 8.4.0, cuda 10.1, python 3.8.8, pytorch 1.7.1. A C++ 17 compiler is required to use the Range Asymmetric Numeral System implementation.

  1. Check your g++ version >= 7. If not, please update it first and make sure to use the updated version.

    • $ g++ --version
  2. Set up the python environment (Python 3.8).

  3. Install needed packages.

    • $ pip install torch==1.7.1+cu101 torchvision==0.8.2+cu101 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
    • $ pip install -r requirements.txt
    • If some errors occur in installing CompressAI, please install it yourself. It is for the entropy coder.

Dataset

  1. (Training set) Download the following files and decompress them.

    • 2014 Train images [83K/13GB]
    • 2014 Train/Val annotations [241MB]
      • instances_train2014.json
    • 2017 Train images [118K/18GB]
    • 2017 Train/Val annotations [241MB]
      • instances_train2017.json
  2. (Test set) Download Kodak dataset.

  3. Make a directory of structure as follows for the datasets.

├── your_dataset_root
    ├── coco
        |── annotations
            ├── instances_train2014.json
            └── instances_train2017.json
        ├── train2014
        └── train2017
    └── kodak
            ├── 1.png
            ├── ...
  1. Run following command in scripts directory.
    • $ ./prepare.sh your_dataset_root/coco your_dataset_root/kodak
    • trainset_coco.csv and kodak.csv will be created in data directory.

Training

Configuration

We used the same configuration as ./configs/config.yaml to train our model. You can change it as you want. We expect that larger number of training iteration will lead to the better performance.

Train

$ python train.py --config=./configs/config.yaml --name=your_instance_name
The checkpoints of the model will be saved in ./results/your_instance_name/snapshots.
Training for 2M iterations will take about 2-3 weeks on a single GPU like Titan Xp. At least 12GB GPU memory is needed for the default training setting.

Resume from a checkpoint

$ python train.py --resume=./results/your_instance_name/snapshots/your_snapshot_name.pt
By default, the original configuration of the checkpoint ./results/your_instance_name/config.yaml will be used.

Evaluation

$ python eval.py --snapshot=./results/your_instance_name/snapshots/your_snapshot_name.pt --testset=./data/kodak.csv

Final evaluation results

[ Test-1 ] Total: 0.5104 | Real BPP: 0.2362 | BPP: 0.2348 | PSNR: 29.5285 | MS-SSIM: 0.9360 | Aux: 93 | Enc Time: 0.2403s | Dec Time: 0.0356s
[ Test 0 ] Total: 0.2326 | Real BPP: 0.0912 | BPP: 0.0902 | PSNR: 27.1140 | MS-SSIM: 0.8976 | Aux: 93 | Enc Time: 0.2399s | Dec Time: 0.0345s
[ Test 1 ] Total: 0.2971 | Real BPP: 0.1187 | BPP: 0.1176 | PSNR: 27.9824 | MS-SSIM: 0.9159 | Aux: 93 | Enc Time: 0.2460s | Dec Time: 0.0347s
[ Test 2 ] Total: 0.3779 | Real BPP: 0.1559 | BPP: 0.1547 | PSNR: 28.8982 | MS-SSIM: 0.9323 | Aux: 93 | Enc Time: 0.2564s | Dec Time: 0.0370s
[ Test 3 ] Total: 0.4763 | Real BPP: 0.2058 | BPP: 0.2045 | PSNR: 29.9052 | MS-SSIM: 0.9464 | Aux: 93 | Enc Time: 0.2553s | Dec Time: 0.0359s
[ Test 4 ] Total: 0.5956 | Real BPP: 0.2712 | BPP: 0.2697 | PSNR: 30.9739 | MS-SSIM: 0.9582 | Aux: 93 | Enc Time: 0.2548s | Dec Time: 0.0354s
[ Test 5 ] Total: 0.7380 | Real BPP: 0.3558 | BPP: 0.3541 | PSNR: 32.1140 | MS-SSIM: 0.9678 | Aux: 93 | Enc Time: 0.2598s | Dec Time: 0.0358s
[ Test 6 ] Total: 0.9059 | Real BPP: 0.4567 | BPP: 0.4548 | PSNR: 33.2801 | MS-SSIM: 0.9752 | Aux: 93 | Enc Time: 0.2596s | Dec Time: 0.0361s
[ Test 7 ] Total: 1.1050 | Real BPP: 0.5802 | BPP: 0.5780 | PSNR: 34.4822 | MS-SSIM: 0.9811 | Aux: 93 | Enc Time: 0.2590s | Dec Time: 0.0364s
[ Test 8 ] Total: 1.3457 | Real BPP: 0.7121 | BPP: 0.7095 | PSNR: 35.5609 | MS-SSIM: 0.9852 | Aux: 93 | Enc Time: 0.2569s | Dec Time: 0.0367s
[ Test 9 ] Total: 1.6392 | Real BPP: 0.8620 | BPP: 0.8590 | PSNR: 36.5931 | MS-SSIM: 0.9884 | Aux: 93 | Enc Time: 0.2553s | Dec Time: 0.0371s
[ Test10 ] Total: 2.0116 | Real BPP: 1.0179 | BPP: 1.0145 | PSNR: 37.4660 | MS-SSIM: 0.9907 | Aux: 93 | Enc Time: 0.2644s | Dec Time: 0.0376s
[ Test ] Total mean: 0.8841 | Enc Time: 0.2540s | Dec Time: 0.0361s
  • [ TestN ] means to use a uniform quality map of (N/10) value for evaluation.
    • For example, in the case of [ Test8 ], a uniform quality map of 0.8 is used.
  • [ Test-1 ] means to use pre-defined non-uniform quality maps for evaluation.
  • Bpp is the theoretical average bpp calculated by the trained probability model.
  • Real Bpp is the real average bpp for the saved file including quantized latent representations and metadata.
    • All bpps reported in the paper are Real Bpp.
  • Total is the average loss value.

Classification-aware compression

Dataset

We made a test set of ImageNet dataset by sampling 102 categories and choosing 5 images per a category randomly.

  1. Prepare the original ImageNet validation set ILSVRC2012_img_val.
  2. Run following command in scripts directory.
    • $ ./prepare_imagenet.sh your_dataset_root/ILSVRC2012_img_val
    • imagenet_subset.csv will be created in data directory.

Running

$ python classification_aware.py --snapshot=./results/your_instance_name/snapshots/your_snapshot_name.pt
A result plot ./classificatoin_result.png will be generated.

Citation

@inproceedings{song2021variablerate,
  title={Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform}, 
  author={Song, Myungseo and Choi, Jinyoung and Han, Bohyung},
  booktitle={ICCV},
  year={2021}
}
Owner
Myungseo Song
Myungseo Song
Segmentation models with pretrained backbones. PyTorch.

Python library with Neural Networks for Image Segmentation based on PyTorch. The main features of this library are: High level API (just two lines to

Pavel Yakubovskiy 6.6k Jan 06, 2023
Language-Agnostic Website Embedding and Classification

Homepage2Vec Language-Agnostic Website Embedding and Classification based on Curlie labels https://arxiv.org/pdf/2201.03677.pdf Homepage2Vec is a pre-

25 Dec 27, 2022
Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies

An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks Novel and high-performance medical ima

14 Dec 18, 2022
The official PyTorch implementation for NCSNv2 (NeurIPS 2020)

Improved Techniques for Training Score-Based Generative Models This repo contains the official implementation for the paper Improved Techniques for Tr

174 Dec 26, 2022
This's an implementation of deepmind Visual Interaction Networks paper using pytorch

Visual-Interaction-Networks An implementation of Deepmind visual interaction networks in Pytorch. Introduction For the purpose of understanding the ch

Mahmoud Gamal Salem 166 Dec 06, 2022
Rethinking Transformer-based Set Prediction for Object Detection

Rethinking Transformer-based Set Prediction for Object Detection Here are the code for the ICCV paper. The code is adapted from Detectron2 and AdelaiD

Zhiqing Sun 62 Dec 03, 2022
Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! Very tiny! Stock Market Financial Technical Analysis Python library . Quant Trading automation or cryptocoin exchange

MyTT Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! to Stock Market Financial Technical Analysis Python

dev 34 Dec 27, 2022
Trajectory Extraction of road users via Traffic Camera

Traffic Monitoring Citation The associated paper for this project will be published here as soon as possible. When using this software, please cite th

Julian Strosahl 14 Dec 17, 2022
Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Segmenter: Transformer for Semantic Segmentation Segmenter: Transformer for Semantic Segmentation by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and

594 Jan 06, 2023
An introduction to bioimage analysis - http://bioimagebook.github.io

Introduction to Bioimage Analysis This book tries explain the main ideas of image analysis in a practical and engaging way. It's written primarily for

Bioimage Book 20 Nov 28, 2022
Wikidated : An Evolving Knowledge Graph Dataset of Wikidata’s Revision History

Wikidated Wikidated 1.0 is a dataset of Wikidata’s full revision history, which encodes changes between Wikidata revisions as sets of deletions and ad

Lukas Schmelzeisen 11 Aug 16, 2022
Robot Reinforcement Learning on the Constraint Manifold

Implementation of "Robot Reinforcement Learning on the Constraint Manifold"

31 Dec 05, 2022
YOLOX-Paddle - A reproduction of YOLOX by PaddlePaddle

YOLOX-Paddle A reproduction of YOLOX by PaddlePaddle 数据集准备 下载COCO数据集,准备为如下路径 /ho

QuanHao Guo 6 Dec 18, 2022
Ensemble Visual-Inertial Odometry (EnVIO)

Ensemble Visual-Inertial Odometry (EnVIO) Authors : Jae Hyung Jung, Yeongkwon Choe, and Chan Gook Park 1. Overview This is a ROS package of Ensemble V

Jae Hyung Jung 95 Jan 03, 2023
AI4Good project for detecting waste in the environment

Detect waste AI4Good project for detecting waste in environment. www.detectwaste.ml. Our latest results were published in Waste Management journal in

108 Dec 25, 2022
Implementation of UNet on the Joey ML framework

Independent Research Project - Code Joey can be cloned from here https://github.com/devitocodes/joey/. Devito and other dependencies such as PyTorch a

Navjot Kukreja 1 Oct 21, 2021
SimulLR - PyTorch Implementation of SimulLR

PyTorch Implementation of SimulLR There is an interesting work[1] about simultan

11 Dec 22, 2022
Non-Imaging Transient Reconstruction And TEmporal Search (NITRATES)

Non-Imaging Transient Reconstruction And TEmporal Search (NITRATES) This repo contains the full NITRATES pipeline for maximum likelihood-driven discov

13 Nov 08, 2022
Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019

PoseNet of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image" Introduction This repo is official Py

Gyeongsik Moon 677 Dec 25, 2022
PaRT: Parallel Learning for Robust and Transparent AI

PaRT: Parallel Learning for Robust and Transparent AI This repository contains the code for PaRT, an algorithm for training a base network on multiple

Mahsa 0 May 02, 2022