PyTorch implementation of Wide Residual Networks with 1-bit weights by McDonnell (ICLR 2018)

Last update: Dec 07, 2022

Overview

1-bit Wide ResNet

PyTorch implementation of training 1-bit Wide ResNets from this paper:

Training wide residual networks for deployment using a single bit for each weight by Mark D. McDonnell at ICLR 2018

https://openreview.net/forum?id=rytNfI1AZ

https://arxiv.org/abs/1802.08530

The idea is very simple but surprisingly effective for training ResNets with binary weights. Here is the proposed weight parameterization as a PyTorch autograd function:

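import math
import torch

class ForwardSign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        # Take the sign of each weight and scale by the He-init constant
        # sqrt(2 / fan_in), where fan_in = in_channels * kernel_h * kernel_w.
        return math.sqrt(2. / (w.shape[1] * w.shape[2] * w.shape[3])) * w.sign()

    @staticmethod
    def backward(ctx, g):
        # Pass the gradient through to the real-valued weights unchanged.
        return g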

On the forward pass, we take the sign of the weights and scale it by the He-init constant. On the backward pass, we propagate the gradient unchanged. A WRN-20-10 trained with this parameterization is only slightly off from its full-precision variant; here is what I got myself with this code on CIFAR-100:

network	accuracy, % (mean ± std over 5 runs)	checkpoint size (Mb)
WRN-20-10	80.5 ± 0.24	205
WRN-20-10-1bit	80.0 ± 0.26	3.5
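
To make the parameterization concrete, here is a minimal sketch of how ForwardSign could be applied to a convolution's weights. The BinaryConv2d wrapper and the layer sizes are hypothetical names and values chosen for illustration; they are not taken from this repository's main.py.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class ForwardSign(torch.autograd.Function):
    """Same parameterization as shown above in this README."""

    @staticmethod
    def forward(ctx, w):
        return math.sqrt(2. / (w.shape[1] * w.shape[2] * w.shape[3])) * w.sign()

    @staticmethod
    def backward(ctx, g):
        return g


class BinaryConv2d(nn.Conv2d):
    """Hypothetical wrapper: keeps real-valued weights, convolves with their scaled signs."""

    def forward(self, x):
        w_bin = ForwardSign.apply(self.weight)
        return F.conv2d(x, w_bin, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


torch.manual_seed(0)
conv = BinaryConv2d(16, 32, kernel_size=3, padding=1, bias=False)
x = torch.randn(2, 16, 8, 8)

y = conv(x)
y.sum().backward()  # the gradient reaches the real-valued weights unchanged

w_bin = ForwardSign.apply(conv.weight)
scale = math.sqrt(2. / (16 * 3 * 3))  # He-init constant for fan_in = 16 * 3 * 3
```

Every entry of w_bin has magnitude equal to scale, so only the sign of each weight needs to be stored at deployment time, while the optimizer keeps updating the underlying real-valued conv.weight during training.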

Details

Here are the differences from the original WRN code (https://github.com/szagoruyko/wide-residual-networks):

BatchNorm has no affine weight and bias parameters
First layer has 16 * width channels
Last fc layer is removed in favor of 1x1 conv + F.avg_pool2d
Downsample is done by F.avg_pool2d + torch.cat instead of strided conv
SGD with cosine annealing and warm restarts
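
The architectural changes in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from main.py: the names downsample_shortcut and ConvHead are hypothetical, and filling the extra channels with zeros in torch.cat is an assumption about what is concatenated.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def downsample_shortcut(x):
    """Halve the spatial size with average pooling, then double the channels.

    Assumption: the added channels are zero-filled.
    """
    x = F.avg_pool2d(x, 2)
    return torch.cat([x, torch.zeros_like(x)], dim=1)


class ConvHead(nn.Module):
    """Hypothetical classifier head: 1x1 conv to class scores, then global average pooling."""

    def __init__(self, in_channels, num_classes):
        super().__init__()
        # BatchNorm without learnable affine weight and bias
        self.bn = nn.BatchNorm2d(in_channels, affine=False)
        self.conv = nn.Conv2d(in_channels, num_classes, kernel_size=1, bias=False)

    def forward(self, x):
        x = self.conv(F.relu(self.bn(x)))
        # Pool over the full spatial extent, leaving one score per class
        x = F.avg_pool2d(x, kernel_size=(x.size(2), x.size(3)))
        return x.view(x.size(0), -1)


torch.manual_seed(0)
features = torch.randn(2, 16, 8, 8)
shortcut = downsample_shortcut(features)   # 16 -> 32 channels, 8x8 -> 4x4
logits = ConvHead(32, 100)(shortcut)       # 100 classes, as in CIFAR-100
```

With affine=False the BatchNorm layer has no trainable parameters, and the shortcut introduces no weights at all, so neither adds full-precision parameters to the network.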

I used PyTorch 0.4.1 and Python 3.6 to run the code.
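
For the learning-rate schedule mentioned in the list of differences, here is a small pure-Python sketch of cosine annealing with warm restarts. The function name sgdr_lr and all default values (lr_max, lr_min, t_0, t_mult) are assumptions for illustration, not the settings used in main.py.

```python
import math


def sgdr_lr(epoch, lr_max=0.1, lr_min=0.0, t_0=2, t_mult=2):
    """Learning rate at a given epoch under cosine annealing with warm restarts.

    The first cycle lasts t_0 epochs; each later cycle is t_mult times longer.
    All defaults are illustrative assumptions.
    """
    t_i, t_cur = t_0, epoch
    # Find the current cycle and the position inside it
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / t_i))


# With the defaults, cycles cover epochs 0-1, 2-5, 6-13, ...
schedule = [sgdr_lr(e) for e in range(14)]
```

Within each cycle the rate decays from lr_max towards lr_min along a half cosine, then jumps back to lr_max at the start of the next cycle.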

Reproduce WRN-20-10 with 1-bit training on CIFAR-100:

python main.py --binarize --save ./logs/WRN-20-10-1bit_$RANDOM --width 10 --dataset CIFAR100

Convergence plot (train error in dash):

I've also uploaded a 3.5 Mb checkpoint with binary weights packed using np.packbits, along with a very short script to evaluate it:

python evaluate_packed.py --checkpoint wrn20-10-1bit-packed.pth.tar --width 10 --dataset CIFAR100

S3 URL for the checkpoint: https://s3.amazonaws.com/modelzoo-networks/wrn20-10-1bit-packed.pth.tar
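
As a rough illustration of how weight signs can be packed with np.packbits, here is a minimal sketch. The helper names pack_signs and unpack_signs are hypothetical and the actual checkpoint layout used by evaluate_packed.py may differ; the sketch also assumes that no weight is exactly zero, since a zero would be stored as +1.

```python
import numpy as np


def pack_signs(w):
    """Store one bit per weight: 1 for non-negative, 0 for negative."""
    bits = (w.ravel() >= 0).astype(np.uint8)
    return np.packbits(bits), w.shape


def unpack_signs(packed, shape):
    """Recover a float array of +1 / -1 values with the original shape."""
    n = int(np.prod(shape))
    bits = np.unpackbits(packed)[:n]  # drop the padding bits added by packbits
    return (bits.astype(np.float32) * 2 - 1).reshape(shape)


rng = np.random.RandomState(0)
w = rng.randn(4, 3, 3, 3).astype(np.float32)  # stand-in for a conv weight tensor

packed, shape = pack_signs(w)
restored = unpack_signs(packed, shape)
```

Here 108 float32 weights (432 bytes) are stored in 14 bytes, which is the roughly 32x reduction that one bit per weight allows. The He-init scale does not need to be stored, because it can be recomputed from the weight shape.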

Owner

Sergey Zagoruyko
