ICLR 2021 i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning

Last update: Nov 27, 2022

Introduction

PyTorch code for the ICLR 2021 paper [i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning].

@inproceedings{lee2021imix,
  title={i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning},
  author={Lee, Kibok and Zhu, Yian and Sohn, Kihyuk and Li, Chun-Liang and Shin, Jinwoo and Lee, Honglak},
  booktitle={ICLR},
  year={2021}
}

Dependencies

python 3.7.4
numpy 1.17.2
pytorch 1.4.0
torchvision 0.5.0
cudatoolkit 10.1
librosa 0.8.0 for speech_commands
PIL 6.2.0 for GaussianBlur
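
To compare an existing environment against the versions listed above, a small check along the following lines may help. It is only a sketch: `installed_versions` is a hypothetical helper that is not part of this repository, and it assumes the packages are importable as `numpy`, `torch`, `torchvision`, `librosa`, and `PIL`.

```python
import importlib


def installed_versions(names=("numpy", "torch", "torchvision", "librosa", "PIL")):
    """Return {module name: version string}, with None for modules that fail to import."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Missing package, or an import that fails for another reason
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in installed_versions().items():
    print(name, version)
```

Newer versions may also work, but the list above is the only configuration this README states.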

Data

CIFAR-10/100 are downloaded automatically.
For ImageNet, please refer to the [PyTorch ImageNet example]. The folder structure should look like data/imagenet/train/n01440764/.
For speech commands, run bash speech_commands/download_speech_commands_dataset.sh.
For tabular datasets, download [covtype.data.gz] and [HIGGS.csv.gz], and place them in data/. They are processed when first loaded.
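
Before launching a run, it can be useful to confirm that the manually prepared files are where the steps above put them. The sketch below checks only the paths named in this section; `missing_data_paths` is a hypothetical helper, not part of the repository. The speech commands data is left out because this README does not state where the download script places it.

```python
import os


def missing_data_paths(root="data"):
    """Return the expected dataset paths under `root` that do not exist yet."""
    expected = [
        os.path.join(root, "imagenet", "train"),   # ImageNet class folders, e.g. n01440764/
        os.path.join(root, "covtype.data.gz"),     # tabular: covtype
        os.path.join(root, "HIGGS.csv.gz"),        # tabular: HIGGS
    ]
    return [path for path in expected if not os.path.exists(path)]


for path in missing_data_paths():
    print("missing:", path)
```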

Running scripts

Please refer to [run.sh].

Plug-in example

For those who want to apply our method in their own code, we provide a minimal example based on [MoCo]:

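# mixup: somewhere in main_moco.py
def mixup(input, alpha):
    # Sample one mixing ratio per example from Beta(alpha, alpha)
    beta = torch.distributions.beta.Beta(alpha, alpha)
    # Random permutation that pairs each example with a mixing partner
    randind = torch.randperm(input.shape[0], device=input.device)
    lam = beta.sample([input.shape[0]]).to(device=input.device)
    # Keep lam >= 0.5 so the original example dominates the mix
    lam = torch.max(lam, 1. - lam)
    # Reshape lam so it broadcasts over all non-batch dimensions
    lam_expanded = lam.view([-1] + [1]*(input.dim()-1))
    output = lam_expanded * input + (1. - lam_expanded) * input[randind]
    return output, randind, lam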

# cutmix: somewhere in main_moco.py
def cutmix(input, alpha):
    beta = torch.distributions.beta.Beta(alpha, alpha)
    randind = torch.randperm(input.shape[0], device=input.device)
    # A single mixing ratio (and thus a single box) is shared by the whole batch
    lam = beta.sample().to(device=input.device)
    lam = torch.max(lam, 1. - lam)
    # rand_bbox returns the box and lam recomputed from the clipped box area
    (bbx1, bby1, bbx2, bby2), lam = rand_bbox(input.shape[-2:], lam)
    output = input.clone()
    # Paste the box region from each example's mixing partner
    output[..., bbx1:bbx2, bby1:bby2] = output[randind][..., bbx1:bbx2, bby1:bby2]
    return output, randind, lam

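def rand_bbox(size, lam):
    # size holds the last two dimensions of the input; W and H name them in that order
    W, H = size
    # Side ratio chosen so that the box covers a (1 - lam) fraction of the area
    cut_rat = (1. - lam).sqrt()
    cut_w = (W * cut_rat).to(torch.long)
    cut_h = (H * cut_rat).to(torch.long)

    # Uniformly random box center
    cx = torch.zeros_like(cut_w, dtype=cut_w.dtype).random_(0, W)
    cy = torch.zeros_like(cut_h, dtype=cut_h.dtype).random_(0, H)

    # Clip the box to the input bounds
    bbx1 = (cx - cut_w // 2).clamp(0, W)
    bby1 = (cy - cut_h // 2).clamp(0, H)
    bbx2 = (cx + cut_w // 2).clamp(0, W)
    bby2 = (cy + cut_h // 2).clamp(0, H)

    # Recompute lam from the clipped box, which may be smaller than requested
    new_lam = 1. - (bbx2 - bbx1).to(lam.dtype) * (bby2 - bby1).to(lam.dtype) / (W * H)

    return (bbx1, bby1, bbx2, bby2), new_lam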

# https://github.com/facebookresearch/moco/blob/master/main_moco.py#L193
criterion = nn.CrossEntropyLoss(reduction='none').cuda(args.gpu)

# https://github.com/facebookresearch/moco/blob/master/main_moco.py#L302-L303
images[0], target_aux, lam = mixup(images[0], alpha=1.)
# images[0], target_aux, lam = cutmix(images[0], alpha=1.)
target = torch.arange(images[0].shape[0], dtype=torch.long).cuda()
output, _ = model(im_q=images[0], im_k=images[1])
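# criterion uses reduction='none', so weight the per-sample losses by lam and then average to a scalar
loss = (lam * criterion(output, target) + (1. - lam) * criterion(output, target_aux)).mean()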

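# https://github.com/facebookresearch/moco/blob/master/moco/builder.py#L142-L149
# Put the keys of the current batch first, so that target and target_aux index into them;
# the queue entries that follow act as additional negatives
contrast = torch.cat([k, self.queue.clone().detach().t()], dim=0)
logits = torch.mm(q, contrast.t())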
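
The pieces above can also be tried outside MoCo. The following is a minimal, self-contained sketch of the mixed loss on toy tensors, not the training code of this repository. It makes several simplifying assumptions: there is no queue and no momentum encoder (one toy `encoder` embeds both views), the temperature of 0.2 is an arbitrary choice, and `imix_loss` is a hypothetical name. The `mixup` function is copied from the snippet above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mixup(input, alpha):
    # Copied from the plug-in example above
    beta = torch.distributions.beta.Beta(alpha, alpha)
    randind = torch.randperm(input.shape[0], device=input.device)
    lam = beta.sample([input.shape[0]]).to(device=input.device)
    lam = torch.max(lam, 1. - lam)
    lam_expanded = lam.view([-1] + [1] * (input.dim() - 1))
    output = lam_expanded * input + (1. - lam_expanded) * input[randind]
    return output, randind, lam


def imix_loss(encoder, view_q, view_k, alpha=1., temperature=0.2):
    """Toy mixed contrastive loss with in-batch keys only (assumed simplification)."""
    # Mix only the query view; target_aux holds each example's mixing partner
    mixed_q, target_aux, lam = mixup(view_q, alpha)
    q = F.normalize(encoder(mixed_q), dim=1)
    with torch.no_grad():
        k = F.normalize(encoder(view_k), dim=1)  # keys receive no gradient
    # Row i scores query i against every key in the batch
    logits = torch.mm(q, k.t()) / temperature
    # The positive key of example i sits at index i
    target = torch.arange(view_q.shape[0], dtype=torch.long, device=view_q.device)
    criterion = nn.CrossEntropyLoss(reduction='none')
    per_sample = lam * criterion(logits, target) + (1. - lam) * criterion(logits, target_aux)
    return per_sample.mean()


torch.manual_seed(0)
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 16))  # stand-in for a real backbone
view_q = torch.randn(4, 3, 8, 8)
view_k = view_q + 0.1 * torch.randn_like(view_q)  # stand-in for a second augmentation
loss = imix_loss(encoder, view_q, view_k)
loss.backward()
print(loss.item())
```

Because `mixup` returns one `lam` per example, the two cross-entropy terms are weighted per sample and reduced to a scalar only at the end, which is why the criterion is built with `reduction='none'`.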

Note

builder.py is adapted from [MoCo] and [PyContrast].
main_*.py is adapted from [PyTorch ImageNet example] and [MoCo].
models/resnet.py is adapted from [torchvision].
speech_commands/ is adapted from [this repo].

Owner

Kibok Lee