Implementation of "A MLP-like Architecture for Dense Prediction"

Related tags

Deep LearningCycleMLP
Overview

A MLP-like Architecture for Dense Prediction (arXiv)

License: MIT Python 3.8

    

Updates

  • (22/07/2021) Initial release.

Model Zoo

We provide CycleMLP models pretrained on ImageNet 2012.

Model Parameters FLOPs Top 1 Acc. Download
CycleMLP-B1 15M 2.1G 78.9% model
CycleMLP-B2 27M 3.9G 81.6% model
CycleMLP-B3 38M 6.9G 82.4% model
CycleMLP-B4 52M 10.1G 83.0% model
CycleMLP-B5 76M 12.3G 83.2% model

Usage

Install

  • PyTorch 1.7.0+ and torchvision 0.8.1+
  • timm:
pip install 'git+https://github.com/rwightman/[email protected]'

or

git clone https://github.com/rwightman/pytorch-image-models
cd pytorch-image-models
git checkout c2ba229d995c33aaaf20e00a5686b4dc857044be
pip install -e .
  • fvcore (optional, for FLOPs calculation)
  • mmcv, mmdetection, mmsegmentation (optional)

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is:

│path/to/imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

Evaluation

To evaluate a pre-trained CycleMLP-B5 on ImageNet val with a single GPU run:

python main.py --eval --model CycleMLP_B5 --resume path/to/CycleMLP_B5.pth --data-path /path/to/imagenet

Training

To train CycleMLP-B5 on ImageNet on a single node with 8 gpus for 300 epochs run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model CycleMLP_B5 --batch-size 128 --data-path /path/to/imagenet --output_dir /path/to/save

Acknowledgement

This code is based on DeiT and pytorch-image-models. Thanks for their wonderful works

Citing

@article{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction},
  author={Chen, Shoufa and Xie, Enze and Ge, Chongjian and Liang, Ding and Luo, Ping},
  journal={arXiv preprint arXiv:2107.10224},
  year={2021}
}

License

CycleMLP is released under MIT License.

Comments
  • detection result

    detection result

    Applying PVT detection framework, I tried a CycleMLP-B1 based detector with RetinaNet 1x. I got AP=27.1, fairly inferior to the reported 38.6. Could you give some advices to reproduce the reported result?

    The specific configure is as follows

    base = [ 'base/models/retinanet_r50_fpn.py', 'base/datasets/coco_detection.py', 'base/schedules/schedule_1x.py', 'base/default_runtime.py' ] #optimizer model = dict( pretrained='./pretrained/CycleMLP_B1.pth', backbone=dict( type='CycleMLP_B1_feat', style='pytorch'), neck=dict( type='FPN', in_channels=[64, 128, 320, 512], out_channels=256, start_level=1, add_extra_convs='on_input', num_outs=5)) #optimizer optimizer = dict(delete=True, type='AdamW', lr=0.0001, weight_decay=0.0001) optimizer_config = dict(grad_clip=None)

    find_unused_parameters = True

    opened by mountain111 6
  • Compiling CycleMLP

    Compiling CycleMLP

    Thank you for this great repo and interesting paper.

    I tried compiling CycleMLP to onnx and not surpassingly the process failed since CycleMLP include dynamic offset creation in https://github.com/ShoufaChen/CycleMLP/blob/main/cycle_mlp.py#L132 and as such cannot be converted to a frozen graph. Were you able to convert CycleMLP to onnx or any other frozen graph framework?

    Thanks in advance.

    opened by shairoz-deci 6
  • Questions about offset calculation

    Questions about offset calculation

    Hi, thanks for your wonderful work.

    I'm currently studying your work, and come up with some question about the offset calculations.

    I understood the offset calculation mentioned on the paper, but can't understand about how generated offset is being used in the code.

    For ex) if $S_H \times S_W : 3 \times 1$; I understood how the offset is applied in this figure 스크린샷 2022-06-13 오후 9 18 20

    by calculate like this: 스크린샷 2022-06-13 오후 9 19 57

    However, when I run the offset generating code, I can't figure out how this offset is being used in deform_conv2d 스크린샷 2022-06-13 오후 9 21 57

    Can you provide more detailed information about this??

    And also, the paper contains how $S_H \times S_W: 3 \times 3$ works, but in the code, it seems like either one ofkernel_size[0] or kernel_size[1] has to be 1. So, if I want to use $S_H \times S_W : 3 \times 3$, do I have to make $3 \times 1$ and $1 \times 3$ offsets and add those together?

    Thank you again for your work. I really learned a lot.

    opened by tae-mo 5
  • Example of CycleMLP Configuration for Dense Prediction

    Example of CycleMLP Configuration for Dense Prediction

    Hello.

    First of all, thank you for curating this interesting work. I was wondering, are there any working examples of how I can use CycleMLP for dense prediction while maintaining the original input size (e.g., predict a 0 or 1 value for each pixel in an input image)? In addition, I am interested in only a single ("annotated") output image, although I noticed the model definitions given in this repository output multiple downsampled versions of the original input image. Any thoughts on this?

    Thank you in advance for your time.

    opened by amorehead 2
  • Swin-B vs CycleMLP-B on image classification

    Swin-B vs CycleMLP-B on image classification

    For classificaion on ImageNet-1k, the acuracy of Swin-B is 83.5, which is 0.1 higher than the proposed CycleMLP-B. But, in this paper, the authors reprot that the accuracy of Swin-B is 83.3, which is 0.1 lower than the proposed CycleMLP-B. Why are these accuracies different?

    opened by hkzhang91 1
  • question about the offset

    question about the offset

    Thanks for your work!

    The implementation of this code inspired me. But the calculation of offset here is confusing. Although this issue (https://github.com/ShoufaChen/CycleMLP/issues/10) has asked similar questions, I haven't found a reasonable explanation.

    https://github.com/ShoufaChen/CycleMLP/blob/2f76a1f6e3cc6672143fdac46e3db5f9a7341253/cycle_mlp.py#L127-L136

    kernel_size = (1, 3)
    start_idx = (kernel_size[0] * kernel_size[1]) // 2
    for i in range(num_channels):
        offset[0, 2 * i + 0, 0, 0] = 0
        # relative offset
        offset[0, 2 * i + 1, 0, 0] = (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
    offset.reshape(num_channels, 2)
    
    tensor([[ 0.,  0.],
            [ 0.,  1.],
            [ 0., -1.],
            [ 0.,  0.],
            [ 0.,  1.],
            [ 0., -1.]])
    

    the results are different with the figure in paper:

    image

    Some codes for verification:

    import torch
    from torchvision.ops import deform_conv2d
    
    num_channels = 6
    
    data = torch.arange(1, 6).reshape(1, 1, 1, 5).expand(-1, num_channels, -1, -1)
    data
    """
    tensor([[[[1, 2, 3, 4, 5]],
             [[1, 2, 3, 4, 5]],
             [[1, 2, 3, 4, 5]],
             [[1, 2, 3, 4, 5]],
             [[1, 2, 3, 4, 5]],
             [[1, 2, 3, 4, 5]]]])
    """
    
    weight = torch.eye(num_channels).reshape(num_channels, num_channels, 1, 1)
    weight.reshape(num_channels, num_channels)
    """
    tensor([[1., 0., 0., 0., 0., 0.],
            [0., 1., 0., 0., 0., 0.],
            [0., 0., 1., 0., 0., 0.],
            [0., 0., 0., 1., 0., 0.],
            [0., 0., 0., 0., 1., 0.],
            [0., 0., 0., 0., 0., 1.]])
    """
    
    offset = torch.empty(1, 2 * num_channels * 1 * 1, 1, 1)
    kernel_size = (1, 3)
    start_idx = (kernel_size[0] * kernel_size[1]) // 2
    for i in range(num_channels):
        offset[0, 2 * i + 0, 0, 0] = 0
        # relative offset
        offset[0, 2 * i + 1, 0, 0] = (
            (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
        )
    offset.reshape(num_channels, 2)
    """
    tensor([[ 0.,  0.],
            [ 0.,  1.],
            [ 0., -1.],
            [ 0.,  0.],
            [ 0.,  1.],
            [ 0., -1.]])
    """
    
    deform_conv2d(
        data.float(), 
        offset=offset.expand(-1, -1, -1, 5).float(), 
        weight=weight.float(), 
        bias=None,
    )
    """
    tensor([[[[1., 2., 3., 4., 5.]],
             [[2., 3., 4., 5., 0.]],
             [[0., 1., 2., 3., 4.]],
             [[1., 2., 3., 4., 5.]],
             [[2., 3., 4., 5., 0.]],
             [[0., 1., 2., 3., 4.]]]])
    """
    
    opened by lartpang 1
  • question about the offset

    question about the offset

    Hi, thank you very much for your excellent work. In Fig.4 of your paper, you show the pseudo-kernel when kernel size is 1x3. But I when I find that function "gen_offset" does not generate the same offset as Fig.4. The offset it generates is "0,1,0,-1,0,0,0,1..." instead of "0,1,0,-1,0,1,0,-1', which is shown in Fig.4. So could you please tell me the reason? image image

    opened by linjing7 1
  • About

    About "crop_pct"

    Hi, thanks for your great work and code. I wonder the parameter crop_pct actually works in which part of code. When I go throught the timm, I can't find out how this crop_pct is loaded.

    opened by ggjy 1
  • How to deploy CycleMLP-T for training?

    How to deploy CycleMLP-T for training?

    Thank you very much for such a wonderful work!

    After learning the cycle_mlp source code in the repository, I am very confused to deploy CycleMLP Block based on Swin Transformer. Is it convenient for you to release swin-based CycleMLP? Looking forward to your reply, Thanks!

    opened by Pak287 0
Owner
Shoufa Chen
Shoufa Chen
This is a Pytorch implementation of paper: DropEdge: Towards Deep Graph Convolutional Networks on Node Classification

DropEdge: Towards Deep Graph Convolutional Networks on Node Classification This is a Pytorch implementation of paper: DropEdge: Towards Deep Graph Con

401 Dec 16, 2022
Stitch it in Time: GAN-Based Facial Editing of Real Videos

STIT - Stitch it in Time [Project Page] Stitch it in Time: GAN-Based Facial Edit

1.1k Jan 04, 2023
Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Google 89 Dec 22, 2022
Degree-Quant: Quantization-Aware Training for Graph Neural Networks.

Degree-Quant This repo provides a clean re-implementation of the code associated with the paper Degree-Quant: Quantization-Aware Training for Graph Ne

35 Oct 07, 2022
Fuzzing the Kernel Using Unicornafl and AFL++

Unicorefuzz Fuzzing the Kernel using UnicornAFL and AFL++. For details, skim through the WOOT paper or watch this talk at CCCamp19. Is it any good? ye

Security in Telecommunications 283 Dec 26, 2022
[ICCV 2021] FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction [arXiv] [Project Page] @inproceedings{ huang2021fapn, title={{FaPN}: Feature-alig

EMI-Group 175 Dec 30, 2022
Ranger deep learning optimizer rewrite to use newest components

Ranger21 - integrating the latest deep learning components into a single optimizer Ranger deep learning optimizer rewrite to use newest components Ran

Less Wright 266 Dec 28, 2022
GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors

GPU implementation of kNN and SNN GPU implementation of $k$-Nearest Neighbors and Shared-Nearest Neighbors Supported by numba cuda and faiss library E

Hyeon Jeon 7 Nov 23, 2022
SmallInitEmb - LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence

SmallInitEmb LayerNorm(SmallInit(Embedding)) in a Transformer I find that when t

PENG Bo 11 Dec 25, 2022
Computing Shapley values using VAEAC

Shapley values and the VAEAC method In this GitHub repository, we present the implementation of the VAEAC approach from our paper "Using Shapley Value

3 Nov 23, 2022
Implementation of the Paper: "Parameterized Hypercomplex Graph Neural Networks for Graph Classification" by Tuan Le, Marco Bertolini, Frank Noé and Djork-Arné Clevert

Parameterized Hypercomplex Graph Neural Networks (PHC-GNNs) PHC-GNNs (Le et al., 2021): https://arxiv.org/abs/2103.16584 PHM Linear Layer Illustration

Bayer AG 26 Aug 11, 2022
Tutorial on scikit-learn and IPython for parallel machine learning

Parallel Machine Learning with scikit-learn and IPython Video recording of this tutorial given at PyCon in 2013. The tutorial material has been rearra

Olivier Grisel 1.6k Dec 26, 2022
Implementation of FitVid video prediction model in JAX/Flax.

FitVid Video Prediction Model Implementation of FitVid video prediction model in JAX/Flax. If you find this code useful, please cite it in your paper:

Google Research 62 Nov 25, 2022
Library extending Jupyter notebooks to integrate with Apache TinkerPop and RDF SPARQL.

Graph Notebook: easily query and visualize graphs The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Us

Amazon Web Services 501 Dec 28, 2022
TensorFlow code for the neural network presented in the paper: "Structural Language Models of Code" (ICML'2020)

SLM: Structural Language Models of Code This is an official implementation of the model described in: "Structural Language Models of Code" [PDF] To ap

73 Nov 06, 2022
A map update dataset and benchmark

MUNO21 MUNO21 is a dataset and benchmark for machine learning methods that automatically update and maintain digital street map datasets. Previous dat

16 Nov 30, 2022
Denoising Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models Jonathan Ho, Ajay Jain, Pieter Abbeel Paper: https://arxiv.org/abs/2006.11239 Website: https://hojonathanho.g

Jonathan Ho 1.5k Jan 08, 2023
Betafold - AlphaFold with tunings

BetaFold We (hegelab.org) craeted this standalone AlphaFold (AlphaFold-Multimer,

2 Aug 11, 2022
Approaches to modeling terrain and maps in python

topography 🌎 Contains different approaches to modeling terrain and topographic-style maps in python Features Inverse Distance Weighting (IDW) A given

John Gutierrez 1 Aug 10, 2022
Fewshot-face-translation-GAN - Generative adversarial networks integrating modules from FUNIT and SPADE for face-swapping.

Few-shot face translation A GAN based approach for one model to swap them all. The table below shows our priliminary face-swapping results requiring o

768 Dec 24, 2022