The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

Related tags

Deep LearningRegSeg
Overview

RegSeg

The official implementation of "Rethink Dilated Convolution for Real-time Semantic Segmentation"

Paper: arxiv

params

D block

DBlock

Decoder

Decoder

Setup

Install the dependencies in requirements.txt by using pip and virtualenv.

Download Cityscapes

go to https://www.cityscapes-dataset.com, create an account, and download gtFine_trainvaltest.zip and leftImg8bit_trainvaltest.zip. You can delete the test images to save some space if you don't want to submit to the competition. Name the directory cityscapes_dataset. Make sure that you have downloaded the required python packages and run

CITYSCAPES_DATASET=cityscapes_dataset csCreateTrainIdLabelImgs

There are 19 classes.

Results from paper

To see the ablation studies results from the paper, go here.

Usage

To visualize your model, go to show.py. To train, validate, benchmark, and save the results of your model, go to train.py.

Results on Cityscapes server

RegSeg (exp48_decoder26, 30FPS): 78.3

Larger RegSeg (exp53_decoder29, 20 FPS): 79.5

Citation

If you find our work helpful, please consider citing our paper.

@article{gao2021rethink,
  title={Rethink Dilated Convolution for Real-time Semantic Segmentation},
  author={Gao, Roland},
  journal={arXiv preprint arXiv:2111.09957},
  year={2021}
}
Comments
  • question about STDC2-Seg75

    question about STDC2-Seg75

    Hi, I note that you benchmark the computation of STDC2-Seg75 which is not reported in the CVPR2021 paper. Did you test the speed of STDC-Seg on your own platform? How about the results?

    opened by ydhongHIT 2
  • Can not show.py

    Can not show.py

    I try show.py. But I can not.

    $ python3 show.py
    name= cityscapes
    train size: 2975
    val size: 500
    Traceback (most recent call last):
      File "show.py", line 358, in <module>
        show_cityscapes_model()
      File "show.py", line 337, in show_cityscapes_model
        show(model,val_loader,device,show_cityscapes_mask,num_images=num_images,skip=skip,images_per_line=images_per_line)
      File "show.py", line 134, in show
        outputs = model(images)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/RegSeg/model.py", line 76, in forward
        x=self.stem(x)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/RegSeg/blocks.py", line 22, in forward
        x = self.conv(x)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 446, in forward
        return self._conv_forward(input, self.weight, self.bias)
      File "/home/sounansu/.local/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 442, in _conv_forward
        return F.conv2d(input, weight, bias, self.stride,
    RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same
    
    opened by sounansu 2
  • The pretrained model link

    The pretrained model link

    Hi, thank you for sharing the code. Can you provide download link about the pretrained model(exp48_decoder26 and exp53_decoder29) in Cityscapes dataset, Thank you very much!

    opened by gaowq2017 1
  • About train bug

    About train bug

    When using seg_transforms.py through your scripts 'camvid_efficientnet_b1_hyperseg-s', there always exsist 'TypeError: resize() got an unexpected keyword argument 'interpolation'' in 174 line. Does this bug only appear in this scripts and should I modify the code when using this scripts?

    opened by 870572761 0
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • About train code

    About train code

    When training, how did the miou and accuracy calculate? On train dataset or validate dataset? I think it's calculated on val dataset due to https://github.com/RolandGao/RegSeg/blob/main/train.py#L238. I trained the base regseg model with config cityscapes_trainval_1000epochs.yam on Cityscapes and got the unbelievable results. 840794c66f23deb33666dcffc4af5b5

    opened by Asthestarsfalll 6
  • confusion on field of view  and model inference time

    confusion on field of view and model inference time

    Hi, RolandGao, nice to see a good job! I see you've done a lot of experiments on the backbone setting, but I still have some confusion after reading your published paper.

    • First, You calculate the fov of 4095 to see the bottom-right pixel when training cityscape (1024x2048), so you have verify the backbone should be exp48 [ (1,1) + (1,2) + 4 * (1, 4) + 7 *(1, 14) ] with fov (3807). But I also find the same backbone when training the CamVid (720x960). Why not use a shallow backbone? I am training my own dataset with image resolution (512 x 512), do I need to modify the backbone architecture? Can you give some advice?
    • Second, I test inference time of regseg. I notice that the speed is not better than other real-time archs due to split and dilated conv even if model costs low GFLOPs. In the application, what we are concerned about is the speed, so is there any strategy to improve the speed?
    opened by LinaShanghaitech 5
  • Why not pretrain on ImageNet?

    Why not pretrain on ImageNet?

    Hi, Thanks for your excellent work ! I notice that RegSeg can achieve a high accuracy on Cityscapes without pretraining. I also did a lot of ablation studies and I think DDRNet will drop around 3% miou if they do not use ImageNet pretraining. How about trying to train your encoder on ImageNet and see what will happen? I really look forward to your result ! Thanks !

    opened by RobinhoodKi 1
Owner
Roland
University of Toronto CS 2023
Roland
pytorch implementation of trDesign

trdesign-pytorch This repository is a PyTorch implementation of the trDesign paper based on the official TensorFlow implementation. The initial port o

Learn Ventures Inc. 41 Dec 29, 2022
An implementation of "Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport"

Optex An implementation of Optimal Textures: Fast and Robust Texture Synthesis and Style Transfer through Optimal Transport for TU Delft CS4240. You c

Hans Brouwer 33 Jan 05, 2023
Neon: an add-on for Lightbulb making it easier to handle component interactions

Neon Neon is an add-on for Lightbulb making it easier to handle component interactions. Installation pip install git+https://github.com/neonjonn/light

Neon Jonn 9 Apr 29, 2022
NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch

NuPIC Studio is an all­-in-­one tool that allows users create a HTM neural network from scratch, train it, collect statistics, and share it among the members of the community. It is not just a visual

HTM Community 93 Sep 30, 2022
An investigation project for SISR.

SISR-Survey An investigation project for SISR. This repository is an official project of the paper "From Beginner to Master: A Survey for Deep Learnin

Juncheng Li 79 Oct 20, 2022
This project uses Template Matching technique for object detecting by detection of template image over base image.

Object Detection Project Using OpenCV This project uses Template Matching technique for object detecting by detection the template image over base ima

Pratham Bhatnagar 7 May 29, 2022
Roadmap to becoming a machine learning engineer in 2020

Roadmap to becoming a machine learning engineer in 2020, inspired by web-developer-roadmap.

Chris Hoyean Song 1.7k Dec 29, 2022
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning accelerators for distributed training using the Ray distributed

166 Dec 27, 2022
Tiny Kinetics-400 for test

Kinetics-400迷你数据集 English | 简体中文 该数据集旨在解决的问题:参照Kinetics-400数据格式,训练基于自己数据的视频理解模型。 数据集介绍 Kinetics-400是视频领域benchmark常用数据集,详细介绍可以参考其官方网站Kinetics。整个数据集包含40

38 Jan 06, 2023
Official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

CrossViT This repository is the official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. ArXiv If

International Business Machines 168 Dec 29, 2022
Multi-Template Mouse Brain MRI Atlas (MBMA): both in-vivo and ex-vivo

Multi-template MRI mouse brain atlas (both in vivo and ex vivo) Mouse Brain MRI atlas (both in-vivo and ex-vivo) (repository relocated from the origin

8 Nov 18, 2022
Multi Task RL Baselines

MTRL Multi Task RL Algorithms Contents Introduction Setup Usage Documentation Contributing to MTRL Community Acknowledgements Introduction M

Facebook Research 171 Jan 09, 2023
On-device speech-to-intent engine powered by deep learning

Rhino Made in Vancouver, Canada by Picovoice Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a giv

Picovoice 510 Dec 30, 2022
NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size

NATS-Bench: Benchmarking NAS Algorithms for Architecture Topology and Size Xuanyi Dong, Lu Liu, Katarzyna Musial, Bogdan Gabrys in IEEE Transactions o

D-X-Y 137 Dec 20, 2022
Tutorial materials for Part of NSU Intro to Deep Learning with PyTorch.

Intro to Deep Learning Materials are part of North South University (NSU) Intro to Deep Learning with PyTorch workshop series. (Slides) Related materi

Hasib Zunair 9 Jun 08, 2022
Implementation of light baking system for ray tracing based on Activision's UberBake

Vulkan Light Bakary MSU Graphics Group Student's Diploma Project Treefonov Andrey [GitHub] [LinkedIn] Project Goal The goal of the project is to imple

Andrey Treefonov 7 Dec 27, 2022
Bridging Vision and Language Model

BriVL BriVL (Bridging Vision and Language Model) 是首个中文通用图文多模态大规模预训练模型。BriVL模型在图文检索任务上有着优异的效果,超过了同期其他常见的多模态预训练模型(例如UNITER、CLIP)。 BriVL论文:WenLan: Bridgi

235 Dec 27, 2022
Pytorch codes for "Self-supervised Multi-view Stereo via Effective Co-Segmentation and Data-Augmentation"

Self-Supervised-MVS This repository is the official PyTorch implementation of our AAAI 2021 paper: "Self-supervised Multi-view Stereo via Effective Co

hongbin_xu 127 Jan 04, 2023
A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers.

Dying Light 2 PAKFile Utility A Dying Light 2 (DL2) PAKFile Utility for Modders and Mod Makers. This tool aims to make PAKFile (.pak files) modding a

RHQ Online 12 Aug 26, 2022
Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)

Vision Transformer Pytorch reimplementation of Google's repository for the ViT model that was released with the paper An Image is Worth 16x16 Words: T

Eunkwang Jeon 1.4k Dec 28, 2022