Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Overview

Semantic Segmentation

Features

  • Applicable to the following tasks:
    • Scene Parsing
    • Human Parsing
    • Face Parsing
  • 20+ Datasets
  • 10+ SOTA Backbones
  • 10+ SOTA Semantic Segmentation Models
  • PyTorch, ONNX, TFLite and OpenVINO Inference

Model Zoo

Supported Backbones:

Supported Heads/Methods:

Supported Standalone Models:

Supported Modules:

  • PPM (CVPR 2017)
  • PSA (ArXiv 2021)

ADE20K-val (Scene Parsing)

Method     Backbone  mIoU (%)  Params (M)  GFLOPs (512x512)  Weights
SegFormer  MiT-B1    43.1      14          16                pt
SegFormer  MiT-B2    47.5      28          62                pt
SegFormer  MiT-B3    50.0      47          79                pt

CityScapes-val (Scene Parsing)

Method     Backbone       mIoU (%)  Params (M)  GFLOPs  Img Size   Weights
SegFormer  MiT-B0         78.1      4           126     1024x1024  N/A
SegFormer  MiT-B1         80.0      14          244     1024x1024  N/A
FaPN       ResNet-50      80.0      33          -       512x1024   N/A
SFNet      ResNetD-18     79.0      13          -       1024x1024  N/A
FCHarDNet  HarDNet-70     77.7      4           35      1024x1024  pt
DDRNet     DDRNet-23slim  77.8      6           36      1024x2048  pt

HELEN-val (Face Parsing)

Method     Backbone         mIoU (%)  Params (M)  GFLOPs (512x512)  FPS (GTX1660ti)  Weights
BiSeNetv1  MobileNetV2-1.0  58.22     5           5                 160              pt
BiSeNetv1  ResNet-18        58.50     14          13                263              pt
BiSeNetv2  -                58.58     18          15                195              pt
FCHarDNet  HarDNet-70       59.38     4           4                 130              pt
DDRNet     DDRNet-23slim    61.11     6           5                 180              pt|tflite(fp32)|tflite(fp16)|tflite(int8)
SegFormer  MiT-B0           59.31     4           8                 75               pt
SFNet      ResNetD-18       61.00     14          31                56               pt

Backbones

Model        Variants   ImageNet-1k Top-1 Acc (%)  Params (M)  GFLOPs      Weights
MicroNet     M1|M2|M3   51.4|59.4|62.5             1|2|3       6M|12M|21M  download
MobileNetV2  1.0        71.9                       3           300M        download
MobileNetV3  S|L        67.7|74.0                  3|5         56M|219M    S|L
ResNet       18|50|101  69.8|76.1|77.4             12|25|44    2|4|8       download
ResNetD      18|50|101  -                          12|25|44    2|4|8       download
MiT          B1|B2|B3   -                          14|25|45    2|4|8       download
PVTv2        B1|B2|B4   78.7|82.0|83.6             14|25|63    2|4|10      download
ResT         S|B|L      79.6|81.6|83.6             14|30|52    2|4|8       download

Notes: Download backbones' weights for HarDNet-70 and DDRNet-23slim.
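
The mIoU column in the segmentation tables above is the mean, over classes, of intersection-over-union. As a rough illustration of how such a number is obtained, here is a minimal pure-Python sketch that builds a confusion matrix from flattened label sequences. The function name `mean_iou` and the `ignore_index=255` default are assumptions made for this example, not taken from this repository's code.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int,
             ignore_index: int = 255) -> float:
    """Return mIoU in percent for two flattened sequences of class indices."""
    # confusion[t][p] counts pixels whose ground truth is t and prediction is p.
    confusion: List[List[int]] = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed convention: skip unlabeled pixels
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # class c pixels predicted as something else
        fp = sum(row[c] for row in confusion) - tp   # other pixels predicted as class c
        union = tp + fp + fn
        if union > 0:  # classes absent from both prediction and target are skipped
            ious.append(tp / union)
    return 100.0 * sum(ious) / len(ious) if ious else 0.0
```

For example, `mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1.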

Supported Datasets

Dataset           Type                      Categories  Train Images  Val Images  Test Images   Image Size (HxW)
COCO-Stuff        General Scene Parsing     171         118,000       5,000       20,000        -
ADE20K            General Scene Parsing     150         20,210        2,000       3,352         -
PASCALContext     General Scene Parsing     59          4,996         5,104       9,637         -
SUN RGB-D         Indoor Scene Parsing      37          2,666         2,619       5,050+labels  -
Mapillary Vistas  Street Scene Parsing      65          18,000        2,000       5,000         1080x1920
CityScapes        Street Scene Parsing      19          2,975         500         1,525+labels  1024x2048
CamVid            Street Scene Parsing      11          367           101         233+labels    720x960
MHPv2             Multi-Human Parsing       59          15,403        5,000       5,000         -
MHPv1             Multi-Human Parsing       19          3,000         1,000       980+labels    -
LIP               Multi-Human Parsing       20          30,462        10,000      -             -
CCIHP             Multi-Human Parsing       22          28,280        5,000       5,000         -
CIHP              Multi-Human Parsing       20          28,280        5,000       5,000         -
ATR               Single-Human Parsing      18          16,000        700         1,000+labels  -
HELEN             Face Parsing              11          2,000         230         100+labels    -
LaPa              Face Parsing              11          18,176        2,000       2,000+labels  -
iBugMask          Face Parsing              11          21,866        -           1,000+labels  -
CelebAMaskHQ      Face Parsing              19          24,183        2,993       2,824+labels  512x512
FaceSynthetics    Face Parsing (Synthetic)  19          100,000       1,000       100+labels    512x512
SUIM              Underwater Imagery        8           1,525         -           110+labels    -

Check DATASETS to find more segmentation datasets.

Datasets Structure

Datasets should have the following structure:

data
|__ ADEChallenge
    |__ ADEChallengeData2016
        |__ images
            |__ training
            |__ validation
        |__ annotations
            |__ training
            |__ validation

|__ CityScapes
    |__ leftImg8bit
        |__ train
        |__ val
        |__ test
    |__ gtFine
        |__ train
        |__ val
        |__ test

|__ CamVid
    |__ train
    |__ val
    |__ test
    |__ train_labels
    |__ val_labels
    |__ test_labels
    
|__ VOCdevkit
    |__ VOC2010
        |__ JPEGImages
        |__ SegmentationClassContext
        |__ ImageSets
            |__ SegmentationContext
                |__ train.txt
                |__ val.txt
    
|__ COCO
    |__ images
        |__ train2017
        |__ val2017
    |__ labels
        |__ train2017
        |__ val2017

|__ MHPv1
    |__ images
    |__ annotations
    |__ train_list.txt
    |__ test_list.txt

|__ MHPv2
    |__ train
        |__ images
        |__ parsing_annos
    |__ val
        |__ images
        |__ parsing_annos

|__ LIP
    |__ LIP
        |__ TrainVal_images
            |__ train_images
            |__ val_images
        |__ TrainVal_parsing_annotations
            |__ train_segmentations
            |__ val_segmentations

    |__ CIHP/CCIHP
        |__ instance-leve_human_parsing
            |__ Training
                |__ Images
                |__ Category_ids
            |__ Validation
                |__ Images
                |__ Category_ids

    |__ ATR
        |__ humanparsing
            |__ JPEGImages
            |__ SegmentationClassAug

|__ SUIM
    |__ train_val
        |__ images
        |__ masks
    |__ TEST
        |__ images
        |__ masks

|__ SunRGBD
    |__ SUNRGBD
        |__ kv1/kv2/realsense/xtion
    |__ SUNRGBDtoolbox
        |__ traintestSUNRGBD
            |__ allsplit.mat

|__ Mapillary
    |__ training
        |__ images
        |__ labels
    |__ validation
        |__ images
        |__ labels

|__ SmithCVPR2013_dataset_resized (HELEN)
    |__ images
    |__ labels
    |__ exemplars.txt
    |__ testing.txt
    |__ tuning.txt

|__ CelebAMask-HQ
    |__ CelebA-HQ-img
    |__ CelebAMask-HQ-mask-anno
    |__ CelebA-HQ-to-CelebA-mapping.txt

|__ LaPa
    |__ train
        |__ images
        |__ labels
    |__ val
        |__ images
        |__ labels
    |__ test
        |__ images
        |__ labels

|__ ibugmask_release
    |__ train
    |__ test

|__ FaceSynthetics
    |__ dataset_100000
    |__ dataset_1000
    |__ dataset_100

Note: For PASCALContext, download the annotations from here and put them in the VOC2010 folder.

Note: For CelebAMask-HQ, run the preprocess script. python3 scripts/preprocess_celebamaskhq.py --root .
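
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch that checks only the ADEChallenge and CityScapes entries, with paths copied from the tree. The helper name `missing_dirs` and the `EXPECTED` table are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import Dict, List

# Expected sub-folders, copied from the tree above (two datasets only).
EXPECTED: Dict[str, List[str]] = {
    "ADEChallenge": [
        "ADEChallengeData2016/images/training",
        "ADEChallengeData2016/images/validation",
        "ADEChallengeData2016/annotations/training",
        "ADEChallengeData2016/annotations/validation",
    ],
    "CityScapes": [
        "leftImg8bit/train", "leftImg8bit/val", "leftImg8bit/test",
        "gtFine/train", "gtFine/val", "gtFine/test",
    ],
}


def missing_dirs(data_root: str, dataset: str) -> List[str]:
    """Return the expected sub-folders of `dataset` that do not exist under `data_root`."""
    base = Path(data_root) / dataset
    return [rel for rel in EXPECTED[dataset] if not (base / rel).is_dir()]
```

Calling `missing_dirs("data", "CityScapes")` returns an empty list when every expected folder is present.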


Augmentations

Check out the notebook here to test the augmentation effects.

Pixel-level Transforms:

  • ColorJitter (Brightness, Contrast, Saturation, Hue)
  • Gamma, Sharpness, AutoContrast, Equalize, Posterize
  • GaussianBlur, Grayscale

Spatial-level Transforms:

  • Affine, RandomRotation
  • HorizontalFlip, VerticalFlip
  • CenterCrop, RandomCrop
  • Pad, ResizePad, Resize
  • RandomResizedCrop
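
Spatial-level transforms have to be applied to the image and its mask together, whereas pixel-level transforms change the image only. The sketch below illustrates this for a random horizontal flip on plain nested lists (single-channel image, class-index mask). It is an illustration under stated assumptions, not the code in semseg/augmentations.py; the `swap_labels` argument is a hypothetical addition showing how left/right classes could be exchanged when a flip happens.

```python
import random
from typing import Dict, List, Optional, Tuple

Grid = List[List[int]]  # rows of pixel values (image) or class indices (mask)


def hflip(grid: Grid) -> Grid:
    """Mirror every row left to right."""
    return [row[::-1] for row in grid]


def random_hflip(image: Grid, mask: Grid, p: float = 0.5,
                 swap_labels: Optional[Dict[int, int]] = None,
                 rng: Optional[random.Random] = None) -> Tuple[Grid, Grid]:
    """Flip image and mask together with probability p."""
    rng = rng or random
    if rng.random() >= p:
        return image, mask  # no flip: both returned unchanged
    image, mask = hflip(image), hflip(mask)
    if swap_labels:
        # Hypothetical extra step: exchange orientation-dependent classes,
        # e.g. {left_eye_id: right_eye_id, right_eye_id: left_eye_id}.
        mask = [[swap_labels.get(v, v) for v in row] for row in mask]
    return image, mask
```

With `p=1.0` the flip always happens, which makes the paired behavior easy to inspect on a small example.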

Usage

Requirements
  • python >= 3.6
  • torch >= 1.8.1
  • torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.


Configuration

Create a configuration file in configs. A sample configuration for the ADE20K dataset can be found here. Then edit the fields as needed. This configuration file is required by the training, evaluation and prediction scripts.
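
Since every script reads the same file, a quick sanity check of the loaded configuration can catch missing fields early. The sketch below assumes the YAML file has already been parsed into a nested dict (for instance with PyYAML's `yaml.safe_load`). The `REQUIRED_FIELDS` table only lists the fields named in the Inference section of this README, and the helper `missing_fields` is hypothetical, not part of the repository.

```python
from typing import Any, Dict, List

# Illustrative subset of fields; the real scripts may read more.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "MODEL": ["NAME", "VARIANT"],
    "DATASET": ["NAME"],
    "TEST": ["MODEL_PATH", "FILE"],
}


def missing_fields(cfg: Dict[str, Any]) -> List[str]:
    """Return missing sections ("TEST") or fields ("TEST.FILE") of a parsed config."""
    missing: List[str] = []
    for section, keys in REQUIRED_FIELDS.items():
        block = cfg.get(section)
        if not isinstance(block, dict):
            missing.append(section)  # whole section absent or malformed
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    return missing
```

An empty return value means all listed fields are present; it says nothing about whether their values are valid.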


Training

To train with a single GPU:

$ python tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To train with multiple GPUs, set the DDP field in the config file to true and run as follows:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

Evaluation

Make sure to set MODEL_PATH in the configuration file to the path of your trained model.

$ python tools/val.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To evaluate with multi-scale and flip, change ENABLE field in MSF to true and run the same command as above.
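
Multi-scale and flip evaluation runs the model on several rescaled (and optionally mirrored) copies of each image and combines the predictions. As a small illustration, the sketch below computes the input size used at each scale. Rounding each side up to a multiple of 32 is an assumption made for this example (many backbones downsample by that factor) and may differ from what tools/val.py does; the function name `msf_sizes` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def msf_sizes(height: int, width: int, scales: Sequence[float],
              stride: int = 32) -> List[Tuple[int, int]]:
    """Return the (h, w) used at each scale, rounded up to a multiple of `stride`."""
    sizes = []
    for scale in scales:
        new_h = math.ceil(int(scale * height) / stride) * stride
        new_w = math.ceil(int(scale * width) / stride) * stride
        sizes.append((new_h, new_w))
    return sizes
```

With flipping enabled, each of these sizes would be evaluated twice, once on the original and once on the mirrored image.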


Inference

To run inference, edit the following parameters in the config file.

  • Change MODEL >> NAME and VARIANT to your desired pretrained model.
  • Change DATASET >> NAME to the dataset name depending on the pretrained model.
  • Set TEST >> MODEL_PATH to pretrained weights of the testing model.
  • Change TEST >> FILE to the file or image folder path you want to test.
  • Testing results will be saved in SAVE_DIR.
## example using ade20k pretrained models
$ python tools/infer.py --cfg configs/ade20k.yaml
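
A segmentation result is a 2-D map of class indices, so it is usually converted to colors before being saved or blended with the input. The sketch below shows the idea on nested lists: a palette holds one RGB triple per class, and an optional overlay mixes the colored map with the image. The names `colorize` and `overlay` and the `alpha` default are assumptions for this example and do not describe what tools/infer.py does internally.

```python
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]


def colorize(seg_map: Sequence[Sequence[int]],
             palette: Sequence[Color]) -> List[List[Color]]:
    """Map every class index in a 2-D segmentation map to its RGB color."""
    return [[palette[cls] for cls in row] for row in seg_map]


def overlay(image: Sequence[Sequence[Color]], colored: Sequence[Sequence[Color]],
            alpha: float = 0.4) -> List[List[Color]]:
    """Blend the colored map over the image; alpha is the weight of the colored map."""
    blended = []
    for img_row, col_row in zip(image, colored):
        blended.append([
            tuple(round(alpha * c + (1 - alpha) * i) for i, c in zip(img_px, col_px))
            for img_px, col_px in zip(img_row, col_row)
        ])
    return blended
```

The label mask itself stays single-channel (one class index per pixel); only the palette entries have three components.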

Example test results:

test_result


Convert to other Frameworks (ONNX, CoreML, OpenVINO, TFLite)

To convert to ONNX and CoreML, run:

$ python tools/export.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To convert to OpenVINO and TFLite, see torch_optimize.


Inference (ONNX, OpenVINO, TFLite)
## ONNX Inference
$ python scripts/onnx_infer.py --model <ONNX_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## OpenVINO Inference
$ python scripts/openvino_infer.py --model <OpenVINO_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## TFLite Inference
$ python scripts/tflite_infer.py --model <TFLite_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

References

Citations
@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

@misc{xiao2018unified,
  title={Unified Perceptual Parsing for Scene Understanding}, 
  author={Tete Xiao and Yingcheng Liu and Bolei Zhou and Yuning Jiang and Jian Sun},
  year={2018},
  eprint={1807.10221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{hong2021deep,
  title={Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes},
  author={Hong, Yuanduo and Pan, Huihui and Sun, Weichao and Jia, Yisong},
  journal={arXiv preprint arXiv:2101.06085},
  year={2021}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction}, 
  author={Shihua Huang and Zhichao Lu and Ran Cheng and Cheng He},
  year={2021},
  eprint={2108.07058},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={arXiv preprint arXiv:2107.00782},
  year={2021}
}

@misc{chao2019hardnet,
  title={HarDNet: A Low Memory Traffic Network}, 
  author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
  year={2019},
  eprint={1909.00948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03308}
}

@ARTICLE{Yucondnet21,
  author={Yu, Changqian and Shao, Yuanjie and Gao, Changxin and Sang, Nong},
  journal={IEEE Signal Processing Letters}, 
  title={CondNet: Conditional Classifier for Scene Segmentation}, 
  year={2021},
  volume={28},
  number={},
  pages={758-762},
  doi={10.1109/LSP.2021.3070472}
}

Comments
  • RuntimeError: CUDA error: an illegal memory access was encountered

    The error occurred after changing batch_size from 8 to 4 in cityscapes.yaml.

    Found 2975 train images. Found 500 val images. Epoch: [1/500] Iter: [1/185] LR: 0.00010049 Loss: 7.71337414: 1%|▌ | 1/185 [00:03<11:41, 3.81s/it] Traceback (most recent call last): File "tools/train.py", line 153, in main(cfg, gpu, save_dir) File "tools/train.py", line 97, in main scaler.scale(loss).backward() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/autograd/init.py", line 147, in backward Variable._execution_engine.run_backward( RuntimeError: CUDA error: an illegal memory access was encountered (semseg) [email protected]:~/Documents/projects/semantic-segmentation$ CUDA_LAUNCH_BLOCKING=1 python tools/train.py --cfg configs/DDRNet/cityscapes.yaml Found 2975 train images. Found 500 val images. Epoch: [1/500] Iter: [1/743] LR: 0.00010012 Loss: 7.53493118: 0%|▏ | 1/743 [00:03<37:21, 3.02s/it] Traceback (most recent call last): File "tools/train.py", line 153, in main(cfg, gpu, save_dir) File "tools/train.py", line 95, in main loss = loss_fn(logits, lbl) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in forward return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)]) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)]) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 33, in _forward loss = self.criterion(preds, labels).view(-1) File 
"/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1120, in forward return F.cross_entropy(input, target, weight=self.weight, File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/functional.py", line 2824, in cross_entropy return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index) RuntimeError: CUDA error: an illegal memory access was encountered Exception in thread Thread-2: Traceback (most recent call last): File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/queues.py", line 116, in get return _ForkingPickler.loads(res) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd fd = df.detach() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach with _resource_sharer.get_connection(self._id) as conn: File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection c = Client(address, authkey=process.current_process().authkey) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 508, in Client 
answer_challenge(c, authkey) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge message = connection.recv_bytes(256) # reject large message File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes buf = self._recv_bytes(maxlength) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes buf = self._recv(4) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 379, in _recv chunk = read(handle, remaining) ConnectionResetError: [Errno 104] Connection reset by peer

    opened by StuLiu 9
  • Augmentation configuration

    Presently the augmentations used during training are hard-coded in semseg/augmentations.py. By default horizontal flip is enabled, which will be problematic for datasets where the orientation of the image matters (e.g. facial datasets may label the left and right eyes independently, and flipping the image would also require swapping these two labels).

    Are there any future plans to allow for augmentation to be configurable?

    opened by markdjwilliams 5
  • Performance issues with ddrnet

    Hello, I am using DDRNet. I tried several other models on the same dataset and they improve over time, but with DDRNet the mIoU starts decreasing after 10 epochs. Does that mean DDRNet performs badly, or is something wrong with the code that I should fix?

    opened by mhmd-mst 5
  • Question about the dimensionality of the mask.

    Thank you for your work.

    It would be nice to see the actual performance of the models in fps on specific hardware. Particularly on devices like jetson.

    The training requires the mask to be grayscale, but in the file that describes the dataset, PALETTE has a dimension of 3. Can you tell me what the dimensionality of PALETTE should be (for example, will my labels be (1,1,1) or 1, etc.)?

    opened by MsWik 4
  • SegFormer B5 evaluation error on cityscapes

    Hello, I am trying to evaluate SegFormer B5 on the Cityscapes dataset, but I keep getting an error related to the network's structure. I attached a file with the error: error.txt. Even though I modified a custom.yaml file, shown below, I still keep getting an error. What can I do to solve this?

        MODEL:                                    
          NAME          : SegFormer                                           # name of the model you are using
          VARIANT       : B5                                                  # model variant
          PRETRAINED    : 'checkpoints/backbones/hardnet/hardnet_70.pth'              # backbone model's weight 
        
        DATASET:
          NAME          : cityscapes                                              # dataset name to be trained with (camvid, cityscapes, ade20k)
          ROOT          : '/media/evdo/DATA/diploma_project/cityscapes/gtFine_trainvaltest'                         # dataset root path
        
        TRAIN:
          IMAGE_SIZE    : [1024, 1024]    # training image size in (h, w)
          BATCH_SIZE    : 8               # batch size used to train
          EPOCHS        : 500             # number of epochs to train
          EVAL_INTERVAL : 50              # evaluation interval during training
          AMP           : false           # use AMP in training
          DDP           : false           # use DDP training
        
        LOSS:
          NAME          : ohemce          # loss function name (ohemce, ce, dice)
          CLS_WEIGHTS   : true            # use class weights in loss calculation
          THRESH        : 0.9             # ohemce threshold or dice delta if you choose ohemce loss or dice loss
        
        OPTIMIZER:
          NAME          : adamw           # optimizer name
          LR            : 0.01            # initial learning rate used in optimizer
          WEIGHT_DECAY  : 0.0001          # decay rate used in optimizer 
        
        SCHEDULER:
          NAME          : warmuppolylr    # scheduler name
          POWER         : 0.9             # scheduler power
          WARMUP        : 10              # warmup epochs used in scheduler
          WARMUP_RATIO  : 0.1             # warmup ratio
          
        
        EVAL:
          MODEL_PATH    : 'checkpoints/pretrained/segformer/segformer.b5.1024x1024.city.160k.pth'  # trained model file path
          IMAGE_SIZE    : [1024, 1024]                                                            # evaluation image size in (h, w)                       
          MSF:  
            ENABLE      : false                                                                 # multi-scale and flip evaluation  
            FLIP        : true                                                                  # use flip in evaluation  
            SCALES      : [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]                                     # scales used in MSF evaluation                
        
        
        TEST:
          MODEL_PATH    : 'checkpoints/pretrained/ddrnet/ddrnet_23_city.pth'                     # trained model file path
          FILE          : 'cityscapes/leftImg8bit_trainextra/leftImg8bit/train_extra/wurzburg' # filename or foldername 
          IMAGE_SIZE    : [480, 640]                                                           # inference image size in (h, w)
          OVERLAY       : false      
    
    opened by evdokimos 3
  • About mIoU result on Cityscapes validation dataset

    Hi,

    Thanks for your work and for providing the code. I am just confused about the mIoU reported in the paper on the Cityscapes validation dataset, which is around 77%. Kindly let me know: was this mIoU achieved with a pretrained (ImageNet) model? I am training DDR-Net-Slim23 from scratch and I am getting around 55% mIoU on the validation data of Cityscapes.

    opened by tanveer6715 3
  • RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    I have 18 classes in the Cityscapes format and am training on my own images, but I get this error from the Dice loss function at this line:

        tp = torch.sum(targets*preds, dim=(2, 3))
        RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    opened by ghost 3
  • ImageDraw.textbbox Attribute error in Visualize.py function "draw_text" when calling infer.py

    I am running infer.py and I get this error. Any idea why?

        Traceback (most recent call last):
          File "tools/infer.py", line 100, in <module>
            segmap,oversegmap,image = semseg.predict(str(test_file), cfg['TEST']['OVERLAY'])
          File "tools/infer.py", line 73, in predict
            seg_map,over_seg_map,image = self.postprocess(torch.tensor(image1), seg_map, overlay)
          File "tools/infer.py", line 60, in postprocess
            image = draw_text(seg_image, seg_map, self.labels)
          File "/content/trial/semseg/utils/visualize.py", line 28, in draw_text
            bbox = draw.textbbox(center, cls, font=fonts)
        AttributeError: 'ImageDraw' object has no attribute 'textbbox'

    opened by mhmd-mst 2
  • Double Checking implementation detail in SegFormerHead

    The paper on SegFormer suggests an All MLP decoder.

    Screen Shot 2022-04-23 at 3 03 57 AM

    The SegformerHead.py shows the use of a Conv2D for the final layer.

    Screen Shot 2022-04-23 at 3 06 18 AM

    Can you help me understand if this is a deviation from the paper or mentioned in a followup paper somewhere? I apologize in advance if there is an obvious answer.

    opened by RahulSinghalChicago 2
  • UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

    /usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

    opened by RahulSinghalChicago 2
  • ModuleNotFoundError: No module named 'semseg'

    Hi there, when I tried to run infer.py, I got the following error:

        (sotas) ~/semantic-segmentation$ python tools/infer.py --cfg configs/custom.yaml
        Traceback (most recent call last):
          File "tools/infer.py", line 12, in <module>
            from semseg.models import *
        ModuleNotFoundError: No module named 'semseg'

    I also tried appending the path to /semseg to sys.path, but it did not solve the problem. Could you please give some suggestions? Thanks

    opened by Shawn207 2
  • BiSeNetv2 implementation difference leads to strange results

    Hello, I train BiSeNetv2 in my custom dataset and find probability maps have a grid-like look, which is very unnatural.

    After further observation I find you use PixelShuffle on the last upsampling layer, which leads to the grid results(seems like you also use it in v1). But in BiSeNetv2 paper, the author uses simple bilinear interpolation to upsample results, which gives smoothing probability maps in my experiment. why use PixelShuffle here?

    So what's the idea of using PixelShuffle to upsample final results? Also, don't see an obvious improvement on validate score by using it tho.

    By the way, there are some differences between your implementation and the paper on Gather-and-Expansion Layer. Cannot figure out if this will affect training results significantly.

    opened by wasupandceacar 0
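  • RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an illegal memory access

    Hello, I ran into this error when using the DDR model: 'RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an illegal memory access'. Have you encountered this kind of error? Searching online, people say it can be caused by the PyTorch version, by annotation files or labels not being set up correctly, or by the network model or training data (images and labels) not being placed on the GPU, but I don't think any of these is the problem here. I look forward to your reply.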

    opened by scl666 0
  • training error on colab

    My all setup is successful on colab for training. However, when I run

    !python tools/train.py --cfg configs/CONFIG_FILE.yaml

    I get error:

    Found 20210 training images. Found 2000 validation images. Epoch: [1/500] Iter: [0/2526] LR: 0.00100000 Loss: 0.00000000: 0% 0/2526 [00:00<?, ?it/s] Traceback (most recent call last): File "tools/train.py", line 128, in main(cfg, gpu, save_dir) File "tools/train.py", line 69, in main for iter, (img, lbl) in pbar: File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in iter for obj in iterable: File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in next data = self._next_data() File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data return self._process_data(data) File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data data.reraise() File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise raise exception RuntimeError: Caught RuntimeError in DataLoader worker process 0. Original Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop data = fetcher.fetch(index) File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch data = [self.dataset[idx] for idx in possibly_batched_index] File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in data = [self.dataset[idx] for idx in possibly_batched_index] File "/content/semantic-segmentation/semseg/datasets/ade20k.py", line 73, in getitem image, label = self.transform(image, label) File "/content/semantic-segmentation/semseg/augmentations.py", line 20, in call img, mask = transform(img, mask) File "/content/semantic-segmentation/semseg/augmentations.py", line 329, in call mask = TF.pad(mask, padding, fill=self.seg_fill) File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 481, in pad return F_t.pad(img, padding=padding, fill=fill, padding_mode=padding_mode) 
File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py", line 418, in pad img = torch_pad(img, p, mode=padding_mode, value=float(fill)) RuntimeError: value cannot be converted to type uint8_t without overflow

    opened by hb0313 2
  • data format

    Is the annotation format the same for all? eg. ADE20K seems to be color images. I have COCO format original annotation data (grey scale) but it appears I need to change it to color?

    Thanks.

    opened by Tsardoz 1