Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Overview

Semantic Segmentation

Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Open In Colab

banner

Features

  • Applicable to following tasks:
    • Scene Parsing
    • Human Parsing
    • Face Parsing
  • 20+ Datasets
  • 10+ SOTA Backbones
  • 10+ SOTA Semantic Segmentation Models
  • PyTorch, ONNX, TFLite and OpenVINO Inference

Model Zoo

Supported Backbones:

Supported Heads/Methods:

Supported Standalone Models:

Supported Modules:

  • PPM (CVPR 2017)
  • PSA (ArXiv 2021)
ADE20K-val (Scene Parsing)
Method Backbone mIoU (%) Params
(M)
GFLOPs
(512x512)
Weights
SegFormer MiT-B1 43.1 14 16 pt
MiT-B2 47.5 28 62 pt
MiT-B3 50.0 47 79 pt
CityScapes-val (Scene Parsing)
Method Backbone mIoU (%) Params (M) GFLOPs Img Size Weights
SegFormer MiT-B0 78.1 4 126 1024x1024 N/A
MiT-B1 80.0 14 244 1024x1024 N/A
FaPN ResNet-50 80.0 33 - 512x1024 N/A
SFNet ResNetD-18 79.0 13 - 1024x1024 N/A
FCHarDNet HarDNet-70 77.7 4 35 1024x1024 pt
DDRNet DDRNet-23slim 77.8 6 36 1024x2048 pt
HELEN-val (Face Parsing)
Method Backbone mIoU (%) Params
(M)
GFLOPs
(512x512)
FPS
(GTX1660ti)
Weights
BiSeNetv1 MobileNetV2-1.0 58.22 5 5 160 pt
BiSeNetv1 ResNet-18 58.50 14 13 263 pt
BiSeNetv2 - 58.58 18 15 195 pt
FCHarDNet HarDNet-70 59.38 4 4 130 pt
DDRNet DDRNet-23slim 61.11 6 5 180 pt|tflite(fp32)|tflite(fp16)|tflite(int8)
SegFormer MiT-B0 59.31 4 8 75 pt
SFNet ResNetD-18 61.00 14 31 56 pt
Backbones
Model Variants ImageNet-1k Top-1 Acc (%) Params (M) GFLOPs Weights
MicroNet M1|M2|M3 51.4|59.4|62.5 1|2|3 6M|12M|21M download
MobileNetV2 1.0 71.9 3 300M download
MobileNetV3 S|L 67.7|74.0 3|5 56M|219M S|L
ResNet 18|50|101 69.8|76.1|77.4 12|25|44 2|4|8 download
ResNetD 18|50|101 - 12|25|44 2|4|8 download
MiT B1|B2|B3 - 14|25|45 2|4|8 download
PVTv2 B1|B2|B4 78.7|82.0|83.6 14|25|63 2|4|10 download
ResT S|B|L 79.6|81.6|83.6 14|30|52 2|4|8 download

Notes: Download backbones' weights for HarDNet-70 and DDRNet-23slim.

Supported Datasets

Dataset Type Categories Train
Images
Val
Images
Test
Images
Image Size
(HxW)
COCO-Stuff General Scene Parsing 171 118,000 5,000 20,000 -
ADE20K General Scene Parsing 150 20,210 2,000 3,352 -
PASCALContext General Scene Parsing 59 4,996 5,104 9,637 -
SUN RGB-D Indoor Scene Parsing 37 2,666 2,619 5,050+labels -
Mapillary Vistas Street Scene Parsing 65 18,000 2,000 5,000 1080x1920
CityScapes Street Scene Parsing 19 2,975 500 1,525+labels 1024x2048
CamVid Street Scene Parsing 11 367 101 233+labels 720x960
MHPv2 Multi-Human Parsing 59 15,403 5,000 5,000 -
MHPv1 Multi-Human Parsing 19 3,000 1,000 980+labels -
LIP Multi-Human Parsing 20 30,462 10,000 - -
CCIHP Multi-Human Parsing 22 28,280 5,000 5,000 -
CIHP Multi-Human Parsing 20 28,280 5,000 5,000 -
ATR Single-Human Parsing 18 16,000 700 1,000+labels -
HELEN Face Parsing 11 2,000 230 100+labels -
LaPa Face Parsing 11 18,176 2,000 2,000+labels -
iBugMask Face Parsing 11 21,866 - 1,000+labels -
CelebAMaskHQ Face Parsing 19 24,183 2,993 2,824+labels 512x512
FaceSynthetics Face Parsing (Synthetic) 19 100,000 1,000 100+labels 512x512
SUIM Underwater Imagery 8 1,525 - 110+labels -

Check DATASETS to find more segmentation datasets.

Datasets Structure (click to expand)

Datasets should have the following structure:

data
|__ ADEChallenge
    |__ ADEChallengeData2016
        |__ images
            |__ training
            |__ validation
        |__ annotations
            |__ training
            |__ validation

|__ CityScapes
    |__ leftImg8bit
        |__ train
        |__ val
        |__ test
    |__ gtFine
        |__ train
        |__ val
        |__ test

|__ CamVid
    |__ train
    |__ val
    |__ test
    |__ train_labels
    |__ val_labels
    |__ test_labels
    
|__ VOCdevkit
    |__ VOC2010
        |__ JPEGImages
        |__ SegmentationClassContext
        |__ ImageSets
            |__ SegmentationContext
                |__ train.txt
                |__ val.txt
    
|__ COCO
    |__ images
        |__ train2017
        |__ val2017
    |__ labels
        |__ train2017
        |__ val2017

|__ MHPv1
    |__ images
    |__ annotations
    |__ train_list.txt
    |__ test_list.txt

|__ MHPv2
    |__ train
        |__ images
        |__ parsing_annos
    |__ val
        |__ images
        |__ parsing_annos

|__ LIP
    |__ LIP
        |__ TrainVal_images
            |__ train_images
            |__ val_images
        |__ TrainVal_parsing_annotations
            |__ train_segmentations
            |__ val_segmentations

    |__ CIHP/CCIHP
        |__ instance-leve_human_parsing
            |__ Training
                |__ Images
                |__ Category_ids
            |__ Validation
                |__ Images
                |__ Category_ids

    |__ ATR
        |__ humanparsing
            |__ JPEGImages
            |__ SegmentationClassAug

|__ SUIM
    |__ train_val
        |__ images
        |__ masks
    |__ TEST
        |__ images
        |__ masks

|__ SunRGBD
    |__ SUNRGBD
        |__ kv1/kv2/realsense/xtion
    |__ SUNRGBDtoolbox
        |__ traintestSUNRGBD
            |__ allsplit.mat

|__ Mapillary
    |__ training
        |__ images
        |__ labels
    |__ validation
        |__ images
        |__ labels

|__ SmithCVPR2013_dataset_resized (HELEN)
    |__ images
    |__ labels
    |__ exemplars.txt
    |__ testing.txt
    |__ tuning.txt

|__ CelebAMask-HQ
    |__ CelebA-HQ-img
    |__ CelebAMask-HQ-mask-anno
    |__ CelebA-HQ-to-CelebA-mapping.txt

|__ LaPa
    |__ train
        |__ images
        |__ labels
    |__ val
        |__ images
        |__ labels
    |__ test
        |__ images
        |__ labels

|__ ibugmask_release
    |__ train
    |__ test

|__ FaceSynthetics
    |__ dataset_100000
    |__ dataset_1000
    |__ dataset_100

Note: For PASCALContext, download the annotations from here and put it in VOC2010.

Note: For CelebAMask-HQ, run the preprocess script. python3 scripts/preprocess_celebamaskhq.py --root .


Augmentations (click to expand)

Check out the notebook here to test the augmentation effects.

Pixel-level Transforms:

  • ColorJitter (Brightness, Contrast, Saturation, Hue)
  • Gamma, Sharpness, AutoContrast, Equalize, Posterize
  • GaussianBlur, Grayscale

Spatial-level Transforms:

  • Affine, RandomRotation
  • HorizontalFlip, VerticalFlip
  • CenterCrop, RandomCrop
  • Pad, ResizePad, Resize
  • RandomResizedCrop

Usage

Requirements
  • python >= 3.6
  • torch >= 1.8.1
  • torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.


Configuration (click to expand)

Create a configuration file in configs. Sample configuration for ADE20K dataset can be found here. Then edit the fields you think if it is needed. This configuration file is needed for all of training, evaluation and prediction scripts.


Training (click to expand)

To train with a single GPU:

$ python tools/train.py --cfg configs/CONFIG_FILE.yaml

To train with multiple gpus, set DDP field in config file to true and run as follows:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

Evaluation (click to expand)

Make sure to set MODEL_PATH of the configuration file to your trained model directory.

$ python tools/val.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To evaluate with multi-scale and flip, change ENABLE field in MSF to true and run the same command as above.


Inference

To make an inference, edit the parameters of the config file from below.

  • Change MODEL >> NAME and VARIANT to your desired pretrained model.
  • Change DATASET >> NAME to the dataset name depending on the pretrained model.
  • Set TEST >> MODEL_PATH to pretrained weights of the testing model.
  • Change TEST >> FILE to the file or image folder path you want to test.
  • Testing results will be saved in SAVE_DIR.
## example using ade20k pretrained models
$ python tools/infer.py --cfg configs/ade20k.yaml

Example test results:

test_result


Convert to other Frameworks (ONNX, CoreML, OpenVINO, TFLite)

To convert to ONNX and CoreML, run:

$ python tools/export.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To convert to OpenVINO and TFLite, see torch_optimize.


Inference (ONNX, OpenVINO, TFLite)
## ONNX Inference
$ python scripts/onnx_infer.py --model <ONNX_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## OpenVINO Inference
$ python scripts/openvino_infer.py --model <OpenVINO_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## TFLite Inference
$ python scripts/tflite_infer.py --model <TFLite_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

References (click to expand)

Citations (click to expand)
@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

@misc{xiao2018unified,
  title={Unified Perceptual Parsing for Scene Understanding}, 
  author={Tete Xiao and Yingcheng Liu and Bolei Zhou and Yuning Jiang and Jian Sun},
  year={2018},
  eprint={1807.10221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{hong2021deep,
  title={Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes},
  author={Hong, Yuanduo and Pan, Huihui and Sun, Weichao and Jia, Yisong},
  journal={arXiv preprint arXiv:2101.06085},
  year={2021}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction}, 
  author={Shihua Huang and Zhichao Lu and Ran Cheng and Cheng He},
  year={2021},
  eprint={2108.07058},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={Arxiv Pre-Print arXiv:2107.00782 },
  year={2021}
}

@misc{chao2019hardnet,
  title={HarDNet: A Low Memory Traffic Network}, 
  author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
  year={2019},
  eprint={1909.00948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03308}
}

@ARTICLE{Yucondnet21,
  author={Yu, Changqian and Shao, Yuanjie and Gao, Changxin and Sang, Nong},
  journal={IEEE Signal Processing Letters}, 
  title={CondNet: Conditional Classifier for Scene Segmentation}, 
  year={2021},
  volume={28},
  number={},
  pages={758-762},
  doi={10.1109/LSP.2021.3070472}
}

Comments
  • RuntimeError: CUDA error: an illegal memory access was encountered

    RuntimeError: CUDA error: an illegal memory access was encountered

    The error was occured after changing batch_size from 8 to 4 in cityscapes.yaml.

    Found 2975 train images. Found 500 val images. Epoch: [1/500] Iter: [1/185] LR: 0.00010049 Loss: 7.71337414: 1%|▌ | 1/185 [00:03<11:41, 3.81s/it] Traceback (most recent call last): File "tools/train.py", line 153, in main(cfg, gpu, save_dir) File "tools/train.py", line 97, in main scaler.scale(loss).backward() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/autograd/init.py", line 147, in backward Variable._execution_engine.run_backward( RuntimeError: CUDA error: an illegal memory access was encountered (semseg) [email protected]:~/Documents/projects/semantic-segmentation$ CUDA_LAUNCH_BLOCKING=1 python tools/train.py --cfg configs/DDRNet/cityscapes.yaml Found 2975 train images. Found 500 val images. Epoch: [1/500] Iter: [1/743] LR: 0.00010012 Loss: 7.53493118: 0%|▏ | 1/743 [00:03<37:21, 3.02s/it] Traceback (most recent call last): File "tools/train.py", line 153, in main(cfg, gpu, save_dir) File "tools/train.py", line 95, in main loss = loss_fn(logits, lbl) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in forward return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)]) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)]) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 33, in _forward loss = self.criterion(preds, labels).view(-1) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1120, in forward return F.cross_entropy(input, target, weight=self.weight, File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/functional.py", line 2824, in cross_entropy return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index) RuntimeError: CUDA error: an illegal memory access was encountered Exception in thread Thread-2: Traceback (most recent call last): File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/queues.py", line 116, in get return _ForkingPickler.loads(res) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd fd = df.detach() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach with _resource_sharer.get_connection(self._id) as conn: File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection c = Client(address, authkey=process.current_process().authkey) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 508, in Client answer_challenge(c, authkey) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge message = connection.recv_bytes(256) # reject large message File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes buf = self._recv_bytes(maxlength) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes buf = self._recv(4) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 379, in _recv chunk = read(handle, remaining) ConnectionResetError: [Errno 104] Connection reset by peer

    opened by StuLiu 9
  • Augmentation configuration

    Augmentation configuration

    Presently the augmentations used during training are hard-coded in semseg/augmentations.py. By default horizontal flip is enabled, which will be problematic for datasets where the orientation of the image matters (e.g. facial datasets may label the left and right eyes independently, and flipping the image would also require swapping these two labels).

    Are there any future plans to allow for augmentation to be configurable?

    opened by markdjwilliams 5
  • Performance issues with ddrnet

    Performance issues with ddrnet

    Hello, I am using ddrnet , i tried several other models with the same dataset i am working on and they improve over time except ddrnet after 10 epoch the miou starts decreasing , Does that mean ddrnet perform bad or is something wrong with the code i should fix?

    opened by mhmd-mst 5
  • Question about the dimensionality of the mask.

    Question about the dimensionality of the mask.

    Thank you for your work.

    It would be nice to see the actual performance of the models in fps on specific hardware. Particularly on devices like jetson.

    The training requires the mask to be gray, and in the file that describes the dataset PALETTE has a dimension of 3. Can you tell me what should be the dimensionality of PALETTE (for example my labels will be (1,1,1) or 1 etc.)

    opened by MsWik 4
  • SegFormer B5 evaluation error on cityscapes

    SegFormer B5 evaluation error on cityscapes

    Hello, I try to implement an evaluation of segformer b5 to cityscapes dataset, but I keep getting an error related to the network's structure. I attached a file with the error. error.txt Even though I modified a custom.yaml file, which is like this one, I still keep getting an error. What can i do to solve this?

        MODEL:                                    
          NAME          : SegFormer                                           # name of the model you are using
          VARIANT       : B5                                                  # model variant
          PRETRAINED    : 'checkpoints/backbones/hardnet/hardnet_70.pth'              # backbone model's weight 
        
        DATASET:
          NAME          : cityscapes                                              # dataset name to be trained with (camvid, cityscapes, ade20k)
          ROOT          : '/media/evdo/DATA/diploma_project/cityscapes/gtFine_trainvaltest'                         # dataset root path
        
        TRAIN:
          IMAGE_SIZE    : [1024, 1024]    # training image size in (h, w)
          BATCH_SIZE    : 8               # batch size used to train
          EPOCHS        : 500             # number of epochs to train
          EVAL_INTERVAL : 50              # evaluation interval during training
          AMP           : false           # use AMP in training
          DDP           : false           # use DDP training
        
        LOSS:
          NAME          : ohemce          # loss function name (ohemce, ce, dice)
          CLS_WEIGHTS   : true            # use class weights in loss calculation
          THRESH        : 0.9             # ohemce threshold or dice delta if you choose ohemce loss or dice loss
        
        OPTIMIZER:
          NAME          : adamw           # optimizer name
          LR            : 0.01            # initial learning rate used in optimizer
          WEIGHT_DECAY  : 0.0001          # decay rate used in optimizer 
        
        SCHEDULER:
          NAME          : warmuppolylr    # scheduler name
          POWER         : 0.9             # scheduler power
          WARMUP        : 10              # warmup epochs used in scheduler
          WARMUP_RATIO  : 0.1             # warmup ratio
          
        
        EVAL:
          MODEL_PATH    : 'checkpoints/pretrained/segformer/segformer.b5.1024x1024.city.160k.pth'  # trained model file path
          IMAGE_SIZE    : [1024, 1024]                                                            # evaluation image size in (h, w)                       
          MSF:  
            ENABLE      : false                                                                 # multi-scale and flip evaluation  
            FLIP        : true                                                                  # use flip in evaluation  
            SCALES      : [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]                                     # scales used in MSF evaluation                
        
        
        TEST:
          MODEL_PATH    : 'checkpoints/pretrained/ddrnet/ddrnet_23_city.pth'                     # trained model file path
          FILE          : 'cityscapes/leftImg8bit_trainextra/leftImg8bit/train_extra/wurzburg' # filename or foldername 
          IMAGE_SIZE    : [480, 640]                                                           # inference image size in (h, w)
          OVERLAY       : false      
    
    opened by evdokimos 3
  • About mIoU result on Cityscapes validation dataset

    About mIoU result on Cityscapes validation dataset

    Hi,

    Thanks for your work and providing code. I am just confuse about the mIoU reported in the paper on cityscapes validation dataset which is around 77%. Kindly let me know is this mIoU achieved on pretrained model (imagenet)? I am training DDR-Net-Slim23 from scratch and I am getting around 55% mIoU on validation data of Cityscapes.

    opened by tanveer6715 3
  • RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    I have 18 classes using the cityscapes dataset, and I used to train my own images but I get this error from the Dice loss function at this point: tp = torch.sum(targets*preds, dim=(2, 3)) RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    opened by ghost 3
  • ImageDraw.textbbox Attribute error in Visualize.py function

    ImageDraw.textbbox Attribute error in Visualize.py function "draw_text" when calling infer.py

    I am running infer.py and I get this error, any idea why? Traceback (most recent call last): File "tools/infer.py", line 100, in segmap,oversegmap,image = semseg.predict(str(test_file), cfg['TEST']['OVERLAY']) File "tools/infer.py", line 73, in predict seg_map,over_seg_map,image = self.postprocess(torch.tensor(image1), seg_map, overlay) File "tools/infer.py", line 60, in postprocess image = draw_text(seg_image, seg_map, self.labels) File "/content/trial/semseg/utils/visualize.py", line 28, in draw_text bbox = draw.textbbox(center, cls, font=fonts) AttributeError: 'ImageDraw' object has no attribute 'textbbox'

    opened by mhmd-mst 2
  • Double Checking implementation detail in SegFormerHead

    Double Checking implementation detail in SegFormerHead

    The paper on SegFormer suggests an All MLP decoder.

    Screen Shot 2022-04-23 at 3 03 57 AM

    The SegformerHead.py shows the use of a Conv2D for the final layer.

    Screen Shot 2022-04-23 at 3 06 18 AM

    Can you help me understand if this is a deviation from the paper or mentioned in a followup paper somewhere? I apologize in advance if there is an obvious answer.

    opened by RahulSinghalChicago 2
  • UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

    UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

    /usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

    opened by RahulSinghalChicago 2
  • ModuleNotFoundError: No module named 'semseg'

    ModuleNotFoundError: No module named 'semseg'

    Hi there, When I tried to run infer.py, I got the following error: (sotas) [email protected]:~/semantic-segmentation$ python tools/infer.py --cfg configs/custom.yaml Traceback (most recent call last): File "tools/infer.py", line 12, in <module> from semseg.models import * ModuleNotFoundError: No module named 'semseg' I also tried to append the path to /semseg into sys.path, but it did not solve the problem. Could you please give a bit suggestions? Thanks

    opened by Shawn207 2
  • BiSeNetv2 implementation difference leads to strange results

    BiSeNetv2 implementation difference leads to strange results

    Hello, I train BiSeNetv2 in my custom dataset and find probability maps have a grid-like look, which is very unnatural.

    After further observation I find you use PixelShuffle on the last upsampling layer, which leads to the grid results(seems like you also use it in v1). But in BiSeNetv2 paper, the author uses simple bilinear interpolation to upsample results, which gives smoothing probability maps in my experiment. why use PixelShuffle here?

    So what's the idea of using PixelShuffle to upsample final results? Also, don't see an obvious improvement on validate score by using it tho.

    By the way, there are some differences between your implementation and the paper on Gather-and-Expansion Layer. Cannot figure out if this will affect training results significantly.

    opened by wasupandceacar 0
  • RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an 非法内存 access

    RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an 非法内存 access

    你好,请问我在使用DDR模型遇到这个错误'RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an 非法内存 access',请教一下遇到这样子错误吗?网上搜有说pytorch版本,还有标注文件 / 标签 等没有设置正确 以及网络模型、训练数据(图片和标签)没有放置到GPU上,我感觉这些都没有问题啊。期待您的回复

    opened by scl666 0
  • training error on colab

    training error on colab

    My all setup is successful on colab for training. However, when I run

    !python tools/train.py --cfg configs/CONFIG_FILE.yaml

    I get error:

    Found 20210 training images. Found 2000 validation images. Epoch: [1/500] Iter: [0/2526] LR: 0.00100000 Loss: 0.00000000: 0% 0/2526 [00:00<?, ?it/s] Traceback (most recent call last): File "tools/train.py", line 128, in main(cfg, gpu, save_dir) File "tools/train.py", line 69, in main for iter, (img, lbl) in pbar: File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in iter for obj in iterable: File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in next data = self._next_data() File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data return self._process_data(data) File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data data.reraise() File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise raise exception RuntimeError: Caught RuntimeError in DataLoader worker process 0. Original Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop data = fetcher.fetch(index) File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch data = [self.dataset[idx] for idx in possibly_batched_index] File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in data = [self.dataset[idx] for idx in possibly_batched_index] File "/content/semantic-segmentation/semseg/datasets/ade20k.py", line 73, in getitem image, label = self.transform(image, label) File "/content/semantic-segmentation/semseg/augmentations.py", line 20, in call img, mask = transform(img, mask) File "/content/semantic-segmentation/semseg/augmentations.py", line 329, in call mask = TF.pad(mask, padding, fill=self.seg_fill) File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 481, in pad return F_t.pad(img, padding=padding, fill=fill, padding_mode=padding_mode) File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py", line 418, in pad img = torch_pad(img, p, mode=padding_mode, value=float(fill)) RuntimeError: value cannot be converted to type uint8_t without overflow

    opened by hb0313 2
  • data format

    data format

    Is the annotation format the same for all? eg. ADE20K seems to be color images. I have COCO format original annotation data (grey scale) but it appears I need to change it to color?

    Thanks.

    opened by Tsardoz 1
Releases(v0.2.6)
Owner
sithu3
AI Developer
sithu3
NOMAD - A blackbox optimization software

################################################################################### #

Blackbox Optimization 78 Dec 29, 2022
FedTorch is an open-source Python package for distributed and federated training of machine learning models using PyTorch distributed API

FedTorch is a generic repository for benchmarking different federated and distributed learning algorithms using PyTorch Distributed API.

Machine Learning and Optimization Lab @PennState 136 Dec 23, 2022
Code for ECIR'20 paper Diagnosing BERT with Retrieval Heuristics

Bert Axioms This is the repository with the code for the Paper Diagnosing BERT with Retrieval Heuristics Required Data In order to run this code, you

Arthur Câmara 5 Jan 21, 2022
Aerial Single-View Depth Completion with Image-Guided Uncertainty Estimation (RA-L/ICRA 2020)

Aerial Depth Completion This work is described in the letter "Aerial Single-View Depth Completion with Image-Guided Uncertainty Estimation", by Lucas

ETHZ V4RL 70 Dec 22, 2022
Use unsupervised and supervised learning to predict stocks

AIAlpha: Multilayer neural network architecture for stock return prediction This project is meant to be an advanced implementation of stacked neural n

Vivek Palaniappan 1.5k Dec 26, 2022
Read and write layered TIFF ImageSourceData and ImageResources tags

Read and write layered TIFF ImageSourceData and ImageResources tags Psdtags is a Python library to read and write the Adobe Photoshop(r) specific Imag

Christoph Gohlke 4 Feb 05, 2022
A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis

A Shading-Guided Generative Implicit Model for Shape-Accurate 3D-Aware Image Synthesis Figure: Shape-Accurate 3D-Aware Image Synthesis. A Shading-Guid

Xingang Pan 115 Dec 18, 2022
Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems

AequeVox Replication Package for AequeVox:Automated Fariness Testing for Speech Recognition Systems README under development. Python Packages Required

Sai Sathiesh 2 Aug 28, 2022
Pytorch implementation of the DeepDream computer vision algorithm

deep-dream-in-pytorch Pytorch (https://github.com/pytorch/pytorch) implementation of the deep dream (https://en.wikipedia.org/wiki/DeepDream) computer

102 Dec 05, 2022
Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

Cornelius Roemer 24 Oct 26, 2022
The Unsupervised Reinforcement Learning Benchmark (URLB)

The Unsupervised Reinforcement Learning Benchmark (URLB) URLB provides a set of leading algorithms for unsupervised reinforcement learning where agent

259 Dec 26, 2022
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language This repository contains UA-GEC data and an accompanying Python lib

Grammarly 226 Dec 29, 2022
Code for the paper: Hierarchical Reinforcement Learning With Timed Subgoals, published at NeurIPS 2021

Hierarchical reinforcement learning with Timed Subgoals (HiTS) This repository contains code for reproducing experiments from our paper "Hierarchical

Autonomous Learning Group 21 Dec 03, 2022
[CVPR 2021] Few-shot 3D Point Cloud Semantic Segmentation

Few-shot 3D Point Cloud Semantic Segmentation Created by Na Zhao from National University of Singapore Introduction This repository contains the PyTor

117 Dec 27, 2022
Autoregressive Models in PyTorch.

Autoregressive This repository contains all the necessary PyTorch code, tailored to my presentation, to train and generate data from WaveNet-like auto

Christoph Heindl 41 Oct 09, 2022
Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets.

Neurons Dataset API - The official dataloader and visualization tools for Neurons Datasets. Introduction We propose our dataloader API for loading and

1 Nov 19, 2021
Analyses of the individual electric field magnitudes with Roast.

Aloi Davide - PhD Student (UoB) Analysis of electric field magnitudes (wp2a dataset only at the moment) and correlation analysis with Dynamic Causal M

Davide Aloi 7 Dec 15, 2022
This implements one of result networks from Large-scale evolution of image classifiers

Exotic structured image classifier This implements one of result networks from Large-scale evolution of image classifiers by Esteban Real, et. al. Req

54 Nov 25, 2022
A Distributional Approach To Controlled Text Generation

A Distributional Approach To Controlled Text Generation This is the repository code for the ICLR 2021 paper "A Distributional Approach to Controlled T

NAVER 102 Jan 07, 2023
Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection, AAAI 2021.

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection This repository is an official implementation of the AAAI 2021 paper Co-mi

MEGVII Research 20 Dec 07, 2022