Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Overview

Semantic Segmentation

Features

  • Applicable to the following tasks:
    • Scene Parsing
    • Human Parsing
    • Face Parsing
  • 20+ Datasets
  • 10+ SOTA Backbones
  • 10+ SOTA Semantic Segmentation Models
  • PyTorch, ONNX, TFLite and OpenVINO Inference

Model Zoo

Supported Backbones:

Supported Heads/Methods:

Supported Standalone Models:

Supported Modules:

  • PPM (CVPR 2017)
  • PSA (ArXiv 2021)

ADE20K-val (Scene Parsing)

Method     Backbone  mIoU (%)  Params (M)  GFLOPs (512x512)  Weights
SegFormer  MiT-B1    43.1      14          16                pt
SegFormer  MiT-B2    47.5      28          62                pt
SegFormer  MiT-B3    50.0      47          79                pt

CityScapes-val (Scene Parsing)

Method     Backbone       mIoU (%)  Params (M)  GFLOPs  Img Size   Weights
SegFormer  MiT-B0         78.1      4           126     1024x1024  N/A
SegFormer  MiT-B1         80.0      14          244     1024x1024  N/A
FaPN       ResNet-50      80.0      33          -       512x1024   N/A
SFNet      ResNetD-18     79.0      13          -       1024x1024  N/A
FCHarDNet  HarDNet-70     77.7      4           35      1024x1024  pt
DDRNet     DDRNet-23slim  77.8      6           36      1024x2048  pt

HELEN-val (Face Parsing)

Method     Backbone         mIoU (%)  Params (M)  GFLOPs (512x512)  FPS (GTX1660ti)  Weights
BiSeNetv1  MobileNetV2-1.0  58.22     5           5                 160              pt
BiSeNetv1  ResNet-18        58.50     14          13                263              pt
BiSeNetv2  -                58.58     18          15                195              pt
FCHarDNet  HarDNet-70       59.38     4           4                 130              pt
DDRNet     DDRNet-23slim    61.11     6           5                 180              pt|tflite(fp32)|tflite(fp16)|tflite(int8)
SegFormer  MiT-B0           59.31     4           8                 75               pt
SFNet      ResNetD-18       61.00     14          31                56               pt

Backbones

Model        Variants   ImageNet-1k Top-1 Acc (%)  Params (M)  GFLOPs      Weights
MicroNet     M1|M2|M3   51.4|59.4|62.5             1|2|3       6M|12M|21M  download
MobileNetV2  1.0        71.9                       3           300M        download
MobileNetV3  S|L        67.7|74.0                  3|5         56M|219M    S|L
ResNet       18|50|101  69.8|76.1|77.4             12|25|44    2|4|8       download
ResNetD      18|50|101  -                          12|25|44    2|4|8       download
MiT          B1|B2|B3   -                          14|25|45    2|4|8       download
PVTv2        B1|B2|B4   78.7|82.0|83.6             14|25|63    2|4|10      download
ResT         S|B|L      79.6|81.6|83.6             14|30|52    2|4|8       download

Note: Download the backbone weights for HarDNet-70 and DDRNet-23slim.

Supported Datasets

Dataset           Type                      Categories  Train Images  Val Images  Test Images   Image Size (HxW)
COCO-Stuff        General Scene Parsing     171         118,000       5,000       20,000        -
ADE20K            General Scene Parsing     150         20,210        2,000       3,352         -
PASCALContext     General Scene Parsing     59          4,996         5,104       9,637         -
SUN RGB-D         Indoor Scene Parsing      37          2,666         2,619       5,050+labels  -
Mapillary Vistas  Street Scene Parsing      65          18,000        2,000       5,000         1080x1920
CityScapes        Street Scene Parsing      19          2,975         500         1,525+labels  1024x2048
CamVid            Street Scene Parsing      11          367           101         233+labels    720x960
MHPv2             Multi-Human Parsing       59          15,403        5,000       5,000         -
MHPv1             Multi-Human Parsing       19          3,000         1,000       980+labels    -
LIP               Multi-Human Parsing       20          30,462        10,000      -             -
CCIHP             Multi-Human Parsing       22          28,280        5,000       5,000         -
CIHP              Multi-Human Parsing       20          28,280        5,000       5,000         -
ATR               Single-Human Parsing      18          16,000        700         1,000+labels  -
HELEN             Face Parsing              11          2,000         230         100+labels    -
LaPa              Face Parsing              11          18,176        2,000       2,000+labels  -
iBugMask          Face Parsing              11          21,866        -           1,000+labels  -
CelebAMaskHQ      Face Parsing              19          24,183        2,993       2,824+labels  512x512
FaceSynthetics    Face Parsing (Synthetic)  19          100,000       1,000       100+labels    512x512
SUIM              Underwater Imagery        8           1,525         -           110+labels    -

Check DATASETS to find more segmentation datasets.

Datasets Structure

Datasets should have the following structure:

data
|__ ADEChallenge
    |__ ADEChallengeData2016
        |__ images
            |__ training
            |__ validation
        |__ annotations
            |__ training
            |__ validation

|__ CityScapes
    |__ leftImg8bit
        |__ train
        |__ val
        |__ test
    |__ gtFine
        |__ train
        |__ val
        |__ test

|__ CamVid
    |__ train
    |__ val
    |__ test
    |__ train_labels
    |__ val_labels
    |__ test_labels
    
|__ VOCdevkit
    |__ VOC2010
        |__ JPEGImages
        |__ SegmentationClassContext
        |__ ImageSets
            |__ SegmentationContext
                |__ train.txt
                |__ val.txt
    
|__ COCO
    |__ images
        |__ train2017
        |__ val2017
    |__ labels
        |__ train2017
        |__ val2017

|__ MHPv1
    |__ images
    |__ annotations
    |__ train_list.txt
    |__ test_list.txt

|__ MHPv2
    |__ train
        |__ images
        |__ parsing_annos
    |__ val
        |__ images
        |__ parsing_annos

|__ LIP
    |__ LIP
        |__ TrainVal_images
            |__ train_images
            |__ val_images
        |__ TrainVal_parsing_annotations
            |__ train_segmentations
            |__ val_segmentations

    |__ CIHP/CCIHP
        |__ instance-leve_human_parsing
            |__ Training
                |__ Images
                |__ Category_ids
            |__ Validation
                |__ Images
                |__ Category_ids

    |__ ATR
        |__ humanparsing
            |__ JPEGImages
            |__ SegmentationClassAug

|__ SUIM
    |__ train_val
        |__ images
        |__ masks
    |__ TEST
        |__ images
        |__ masks

|__ SunRGBD
    |__ SUNRGBD
        |__ kv1/kv2/realsense/xtion
    |__ SUNRGBDtoolbox
        |__ traintestSUNRGBD
            |__ allsplit.mat

|__ Mapillary
    |__ training
        |__ images
        |__ labels
    |__ validation
        |__ images
        |__ labels

|__ SmithCVPR2013_dataset_resized (HELEN)
    |__ images
    |__ labels
    |__ exemplars.txt
    |__ testing.txt
    |__ tuning.txt

|__ CelebAMask-HQ
    |__ CelebA-HQ-img
    |__ CelebAMask-HQ-mask-anno
    |__ CelebA-HQ-to-CelebA-mapping.txt

|__ LaPa
    |__ train
        |__ images
        |__ labels
    |__ val
        |__ images
        |__ labels
    |__ test
        |__ images
        |__ labels

|__ ibugmask_release
    |__ train
    |__ test

|__ FaceSynthetics
    |__ dataset_100000
    |__ dataset_1000
    |__ dataset_100

Note: For PASCALContext, download the annotations from here and put them in VOC2010.

Note: For CelebAMask-HQ, run the preprocessing script:

$ python3 scripts/preprocess_celebamaskhq.py --root .
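
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch and not part of the repository: `EXPECTED` and `missing_dirs` are hypothetical names, and only the ADE20K and CityScapes entries of the tree are listed.

```python
from pathlib import Path

# Expected sub-directories, copied from the tree above (hypothetical helper table).
EXPECTED = {
    "ADEChallenge": [
        "ADEChallengeData2016/images/training",
        "ADEChallengeData2016/images/validation",
        "ADEChallengeData2016/annotations/training",
        "ADEChallengeData2016/annotations/validation",
    ],
    "CityScapes": [
        "leftImg8bit/train", "leftImg8bit/val", "leftImg8bit/test",
        "gtFine/train", "gtFine/val", "gtFine/test",
    ],
}


def missing_dirs(data_root, dataset):
    """Return the expected sub-directories of `dataset` that are absent under `data_root`."""
    base = Path(data_root) / dataset
    return [rel for rel in EXPECTED[dataset] if not (base / rel).is_dir()]
```

Calling `missing_dirs("data", "CityScapes")` returns an empty list when every listed folder exists.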


Augmentations

Check out the notebook here to test the augmentation effects.

Pixel-level Transforms:

  • ColorJitter (Brightness, Contrast, Saturation, Hue)
  • Gamma, Sharpness, AutoContrast, Equalize, Posterize
  • GaussianBlur, Grayscale

Spatial-level Transforms:

  • Affine, RandomRotation
  • HorizontalFlip, VerticalFlip
  • CenterCrop, RandomCrop
  • Pad, ResizePad, Resize
  • RandomResizedCrop

Usage

Requirements
  • python >= 3.6
  • torch >= 1.8.1
  • torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.


Configuration

Create a configuration file in configs. A sample configuration for the ADE20K dataset can be found here. Edit the fields as needed. The training, evaluation and prediction scripts all require this configuration file.


Training

To train with a single GPU:

$ python tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To train with multiple GPUs, set the DDP field in the config file to true and run as follows:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml
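
With `--use_env`, `torch.distributed.launch` hands each process its settings through environment variables (`LOCAL_RANK`, `RANK`, `WORLD_SIZE`) instead of a `--local_rank` argument. The sketch below shows one way a script could read them. `read_ddp_env` is a hypothetical helper, and how tools/train.py actually consumes these values is not shown in this README.

```python
import os


def read_ddp_env(environ=os.environ):
    """Read the per-process settings exported by the launcher, with single-process defaults."""
    return {
        "local_rank": int(environ.get("LOCAL_RANK", 0)),  # GPU index on this node
        "rank": int(environ.get("RANK", 0)),              # global process index
        "world_size": int(environ.get("WORLD_SIZE", 1)),  # total number of processes
    }
```

When no launcher is involved the defaults describe a single process, so the same code path can serve single-GPU runs.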

Evaluation

Set MODEL_PATH in the EVAL section of the configuration file to the path of your trained model file.

$ python tools/val.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To evaluate with multi-scale and flip, change ENABLE field in MSF to true and run the same command as above.
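
The tables in the Model Zoo report mIoU. As a reference for what that number measures, here is a minimal pure-Python sketch over flattened prediction and label sequences. `mean_iou` is a hypothetical helper, the ignore value 255 is an assumption, and the repository's own metric code may differ in details such as how absent classes are handled.

```python
def mean_iou(preds, labels, num_classes, ignore_label=255):
    """Mean intersection-over-union over the classes that occur in preds or labels.

    Predictions are assumed to lie in [0, num_classes).
    """
    inter = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(preds, labels):
        if t == ignore_label:      # ignored pixels do not count at all
            continue
        if p == t:
            inter[t] += 1
            union[t] += 1
        else:                      # a miss enlarges the union of both classes
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0
```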


Inference

To run inference, edit the following parameters of the config file.

  • Change MODEL >> NAME and VARIANT to your desired pretrained model.
  • Change DATASET >> NAME to the dataset name depending on the pretrained model.
  • Set TEST >> MODEL_PATH to pretrained weights of the testing model.
  • Change TEST >> FILE to the file or image folder path you want to test.
  • Testing results will be saved in SAVE_DIR.
## example using ade20k pretrained models
$ python tools/infer.py --cfg configs/ade20k.yaml
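
TEST >> FILE accepts either a single image or a folder. A minimal sketch of how such a value could be expanded into a list of images is shown below. `collect_images` and `IMAGE_SUFFIXES` are hypothetical, and the set of extensions is an assumption rather than what tools/infer.py necessarily accepts.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed set of image extensions


def collect_images(path):
    """Return a sorted list of image files for a single file or a folder."""
    p = Path(path)
    if p.is_file():
        return [p]
    if p.is_dir():
        return sorted(f for f in p.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES)
    raise FileNotFoundError(f"{p} is neither a file nor a directory")
```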


Convert to other Frameworks (ONNX, CoreML, OpenVINO, TFLite)

To convert to ONNX and CoreML, run:

$ python tools/export.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To convert to OpenVINO and TFLite, see torch_optimize.


Inference (ONNX, OpenVINO, TFLite)
## ONNX Inference
$ python scripts/onnx_infer.py --model <ONNX_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## OpenVINO Inference
$ python scripts/openvino_infer.py --model <OpenVINO_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## TFLite Inference
$ python scripts/tflite_infer.py --model <TFLite_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

Citations
@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

@misc{xiao2018unified,
  title={Unified Perceptual Parsing for Scene Understanding}, 
  author={Tete Xiao and Yingcheng Liu and Bolei Zhou and Yuning Jiang and Jian Sun},
  year={2018},
  eprint={1807.10221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{hong2021deep,
  title={Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes},
  author={Hong, Yuanduo and Pan, Huihui and Sun, Weichao and Jia, Yisong},
  journal={arXiv preprint arXiv:2101.06085},
  year={2021}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction}, 
  author={Shihua Huang and Zhichao Lu and Ran Cheng and Cheng He},
  year={2021},
  eprint={2108.07058},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={Arxiv Pre-Print arXiv:2107.00782 },
  year={2021}
}

@misc{chao2019hardnet,
  title={HarDNet: A Low Memory Traffic Network}, 
  author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
  year={2019},
  eprint={1909.00948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03308}
}

@ARTICLE{Yucondnet21,
  author={Yu, Changqian and Shao, Yuanjie and Gao, Changxin and Sang, Nong},
  journal={IEEE Signal Processing Letters}, 
  title={CondNet: Conditional Classifier for Scene Segmentation}, 
  year={2021},
  volume={28},
  number={},
  pages={758-762},
  doi={10.1109/LSP.2021.3070472}
}

Comments
  • RuntimeError: CUDA error: an illegal memory access was encountered

    The error occurred after changing batch_size from 8 to 4 in cityscapes.yaml.

    Found 2975 train images.
    Found 500 val images.
    Epoch: [1/500] Iter: [1/185] LR: 0.00010049 Loss: 7.71337414: 1%|▌ | 1/185 [00:03<11:41, 3.81s/it]
    Traceback (most recent call last):
      File "tools/train.py", line 153, in <module>
        main(cfg, gpu, save_dir)
      File "tools/train.py", line 97, in main
        scaler.scale(loss).backward()
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: CUDA error: an illegal memory access was encountered

    $ CUDA_LAUNCH_BLOCKING=1 python tools/train.py --cfg configs/DDRNet/cityscapes.yaml
    Found 2975 train images.
    Found 500 val images.
    Epoch: [1/500] Iter: [1/743] LR: 0.00010012 Loss: 7.53493118: 0%|▏ | 1/743 [00:03<37:21, 3.02s/it]
    Traceback (most recent call last):
      File "tools/train.py", line 153, in <module>
        main(cfg, gpu, save_dir)
      File "tools/train.py", line 95, in main
        loss = loss_fn(logits, lbl)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in forward
        return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)])
      File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in <listcomp>
        return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)])
      File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 33, in _forward
        loss = self.criterion(preds, labels).view(-1)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1120, in forward
        return F.cross_entropy(input, target, weight=self.weight,
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/functional.py", line 2824, in cross_entropy
        return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: CUDA error: an illegal memory access was encountered
    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
        r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/queues.py", line 116, in get
        return _ForkingPickler.loads(res)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
        fd = df.detach()
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
        with _resource_sharer.get_connection(self._id) as conn:
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
        c = Client(address, authkey=process.current_process().authkey)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 508, in Client
        answer_challenge(c, authkey)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
        message = connection.recv_bytes(256) # reject large message
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
        buf = self._recv(4)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    ConnectionResetError: [Errno 104] Connection reset by peer

    opened by StuLiu 9
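
One frequent cause of an illegal memory access inside `cross_entropy` is a label value that is neither a valid class id in `[0, num_classes)` nor the ignore index. Whether that is the cause in this thread is not confirmed. The sketch below is one way to scan a label map for such values. `invalid_label_values` is a hypothetical helper and the ignore value 255 is an assumption.

```python
def invalid_label_values(label_rows, num_classes, ignore_label=255):
    """Return the sorted label values that are neither valid class ids nor the ignore label."""
    bad = set()
    for row in label_rows:
        for value in row:
            if value != ignore_label and not (0 <= value < num_classes):
                bad.add(value)
    return sorted(bad)
```

Running this on a few label maps on the CPU, before they reach the GPU, usually gives a clearer message than the CUDA error.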
  • Augmentation configuration

    Presently the augmentations used during training are hard-coded in semseg/augmentations.py. By default horizontal flip is enabled, which will be problematic for datasets where the orientation of the image matters (e.g. facial datasets may label the left and right eyes independently, and flipping the image would also require swapping these two labels).

    Are there any future plans to allow for augmentation to be configurable?

    opened by markdjwilliams 5
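
The concern about flipping can be made concrete. The sketch below mirrors a 2D mask and swaps the ids of paired left/right classes. `hflip_with_label_swap` is a hypothetical helper that does not exist in semseg/augmentations.py, and the class ids in the usage line are made up for illustration.

```python
def hflip_with_label_swap(mask, swap_pairs):
    """Mirror a 2D mask left-to-right and exchange the ids of paired left/right classes."""
    remap = {}
    for a, b in swap_pairs:
        remap[a] = b
        remap[b] = a
    # Reverse each row, then rename every pixel whose class belongs to a pair.
    return [[remap.get(v, v) for v in reversed(row)] for row in mask]


# Hypothetical ids: 1 = left eye, 2 = right eye, 0 = background.
flipped = hflip_with_label_swap([[1, 1, 0, 2]], swap_pairs=[(1, 2)])
```

The image itself would be mirrored the same way, without any relabelling.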
  • Performance issues with ddrnet

    Hello, I am using DDRNet. I tried several other models on the same dataset and they improve over time, but with DDRNet the mIoU starts decreasing after 10 epochs. Does that mean DDRNet performs badly, or is something wrong in the code that I should fix?

    opened by mhmd-mst 5
  • Question about the dimensionality of the mask.

    Thank you for your work.

    It would be nice to see the actual performance of the models in FPS on specific hardware, particularly on devices like Jetson.

    Training requires the mask to be grayscale, but in the file that describes the dataset, each PALETTE entry has 3 components. What should the dimensionality of PALETTE be? For example, should my labels be (1,1,1) or 1?

    opened by MsWik 4
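
One reading consistent with the question is that the training mask holds a single class index per pixel, while PALETTE holds one RGB triplet per class for drawing results. This is an assumption about the repository and is not confirmed in the thread. Under that assumption, a color-coded mask can be converted to an index mask as sketched below. `color_mask_to_index` is a hypothetical helper and 255 is an assumed ignore value.

```python
def color_mask_to_index(color_mask, palette, ignore_label=255):
    """Map each (R, G, B) pixel to the index of its color in `palette`.

    Colors that are not in the palette become `ignore_label`.
    """
    lookup = {tuple(color): idx for idx, color in enumerate(palette)}
    return [[lookup.get(tuple(px), ignore_label) for px in row] for row in color_mask]
```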
  • SegFormer B5 evaluation error on cityscapes

    Hello, I am trying to evaluate SegFormer B5 on the Cityscapes dataset, but I keep getting an error related to the network's structure. I attached a file with the error (error.txt). Even though I modified a custom.yaml file, shown below, I still get the error. What can I do to solve this?

        MODEL:                                    
          NAME          : SegFormer                                           # name of the model you are using
          VARIANT       : B5                                                  # model variant
          PRETRAINED    : 'checkpoints/backbones/hardnet/hardnet_70.pth'              # backbone model's weight 
        
        DATASET:
          NAME          : cityscapes                                              # dataset name to be trained with (camvid, cityscapes, ade20k)
          ROOT          : '/media/evdo/DATA/diploma_project/cityscapes/gtFine_trainvaltest'                         # dataset root path
        
        TRAIN:
          IMAGE_SIZE    : [1024, 1024]    # training image size in (h, w)
          BATCH_SIZE    : 8               # batch size used to train
          EPOCHS        : 500             # number of epochs to train
          EVAL_INTERVAL : 50              # evaluation interval during training
          AMP           : false           # use AMP in training
          DDP           : false           # use DDP training
        
        LOSS:
          NAME          : ohemce          # loss function name (ohemce, ce, dice)
          CLS_WEIGHTS   : true            # use class weights in loss calculation
          THRESH        : 0.9             # ohemce threshold or dice delta if you choose ohemce loss or dice loss
        
        OPTIMIZER:
          NAME          : adamw           # optimizer name
          LR            : 0.01            # initial learning rate used in optimizer
          WEIGHT_DECAY  : 0.0001          # decay rate used in optimizer 
        
        SCHEDULER:
          NAME          : warmuppolylr    # scheduler name
          POWER         : 0.9             # scheduler power
          WARMUP        : 10              # warmup epochs used in scheduler
          WARMUP_RATIO  : 0.1             # warmup ratio
          
        
        EVAL:
          MODEL_PATH    : 'checkpoints/pretrained/segformer/segformer.b5.1024x1024.city.160k.pth'  # trained model file path
          IMAGE_SIZE    : [1024, 1024]                                                            # evaluation image size in (h, w)                       
          MSF:  
            ENABLE      : false                                                                 # multi-scale and flip evaluation  
            FLIP        : true                                                                  # use flip in evaluation  
            SCALES      : [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]                                     # scales used in MSF evaluation                
        
        
        TEST:
          MODEL_PATH    : 'checkpoints/pretrained/ddrnet/ddrnet_23_city.pth'                     # trained model file path
          FILE          : 'cityscapes/leftImg8bit_trainextra/leftImg8bit/train_extra/wurzburg' # filename or foldername 
          IMAGE_SIZE    : [480, 640]                                                           # inference image size in (h, w)
          OVERLAY       : false      
    
    opened by evdokimos 3
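
In the configuration above, MODEL >> NAME is SegFormer while TEST >> MODEL_PATH points at DDRNet weights and PRETRAINED points at HarDNet weights. Whether either causes the reported error is not established in the thread. The sketch below is a naming heuristic only, with a hypothetical helper name: it flags trained-model paths that never mention the configured model name, and it assumes the config has already been loaded into a nested dict.

```python
def weight_path_mismatches(cfg):
    """Heuristic: list trained-model path fields that do not mention MODEL >> NAME."""
    name = cfg["MODEL"]["NAME"].lower()
    candidates = {
        "EVAL.MODEL_PATH": cfg.get("EVAL", {}).get("MODEL_PATH", ""),
        "TEST.MODEL_PATH": cfg.get("TEST", {}).get("MODEL_PATH", ""),
    }
    return [key for key, path in candidates.items() if path and name not in path.lower()]


# Values copied from the configuration in this thread.
cfg = {
    "MODEL": {"NAME": "SegFormer", "VARIANT": "B5"},
    "EVAL": {"MODEL_PATH": "checkpoints/pretrained/segformer/segformer.b5.1024x1024.city.160k.pth"},
    "TEST": {"MODEL_PATH": "checkpoints/pretrained/ddrnet/ddrnet_23_city.pth"},
}
flagged = weight_path_mismatches(cfg)
```

A file name is no proof of its contents, so a flagged entry is only a prompt to double-check the path.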
  • About mIoU result on Cityscapes validation dataset

    Hi,

    Thanks for your work and for providing the code. I am confused about the mIoU reported in the paper on the Cityscapes validation set, which is around 77%. Was this mIoU achieved with a model pretrained on ImageNet? I am training DDRNet-23slim from scratch and I am getting around 55% mIoU on the Cityscapes validation data.

    opened by tanveer6715 3
  • RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    I have 18 classes and am using the cityscapes dataset setup to train on my own images, but I get this error from the Dice loss function at this line:

    tp = torch.sum(targets*preds, dim=(2, 3))
    RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    opened by ghost 3
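
The product `targets*preds` only works when both tensors have the same shape at dimension 1, the class dimension. If predictions carry one channel per class, the targets need a matching one-hot layout. The thread does not confirm that this is the cause here. The sketch below shows one-hot encoding for a single label map in plain Python, with a hypothetical helper name.

```python
def one_hot(label_map, num_classes):
    """Turn an H x W map of class ids into num_classes planes of 0/1 values (C x H x W)."""
    return [
        [[1 if value == c else 0 for value in row] for row in label_map]
        for c in range(num_classes)
    ]
```

The number of planes equals `num_classes`, so it has to agree with the number of output channels of the model.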
  • ImageDraw.textbbox Attribute error in Visualize.py function "draw_text" when calling infer.py

    I am running infer.py and I get this error, any idea why?

    Traceback (most recent call last):
      File "tools/infer.py", line 100, in <module>
        segmap,oversegmap,image = semseg.predict(str(test_file), cfg['TEST']['OVERLAY'])
      File "tools/infer.py", line 73, in predict
        seg_map,over_seg_map,image = self.postprocess(torch.tensor(image1), seg_map, overlay)
      File "tools/infer.py", line 60, in postprocess
        image = draw_text(seg_image, seg_map, self.labels)
      File "/content/trial/semseg/utils/visualize.py", line 28, in draw_text
        bbox = draw.textbbox(center, cls, font=fonts)
    AttributeError: 'ImageDraw' object has no attribute 'textbbox'

    opened by mhmd-mst 2
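
`ImageDraw.textbbox` was added in Pillow 8.0.0, so this error typically points to an older Pillow installation, and upgrading Pillow is the simplest remedy. Where upgrading is not possible, a fallback along the following lines could be used. `text_bbox` is a hypothetical wrapper, not part of the repository, and the older `textsize` call ignores font offsets, so the two branches can differ by a few pixels.

```python
def text_bbox(draw, xy, text, font=None):
    """Return (left, top, right, bottom) for `text`, with a fallback for older Pillow."""
    if hasattr(draw, "textbbox"):                   # available in Pillow >= 8.0.0
        return draw.textbbox(xy, text, font=font)
    width, height = draw.textsize(text, font=font)  # older API
    x, y = xy
    return (x, y, x + width, y + height)
```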
  • Double Checking implementation detail in SegFormerHead

    The SegFormer paper suggests an all-MLP decoder.

    SegformerHead.py, however, uses a Conv2D for the final layer.

    Can you help me understand whether this is a deviation from the paper or something mentioned in a follow-up paper? I apologize in advance if there is an obvious answer.

    opened by RahulSinghalChicago 2
  • UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

    /usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
      "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

    opened by RahulSinghalChicago 2
  • ModuleNotFoundError: No module named 'semseg'

    Hi there, when I tried to run infer.py, I got the following error:

    $ python tools/infer.py --cfg configs/custom.yaml
    Traceback (most recent call last):
      File "tools/infer.py", line 12, in <module>
        from semseg.models import *
    ModuleNotFoundError: No module named 'semseg'

    I also tried appending the path to /semseg to sys.path, but it did not solve the problem. Could you please give some suggestions? Thanks

    opened by Shawn207 2
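
Python resolves `from semseg.models import *` by looking for a `semseg` package inside each `sys.path` entry. The entry therefore has to be the directory that contains semseg/, not the semseg/ directory itself. The sketch below adds that directory based on the script location. `add_repo_root` is a hypothetical helper, and it assumes tools/ sits directly under the repository root, as the commands in this README suggest.

```python
import sys
from pathlib import Path


def add_repo_root(script_path):
    """Put the parent of the script's directory (assumed repo root) at the front of sys.path."""
    repo_root = Path(script_path).resolve().parent.parent
    if str(repo_root) not in sys.path:
        sys.path.insert(0, str(repo_root))
    return repo_root
```

Calling `add_repo_root(__file__)` at the top of a script in tools/, before the `semseg` import, is the intended use.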
  • BiSeNetv2 implementation difference leads to strange results

    Hello, I trained BiSeNetv2 on my custom dataset and found that the probability maps have a grid-like look, which is very unnatural.

    After further observation I found that you use PixelShuffle in the last upsampling layer, which leads to the grid-like results (it seems you also use it in v1). In the BiSeNetv2 paper, however, the authors use simple bilinear interpolation to upsample the results, which gives smooth probability maps in my experiment.

    So what is the idea behind using PixelShuffle to upsample the final results? I also don't see an obvious improvement in the validation score from using it.

    By the way, there are some differences between your implementation and the paper in the Gather-and-Expansion Layer. I cannot figure out whether this will affect training results significantly.

    opened by wasupandceacar 0
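  • RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an illegal memory access

    Hello, I ran into this error when using the DDR model: 'RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an illegal memory access'. Have you encountered this kind of error? Searching online, some say it is caused by the PyTorch version, by annotation files or labels not being set up correctly, or by the network model or the training data (images and labels) not being placed on the GPU, but I don't think any of these is the problem in my case. Looking forward to your reply.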

    opened by scl666 0
  • training error on colab

    My setup on Colab for training is complete. However, when I run

    !python tools/train.py --cfg configs/CONFIG_FILE.yaml

    I get this error:

    Found 20210 training images.
    Found 2000 validation images.
    Epoch: [1/500] Iter: [0/2526] LR: 0.00100000 Loss: 0.00000000: 0% 0/2526 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "tools/train.py", line 128, in <module>
        main(cfg, gpu, save_dir)
      File "tools/train.py", line 69, in main
        for iter, (img, lbl) in pbar:
      File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in __iter__
        for obj in iterable:
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
        data = self._next_data()
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
        return self._process_data(data)
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
        data.reraise()
      File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
        data = fetcher.fetch(index)
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/content/semantic-segmentation/semseg/datasets/ade20k.py", line 73, in __getitem__
        image, label = self.transform(image, label)
      File "/content/semantic-segmentation/semseg/augmentations.py", line 20, in __call__
        img, mask = transform(img, mask)
      File "/content/semantic-segmentation/semseg/augmentations.py", line 329, in __call__
        mask = TF.pad(mask, padding, fill=self.seg_fill)
      File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 481, in pad
        return F_t.pad(img, padding=padding, fill=fill, padding_mode=padding_mode)
      File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py", line 418, in pad
        img = torch_pad(img, p, mode=padding_mode, value=float(fill))
    RuntimeError: value cannot be converted to type uint8_t without overflow

    opened by hb0313 2
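
The final error says that the padding fill value cannot be stored in an 8-bit unsigned mask, whose range is 0 to 255. The thread does not state which `seg_fill` value was configured. A negative ignore label such as -1 would be one example that triggers it, which is an assumption here. A small guard like the following, with a hypothetical helper name, makes the constraint explicit.

```python
UINT8_MIN, UINT8_MAX = 0, 255


def check_uint8_fill(fill):
    """Return `fill` as an int, or raise ValueError if it cannot be stored in a uint8 mask."""
    if not (UINT8_MIN <= fill <= UINT8_MAX) or int(fill) != fill:
        raise ValueError(f"fill={fill} does not fit in uint8 [0, 255]")
    return int(fill)
```

Converting the mask to a wider integer type before padding is another way to avoid the overflow.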
  • data format

    Is the annotation format the same for all datasets? For example, the ADE20K annotations seem to be color images. My original annotation data is in COCO format (grayscale); do I need to convert it to color?

    Thanks.

    opened by Tsardoz 1
Releases(v0.2.6)
Owner
sithu3
AI Developer