Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Last update: Jan 05, 2023

Overview

Semantic Segmentation

Features

- Applicable to the following tasks:
  - Scene Parsing
  - Human Parsing
  - Face Parsing
- 20+ Datasets
- 10+ SOTA Backbones
- 10+ SOTA Semantic Segmentation Models
- PyTorch, ONNX, TFLite and OpenVINO Inference

Model Zoo

Supported Backbones:

ResNet (CVPR 2016)
ResNetD (ArXiv 2018)
MobileNetV2 (CVPR 2018)
MobileNetV3 (ICCV 2019)
PVTv2 (ArXiv 2021)
ResT (ArXiv 2021)
MicroNet (ICCV 2021)

Supported Heads/Methods:

FCN (CVPR 2015)
UPerNet (ECCV 2018)
BiSeNetv1 (ECCV 2018)
FPN (CVPR 2019)
SFNet (ECCV 2020)
SegFormer (ArXiv 2021)
FaPN (ICCV 2021)
CondNet (IEEE SPL 2021)

Supported Standalone Models:

FCHarDNet (ICCV 2019)
BiSeNetv2 (IJCV 2021)
DDRNet (ArXiv 2021)

Supported Modules:

PPM (CVPR 2017)
PSA (ArXiv 2021)

ADE20K-val (Scene Parsing)

Method	Backbone	mIoU (%)	Params (M)	GFLOPs (512x512)	Weights
SegFormer	MiT-B1	43.1	14	16	pt
	MiT-B2	47.5	28	62	pt
	MiT-B3	50.0	47	79	pt

CityScapes-val (Scene Parsing)

Method	Backbone	mIoU (%)	Params (M)	GFLOPs	Img Size	Weights
SegFormer	MiT-B0	78.1	4	126	1024x1024	N/A
	MiT-B1	80.0	14	244	1024x1024	N/A
FaPN	ResNet-50	80.0	33	-	512x1024	N/A
SFNet	ResNetD-18	79.0	13	-	1024x1024	N/A
FCHarDNet	HarDNet-70	77.7	4	35	1024x1024	pt
DDRNet	DDRNet-23slim	77.8	6	36	1024x2048	pt

HELEN-val (Face Parsing)

Method	Backbone	mIoU (%)	Params (M)	GFLOPs (512x512)	FPS (GTX1660ti)	Weights
BiSeNetv1	MobileNetV2-1.0	58.22	5	5	160	pt
BiSeNetv1	ResNet-18	58.50	14	13	263	pt
BiSeNetv2	-	58.58	18	15	195	pt
FCHarDNet	HarDNet-70	59.38	4	4	130	pt
DDRNet	DDRNet-23slim	61.11	6	5	180	pt | tflite(fp32) | tflite(fp16) | tflite(int8)
SegFormer	MiT-B0	59.31	4	8	75	pt
SFNet	ResNetD-18	61.00	14	31	56	pt

Backbones

Model	Variants	ImageNet-1k Top-1 Acc (%)	Params (M)	GFLOPs	Weights
MicroNet	M1 | M2 | M3	51.4 | 59.4 | 62.5	1 | 2 | 3	6M | 12M | 21M	download
MobileNetV2	1.0	71.9	3	300M	download
MobileNetV3	S | L	67.7 | 74.0	3 | 5	56M | 219M	S | L

ResNet	18 | 50 | 101	69.8 | 76.1 | 77.4	12 | 25 | 44	2 | 4 | 8	download
ResNetD	18 | 50 | 101	-	12 | 25 | 44	2 | 4 | 8	download
MiT	B1 | B2 | B3	-	14 | 25 | 45	2 | 4 | 8	download
PVTv2	B1 | B2 | B4	78.7 | 82.0 | 83.6	14 | 25 | 63	2 | 4 | 10	download
ResT	S | B | L	79.6 | 81.6 | 83.6	14 | 30 | 52	2 | 4 | 8	download

Note: Backbone weights for HarDNet-70 and DDRNet-23slim are also available for download.

Supported Datasets

Dataset	Type	Categories	Train Images	Val Images	Test Images	Image Size (HxW)
COCO-Stuff	General Scene Parsing	171	118,000	5,000	20,000	-
ADE20K	General Scene Parsing	150	20,210	2,000	3,352	-
PASCALContext	General Scene Parsing	59	4,996	5,104	9,637	-

SUN RGB-D	Indoor Scene Parsing	37	2,666	2,619	5,050 (+labels)	-

Mapillary Vistas	Street Scene Parsing	65	18,000	2,000	5,000	1080x1920
CityScapes	Street Scene Parsing	19	2,975	500	1,525 (+labels)	1024x2048
CamVid	Street Scene Parsing	11	367	101	233 (+labels)	720x960

MHPv2	Multi-Human Parsing	59	15,403	5,000	5,000	-
MHPv1	Multi-Human Parsing	19	3,000	1,000	980 (+labels)	-
LIP	Multi-Human Parsing	20	30,462	10,000	-	-
CCIHP	Multi-Human Parsing	22	28,280	5,000	5,000	-
CIHP	Multi-Human Parsing	20	28,280	5,000	5,000	-
ATR	Single-Human Parsing	18	16,000	700	1,000 (+labels)	-

HELEN	Face Parsing	11	2,000	230	100 (+labels)	-
LaPa	Face Parsing	11	18,176	2,000	2,000 (+labels)	-
iBugMask	Face Parsing	11	21,866	-	1,000 (+labels)	-
CelebAMaskHQ	Face Parsing	19	24,183	2,993	2,824 (+labels)	512x512
FaceSynthetics	Face Parsing (Synthetic)	19	100,000	1,000	100 (+labels)	512x512

SUIM	Underwater Imagery	8	1,525	-	110 (+labels)	-

Check DATASETS to find more segmentation datasets.

Datasets Structure

Datasets should have the following structure:

data
|__ ADEChallenge
    |__ ADEChallengeData2016
        |__ images
            |__ training
            |__ validation
        |__ annotations
            |__ training
            |__ validation

|__ CityScapes
    |__ leftImg8bit
        |__ train
        |__ val
        |__ test
    |__ gtFine
        |__ train
        |__ val
        |__ test

|__ CamVid
    |__ train
    |__ val
    |__ test
    |__ train_labels
    |__ val_labels
    |__ test_labels
    
|__ VOCdevkit
    |__ VOC2010
        |__ JPEGImages
        |__ SegmentationClassContext
        |__ ImageSets
            |__ SegmentationContext
                |__ train.txt
                |__ val.txt
    
|__ COCO
    |__ images
        |__ train2017
        |__ val2017
    |__ labels
        |__ train2017
        |__ val2017

|__ MHPv1
    |__ images
    |__ annotations
    |__ train_list.txt
    |__ test_list.txt

|__ MHPv2
    |__ train
        |__ images
        |__ parsing_annos
    |__ val
        |__ images
        |__ parsing_annos

|__ LIP
    |__ LIP
        |__ TrainVal_images
            |__ train_images
            |__ val_images
        |__ TrainVal_parsing_annotations
            |__ train_segmentations
            |__ val_segmentations

|__ CIHP/CCIHP
    |__ instance-level_human_parsing
        |__ Training
            |__ Images
            |__ Category_ids
        |__ Validation
            |__ Images
            |__ Category_ids

|__ ATR
    |__ humanparsing
        |__ JPEGImages
        |__ SegmentationClassAug

|__ SUIM
    |__ train_val
        |__ images
        |__ masks
    |__ TEST
        |__ images
        |__ masks

|__ SunRGBD
    |__ SUNRGBD
        |__ kv1/kv2/realsense/xtion
    |__ SUNRGBDtoolbox
        |__ traintestSUNRGBD
            |__ allsplit.mat

|__ Mapillary
    |__ training
        |__ images
        |__ labels
    |__ validation
        |__ images
        |__ labels

|__ SmithCVPR2013_dataset_resized (HELEN)
    |__ images
    |__ labels
    |__ exemplars.txt
    |__ testing.txt
    |__ tuning.txt

|__ CelebAMask-HQ
    |__ CelebA-HQ-img
    |__ CelebAMask-HQ-mask-anno
    |__ CelebA-HQ-to-CelebA-mapping.txt

|__ LaPa
    |__ train
        |__ images
        |__ labels
    |__ val
        |__ images
        |__ labels
    |__ test
        |__ images
        |__ labels

|__ ibugmask_release
    |__ train
    |__ test

|__ FaceSynthetics
    |__ dataset_100000
    |__ dataset_1000
    |__ dataset_100

Note: For PASCALContext, download the annotations from here and put them in VOC2010.

Note: For CelebAMask-HQ, run the preprocessing script first: python3 scripts/preprocess_celebamaskhq.py --root, followed by the path to the dataset root.
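
A wrong folder layout usually only shows up once a data loader fails, so it can help to check the tree beforehand. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the `data/CityScapes` location are assumptions, and the list of sub-directories is copied from the CityScapes entry of the tree above.

```python
from pathlib import Path

# Expected sub-directories, copied from the CityScapes entry of the tree above.
CITYSCAPES_LAYOUT = [
    "leftImg8bit/train",
    "leftImg8bit/val",
    "leftImg8bit/test",
    "gtFine/train",
    "gtFine/val",
    "gtFine/test",
]


def missing_dirs(root, layout):
    """Return the entries of `layout` that are not directories under `root`."""
    root = Path(root)
    return [rel for rel in layout if not (root / rel).is_dir()]


if __name__ == "__main__":
    # "data/CityScapes" is the location implied by the tree; adjust as needed.
    for rel in missing_dirs("data/CityScapes", CITYSCAPES_LAYOUT):
        print(f"missing: {rel}")
```

The same helper can be reused for any other dataset by passing a different list of relative paths.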

Augmentations

Check out the notebook here to test the augmentation effects.

Pixel-level Transforms:

ColorJitter (Brightness, Contrast, Saturation, Hue)
Gamma, Sharpness, AutoContrast, Equalize, Posterize
GaussianBlur, Grayscale

Spatial-level Transforms:

Affine, RandomRotation
HorizontalFlip, VerticalFlip
CenterCrop, RandomCrop
Pad, ResizePad, Resize
RandomResizedCrop
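
The practical difference between the two groups is that a spatial transform has to be applied identically to the image and its label mask, while a pixel-level transform touches only the image. The sketch below illustrates this with plain nested lists instead of tensors; the function names are hypothetical and it is not the code in semseg/augmentations.py.

```python
import random


def paired_random_crop(image, mask, crop_h, crop_w, rng=random):
    """Cut the same window out of `image` and `mask` (H x W nested lists)."""
    h, w = len(mask), len(mask[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop size exceeds input size")
    top = rng.randint(0, h - crop_h)    # randint bounds are inclusive
    left = rng.randint(0, w - crop_w)

    def window(rows):
        return [row[left:left + crop_w] for row in rows[top:top + crop_h]]

    return window(image), window(mask)


def brighten(image, delta):
    """Pixel-level transform: only intensities change, so the mask is untouched."""
    return [[min(255, max(0, px + delta)) for px in row] for row in image]
```

Drawing the crop offsets once and reusing them for both inputs is what keeps every pixel aligned with its label.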

Usage

Requirements

python >= 3.6
torch >= 1.8.1
torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.

Configuration

Create a configuration file in configs. A sample configuration for the ADE20K dataset can be found here. Edit whichever fields you need to change. The training, evaluation, and prediction scripts all require this configuration file.

Training

To train with a single GPU:

$ python tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To train with multiple GPUs, set the DDP field in the config file to true and run:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml
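
With --use_env, the launcher passes each process its rank through the LOCAL_RANK environment variable rather than a --local_rank command-line argument. How tools/train.py consumes that value is not shown here, so the following is only a sketch of reading it; both helper names are hypothetical.

```python
import os


def local_rank_from_env(environ=os.environ, default=0):
    """Read the per-process GPU index set by the launcher (0 if not launched)."""
    return int(environ.get("LOCAL_RANK", default))


def world_size_from_env(environ=os.environ, default=1):
    """Read the total number of processes (1 for a plain single-GPU run)."""
    return int(environ.get("WORLD_SIZE", default))


if __name__ == "__main__":
    print(f"rank {local_rank_from_env()} of {world_size_from_env()}")
```

Falling back to rank 0 and world size 1 lets the same script run unchanged when started with plain python on a single GPU.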

Evaluation

Set MODEL_PATH in the configuration file to the path of your trained model.

$ python tools/val.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To evaluate with multi-scale and flip (MSF) augmentation, set the ENABLE field under MSF to true and run the same command as above.
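
Multi-scale and flip evaluation generally means running the model on several resized copies of each image, plus their mirrored versions, and averaging the predictions. The sketch below covers only the bookkeeping: which input sizes a list of scales produces and how per-pass scores are averaged. Rounding up to a multiple of 32 is an assumption made here, and the repository may handle sizes differently.

```python
import math


def msf_input_sizes(image_size, scales, divisor=32):
    """Return the (h, w) that each scale would be resized to.

    Rounding up to a multiple of `divisor` is an assumption of this sketch,
    chosen because many backbones downsample their input by a factor of 32.
    """
    h, w = image_size
    sizes = []
    for s in scales:
        new_h = math.ceil(h * s / divisor) * divisor
        new_w = math.ceil(w * s / divisor) * divisor
        sizes.append((new_h, new_w))
    return sizes


def average_predictions(per_pass_scores):
    """Average per-class scores of one pixel over all passes (scales and flips)."""
    n = len(per_pass_scores)
    return [sum(col) / n for col in zip(*per_pass_scores)]


if __name__ == "__main__":
    # Illustrative values only.
    print(msf_input_sizes((1024, 1024), [0.5, 1.0, 1.5]))
```

Each extra scale adds a forward pass, and enabling flip doubles the total, so evaluation time grows roughly in proportion.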

Inference

To run inference, edit the following parameters in the config file:

Change MODEL >> NAME and VARIANT to your desired pretrained model.
Change DATASET >> NAME to the dataset the pretrained model was trained on.
Set TEST >> MODEL_PATH to the pretrained weights of the model being tested.
Change TEST >> FILE to the file or image folder path you want to test.
Testing results will be saved in SAVE_DIR.
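
Before launching inference it can be useful to confirm that these fields are actually filled in. The sketch below works on an already-loaded config dictionary; the helper `missing_fields` is hypothetical, and treating SAVE_DIR as a top-level key is an assumption.

```python
# Fields named in the list above. SAVE_DIR is assumed to be a top-level key.
REQUIRED_FIELDS = [
    ("MODEL", "NAME"),
    ("MODEL", "VARIANT"),
    ("DATASET", "NAME"),
    ("TEST", "MODEL_PATH"),
    ("TEST", "FILE"),
    ("SAVE_DIR",),
]


def missing_fields(cfg, required=REQUIRED_FIELDS):
    """Return dotted names of required fields that are absent or empty in `cfg`."""
    missing = []
    for path in required:
        node = cfg
        for key in path:
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, ""):
            missing.append(".".join(path))
    return missing


if __name__ == "__main__":
    # Placeholder values for illustration, not real files.
    cfg = {
        "MODEL": {"NAME": "SegFormer", "VARIANT": "B1"},
        "DATASET": {"NAME": "ade20k"},
        "TEST": {"MODEL_PATH": "path/to/weights.pth", "FILE": ""},
        "SAVE_DIR": "output",
    }
    print(missing_fields(cfg))
```

An empty list means every field listed above has a value; it does not verify that the referenced files exist.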

## example using ade20k pretrained models
$ python tools/infer.py --cfg configs/ade20k.yaml

Convert to other Frameworks (ONNX, CoreML, OpenVINO, TFLite)

To convert to ONNX and CoreML, run:

$ python tools/export.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To convert to OpenVINO and TFLite, see torch_optimize.

Inference (ONNX, OpenVINO, TFLite)

## ONNX Inference
$ python scripts/onnx_infer.py --model <ONNX_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## OpenVINO Inference
$ python scripts/openvino_infer.py --model <OpenVINO_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## TFLite Inference
$ python scripts/tflite_infer.py --model <TFLite_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

Citations

@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

@misc{xiao2018unified,
  title={Unified Perceptual Parsing for Scene Understanding}, 
  author={Tete Xiao and Yingcheng Liu and Bolei Zhou and Yuning Jiang and Jian Sun},
  year={2018},
  eprint={1807.10221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{hong2021deep,
  title={Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes},
  author={Hong, Yuanduo and Pan, Huihui and Sun, Weichao and Jia, Yisong},
  journal={arXiv preprint arXiv:2101.06085},
  year={2021}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction}, 
  author={Shihua Huang and Zhichao Lu and Ran Cheng and Cheng He},
  year={2021},
  eprint={2108.07058},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={arXiv preprint arXiv:2107.00782},
  year={2021}
}

@misc{chao2019hardnet,
  title={HarDNet: A Low Memory Traffic Network}, 
  author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
  year={2019},
  eprint={1909.00948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03308}
}

@ARTICLE{Yucondnet21,
  author={Yu, Changqian and Shao, Yuanjie and Gao, Changxin and Sang, Nong},
  journal={IEEE Signal Processing Letters}, 
  title={CondNet: Conditional Classifier for Scene Segmentation}, 
  year={2021},
  volume={28},
  number={},
  pages={758-762},
  doi={10.1109/LSP.2021.3070472}
}

Comments

RuntimeError: CUDA error: an illegal memory access was encountered

The error occurred after changing batch_size from 8 to 4 in cityscapes.yaml.

    Found 2975 train images.
    Found 500 val images.
    Epoch: [1/500] Iter: [1/185] LR: 0.00010049 Loss: 7.71337414: 1%|▌ | 1/185 [00:03<11:41, 3.81s/it]
    Traceback (most recent call last):
      File "tools/train.py", line 153, in <module>
        main(cfg, gpu, save_dir)
      File "tools/train.py", line 97, in main
        scaler.scale(loss).backward()
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: CUDA error: an illegal memory access was encountered

    (semseg) ~/Documents/projects/semantic-segmentation$ CUDA_LAUNCH_BLOCKING=1 python tools/train.py --cfg configs/DDRNet/cityscapes.yaml
    Found 2975 train images.
    Found 500 val images.
    Epoch: [1/500] Iter: [1/743] LR: 0.00010012 Loss: 7.53493118: 0%|▏ | 1/743 [00:03<37:21, 3.02s/it]
    Traceback (most recent call last):
      File "tools/train.py", line 153, in <module>
        main(cfg, gpu, save_dir)
      File "tools/train.py", line 95, in main
        loss = loss_fn(logits, lbl)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in forward
        return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)])
      File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in <listcomp>
        return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)])
      File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 33, in _forward
        loss = self.criterion(preds, labels).view(-1)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1120, in forward
        return F.cross_entropy(input, target, weight=self.weight,
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/functional.py", line 2824, in cross_entropy
        return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: CUDA error: an illegal memory access was encountered
    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop
        r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/queues.py", line 116, in get
        return _ForkingPickler.loads(res)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd
        fd = df.detach()
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach
        with _resource_sharer.get_connection(self._id) as conn:
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection
        c = Client(address, authkey=process.current_process().authkey)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 508, in Client
        answer_challenge(c, authkey)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge
        message = connection.recv_bytes(256) # reject large message
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
        buf = self._recv_bytes(maxlength)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
        buf = self._recv(4)
      File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    ConnectionResetError: [Errno 104] Connection reset by peer

opened by StuLiu 9
Augmentation configuration

Presently the augmentations used during training are hard-coded in semseg/augmentations.py. By default horizontal flip is enabled, which will be problematic for datasets where the orientation of the image matters (e.g. facial datasets may label the left and right eyes independently, and flipping the image would also require swapping these two labels).
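
The label swap described above can be sketched as follows. The class ids are hypothetical and the mask is a plain nested list, so this only illustrates the idea and is not the repository's augmentation code.

```python
# Hypothetical label ids for illustration; real datasets define their own.
LEFT_EYE, RIGHT_EYE = 4, 5
SWAP = {LEFT_EYE: RIGHT_EYE, RIGHT_EYE: LEFT_EYE}


def hflip_mask_with_swap(mask, swap=SWAP):
    """Mirror an H x W label mask and exchange the left/right class ids."""
    return [[swap.get(label, label) for label in reversed(row)] for row in mask]


if __name__ == "__main__":
    mask = [[4, 0, 5],
            [0, 0, 5]]
    print(hflip_mask_with_swap(mask))
```

Labels without a left/right counterpart pass through unchanged, and applying the function twice returns the original mask.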

Are there any future plans to allow for augmentation to be configurable?

opened by markdjwilliams 5
Performance issues with ddrnet

Hello, I am using DDRNet. I tried several other models on the same dataset and they improve over time, but with DDRNet the mIoU starts decreasing after 10 epochs. Does that mean DDRNet performs badly, or is something wrong in the code that I should fix?

opened by mhmd-mst 5
Question about the dimensionality of the mask.

Thank you for your work.

It would be nice to see the actual performance of the models in fps on specific hardware. Particularly on devices like jetson.

Training requires the mask to be grayscale, yet in the file that describes the dataset each PALETTE entry has 3 components. Can you tell me what the dimensionality of PALETTE should be (for example, should my labels be (1,1,1) or 1, etc.)?
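
A common convention, assumed here rather than confirmed for this repository, is that the label mask stores one integer class index per pixel, while the palette is a table with one RGB triple per class that is used only to colorize predictions. The palette values below are hypothetical.

```python
# Hypothetical 3-class palette: one RGB triple per class index.
PALETTE = [
    (0, 0, 0),      # class 0
    (128, 0, 0),    # class 1
    (0, 128, 0),    # class 2
]


def colorize(mask, palette=PALETTE):
    """Map an H x W mask of class indices to an H x W grid of RGB triples."""
    return [[palette[idx] for idx in row] for row in mask]


if __name__ == "__main__":
    # A 2 x 2 single-channel mask; each value is a class index, not a color.
    print(colorize([[0, 1], [2, 1]]))
```

Under this convention the training labels stay single-channel, and the palette needs as many rows as there are classes.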

opened by MsWik 4

SegFormer B5 evaluation error on cityscapes

Hello, I am trying to evaluate SegFormer B5 on the Cityscapes dataset, but I keep getting an error related to the network's structure. I attached a file with the error (error.txt). Even though I modified custom.yaml as shown below, I still get the error. What can I do to solve this?

    MODEL:                                    
      NAME          : SegFormer                                           # name of the model you are using
      VARIANT       : B5                                                  # model variant
      PRETRAINED    : 'checkpoints/backbones/hardnet/hardnet_70.pth'              # backbone model's weight 
    
    DATASET:
      NAME          : cityscapes                                              # dataset name to be trained with (camvid, cityscapes, ade20k)
      ROOT          : '/media/evdo/DATA/diploma_project/cityscapes/gtFine_trainvaltest'                         # dataset root path
    
    TRAIN:
      IMAGE_SIZE    : [1024, 1024]    # training image size in (h, w)
      BATCH_SIZE    : 8               # batch size used to train
      EPOCHS        : 500             # number of epochs to train
      EVAL_INTERVAL : 50              # evaluation interval during training
      AMP           : false           # use AMP in training
      DDP           : false           # use DDP training
    
    LOSS:
      NAME          : ohemce          # loss function name (ohemce, ce, dice)
      CLS_WEIGHTS   : true            # use class weights in loss calculation
      THRESH        : 0.9             # ohemce threshold or dice delta if you choose ohemce loss or dice loss
    
    OPTIMIZER:
      NAME          : adamw           # optimizer name
      LR            : 0.01            # initial learning rate used in optimizer
      WEIGHT_DECAY  : 0.0001          # decay rate used in optimizer 
    
    SCHEDULER:
      NAME          : warmuppolylr    # scheduler name
      POWER         : 0.9             # scheduler power
      WARMUP        : 10              # warmup epochs used in scheduler
      WARMUP_RATIO  : 0.1             # warmup ratio
      
    
    EVAL:
      MODEL_PATH    : 'checkpoints/pretrained/segformer/segformer.b5.1024x1024.city.160k.pth'  # trained model file path
      IMAGE_SIZE    : [1024, 1024]                                                            # evaluation image size in (h, w)                       
      MSF:  
        ENABLE      : false                                                                 # multi-scale and flip evaluation  
        FLIP        : true                                                                  # use flip in evaluation  
        SCALES      : [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]                                     # scales used in MSF evaluation                
    
    
    TEST:
      MODEL_PATH    : 'checkpoints/pretrained/ddrnet/ddrnet_23_city.pth'                     # trained model file path
      FILE          : 'cityscapes/leftImg8bit_trainextra/leftImg8bit/train_extra/wurzburg' # filename or foldername 
      IMAGE_SIZE    : [480, 640]                                                           # inference image size in (h, w)
      OVERLAY       : false

opened by evdokimos 3

About mIoU result on Cityscapes validation dataset

Hi,

Thanks for your work and for providing the code. I am confused about the mIoU reported in the paper on the Cityscapes validation set, which is around 77%. Was this mIoU achieved with an ImageNet-pretrained model? I am training DDR-Net-Slim23 from scratch and I am getting around 55% mIoU on the Cityscapes validation data.

opened by tanveer6715 3
RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

I have 18 classes in the Cityscapes format and am training on my own images, but I get this error from the Dice loss function at this line:

    tp = torch.sum(targets*preds, dim=(2, 3))
    RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

opened by ghost 3
ImageDraw.textbbox Attribute error in Visualize.py function "draw_text" when calling infer.py

I am running infer.py and I get this error. Any idea why?

    Traceback (most recent call last):
      File "tools/infer.py", line 100, in <module>
        segmap,oversegmap,image = semseg.predict(str(test_file), cfg['TEST']['OVERLAY'])
      File "tools/infer.py", line 73, in predict
        seg_map,over_seg_map,image = self.postprocess(torch.tensor(image1), seg_map, overlay)
      File "tools/infer.py", line 60, in postprocess
        image = draw_text(seg_image, seg_map, self.labels)
      File "/content/trial/semseg/utils/visualize.py", line 28, in draw_text
        bbox = draw.textbbox(center, cls, font=fonts)
    AttributeError: 'ImageDraw' object has no attribute 'textbbox'

opened by mhmd-mst 2
Double Checking implementation detail in SegFormerHead

The paper on SegFormer suggests an All MLP decoder.

The SegformerHead.py shows the use of a Conv2D for the final layer.

Can you help me understand if this is a deviation from the paper or mentioned in a followup paper somewhere? I apologize in advance if there is an obvious answer.

opened by RahulSinghalChicago 2
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

    /usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
      "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

opened by RahulSinghalChicago 2
ModuleNotFoundError: No module named 'semseg'

Hi there, when I tried to run infer.py, I got the following error:

    (sotas) ~/semantic-segmentation$ python tools/infer.py --cfg configs/custom.yaml
    Traceback (most recent call last):
      File "tools/infer.py", line 12, in <module>
        from semseg.models import *
    ModuleNotFoundError: No module named 'semseg'

I also tried appending the path to /semseg to sys.path, but it did not solve the problem. Could you please give some suggestions? Thanks
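
One general Python point is relevant here: an import of a package succeeds only when sys.path contains the directory that holds the package folder, not the package folder itself. The sketch below demonstrates this with a throwaway package; the name `demo_pkg_sketch` and both helpers are hypothetical, and nothing here is taken from the repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(name, search_dir):
    """Try importing `name` with `search_dir` temporarily prepended to sys.path."""
    sys.path.insert(0, str(search_dir))
    try:
        importlib.invalidate_caches()
        sys.modules.pop(name, None)     # force a fresh lookup
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path.remove(str(search_dir))
        sys.modules.pop(name, None)


def demo():
    with tempfile.TemporaryDirectory() as repo_root:
        pkg_dir = Path(repo_root) / "demo_pkg_sketch"   # hypothetical package
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")
        # Adding the package directory itself does not make it importable...
        inside = can_import("demo_pkg_sketch", pkg_dir)
        # ...but adding its parent (the repository root) does.
        parent = can_import("demo_pkg_sketch", repo_root)
        return inside, parent


if __name__ == "__main__":
    print(demo())
```

Applied to this case, the entry to add would be the repository root that contains the semseg folder, for example through the PYTHONPATH environment variable.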

opened by Shawn207 2
BiSeNetv2 implementation difference leads to strange results

Hello, I trained BiSeNetv2 on my custom dataset and found that the probability maps have a grid-like look, which is very unnatural.

After further observation I found that you use PixelShuffle in the last upsampling layer, which leads to the grid-like results (it seems you also use it in v1). In the BiSeNetv2 paper, however, the authors use simple bilinear interpolation to upsample the results, which gives smooth probability maps in my experiment.

So what is the idea behind using PixelShuffle to upsample the final results? I also don't see an obvious improvement in the validation score from using it.

By the way, there are some differences between your implementation and the paper in the Gather-and-Expansion Layer. I cannot figure out whether this affects training results significantly.

opened by wasupandceacar 0
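RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an illegal memory access

Hello, I ran into this error when using the DDR model: 'RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an illegal memory access'. Have you encountered this kind of error? Searching online, some say it is caused by the PyTorch version, by annotation files or labels not being set up correctly, or by the network model or the training data (images and labels) not being placed on the GPU, but I don't think any of these is the problem in my case. Looking forward to your reply.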

opened by scl666 0
training error on colab

My setup on Colab for training completed successfully. However, when I run

!python tools/train.py --cfg configs/CONFIG_FILE.yaml

I get this error:

    Found 20210 training images.
    Found 2000 validation images.
    Epoch: [1/500] Iter: [0/2526] LR: 0.00100000 Loss: 0.00000000: 0% 0/2526 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "tools/train.py", line 128, in <module>
        main(cfg, gpu, save_dir)
      File "tools/train.py", line 69, in main
        for iter, (img, lbl) in pbar:
      File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in __iter__
        for obj in iterable:
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in __next__
        data = self._next_data()
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
        return self._process_data(data)
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
        data.reraise()
      File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise
        raise exception
    RuntimeError: Caught RuntimeError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
        data = fetcher.fetch(index)
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/content/semantic-segmentation/semseg/datasets/ade20k.py", line 73, in __getitem__
        image, label = self.transform(image, label)
      File "/content/semantic-segmentation/semseg/augmentations.py", line 20, in __call__
        img, mask = transform(img, mask)
      File "/content/semantic-segmentation/semseg/augmentations.py", line 329, in __call__
        mask = TF.pad(mask, padding, fill=self.seg_fill)
      File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 481, in pad
        return F_t.pad(img, padding=padding, fill=fill, padding_mode=padding_mode)
      File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py", line 418, in pad
        img = torch_pad(img, p, mode=padding_mode, value=float(fill))
    RuntimeError: value cannot be converted to type uint8_t without overflow

opened by hb0313 2
data format

Is the annotation format the same for all datasets? For example, ADE20K annotations seem to be color images. My original annotation data is in COCO format (grayscale); do I need to convert it to color?

Thanks.

opened by Tsardoz 1

Releases (v0.2.6)

v0.2.6 (Sep 24, 2021)

Add CondNet and bug fixes.

v0.2.5 (Sep 12, 2021)

Add FPN and FaPN heads.
Some fixes and improvements.

v0.2.0 (Aug 25, 2021)

New Datasets: Sun RGBD, Mapillary Vistas, SUIM
New Backbones: PVTv2, ResT, CycleMLP
New Module: PSA
New Model: DDRNet
Add tutorial colab notebook and augmentation testing notebook.

v0.1.0 (Aug 6, 2021)

Datasets: ADE20K, CityScapes, CamVid, PASCAL-Context, COCO-Stuff, MHPv1/v2, LIP, CIHP, ATR
Model: SegFormer

Owner

sithu3

AI Developer
