Human segmentation models, training/inference code, and trained weights, implemented in PyTorch

Last update: Dec 19, 2022

Overview

Human-Segmentation-PyTorch

Supported networks

UNet: backbones MobileNetV2 (all alphas and expansions), ResNetV1 (all num_layers)
DeepLab3+: backbones ResNetV1 (num_layers=18,34,50,101), VGG16_bn
BiSeNet: backbones ResNetV1 (num_layers=18)
PSPNet: backbones ResNetV1 (num_layers=18,34,50,101)
ICNet: backbones ResNetV1 (num_layers=18,34,50,101)

To inspect a network's architecture and measure its memory use, forward time (on either CPU or GPU), number of parameters, and number of FLOPs, use this command:

python measure_model.py

Dataset

Portrait Segmentation (Human/Background)

Automatic Portrait Segmentation for Image Stylization: 1800 images
Supervisely Person: 5711 images

Setup

This repository uses Python 3.6.x.
Clone the repository:

git clone --recursive https://github.com/AntiAegis/Human-Segmentation-PyTorch.git
cd Human-Segmentation-PyTorch
git submodule sync
git submodule update --init --recursive

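To install the required packages, activate a virtual environment (the example uses virtualenvwrapper's workon with an environment named humanseg) and use pip: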

workon humanseg
pip install -r requirements.txt
pip install -e models/pytorch-image-models

Training

To train a network from scratch, for example DeepLab3+, use this command:

python train.py --config config/config_DeepLab.json --device 0

Here config/config_DeepLab.json is the configuration file, which holds the network, dataloader, optimizer, loss, metric, and visualization settings.
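
As a rough illustration of how a file like this might be organized, the sketch below parses a JSON string and checks that it has one section per area named above. The section names (network, dataloader, and so on) and every value are assumptions made for this example and are not taken from the repository; check config/config_DeepLab.json for the actual keys.

```python
import json

# Hypothetical section names, one per area named in the text.
# The real config/config_DeepLab.json may use different keys.
EXPECTED_SECTIONS = (
    "network", "dataloader", "optimizer",
    "losses", "metrics", "visualization",
)


def load_config(text):
    """Parse a JSON config string and fail early if a section is missing."""
    config = json.loads(text)
    missing = [name for name in EXPECTED_SECTIONS if name not in config]
    if missing:
        raise KeyError("missing config sections: " + ", ".join(missing))
    return config


# Illustrative values only; none of them come from the repository.
EXAMPLE = """
{
  "network": {"type": "DeepLabV3Plus", "backbone": "resnet18"},
  "dataloader": {"batch_size": 16, "input_size": 320},
  "optimizer": {"type": "SGD", "lr": 0.01},
  "losses": ["ce_loss"],
  "metrics": ["miou"],
  "visualization": {"tensorboard": true}
}
"""

config = load_config(EXAMPLE)
print(config["network"]["backbone"])
```

Checking for sections up front means a missing block is reported before training starts rather than partway through.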

To resume training from a checkpoint, use this command:

python train.py --config config/config_DeepLab.json --device 0 --resume path_to_checkpoint/model_best.pth

To monitor training progress in TensorBoard, enable the visualization mode in the configuration file.

Inference

There are two modes of inference: video and webcam.

python inference_video.py --watch --use_cuda --checkpoint path_to_checkpoint/model_best.pth
python inference_webcam.py --use_cuda --checkpoint path_to_checkpoint/model_best.pth

Benchmark

Networks are trained on a dataset that combines the two datasets listed above, split into 6627 training and 737 testing images.
The model input size is set to 320.
CPU and GPU times are inference times averaged over 10 runs (after 10 warm-up runs) with batch size 1.
mIoU is measured on the testing subset (737 images) of the combined dataset.
Hardware configuration for benchmarking:

CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
GPU: GeForce GTX 1050 Mobile, CUDA 9.0
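
The timing protocol described above (warm-up runs that are discarded, then an average over timed runs) can be sketched as follows. This is a minimal stand-alone illustration, not the code in measure_model.py; average_time_ms and dummy_forward are hypothetical names, and the dummy workload stands in for a forward pass on a batch of size 1.

```python
import time


def average_time_ms(fn, warmup=10, runs=10):
    """Return the mean wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


def dummy_forward():
    # Stand-in workload; replace with a model forward pass.
    return sum(i * i for i in range(10000))


print("%.3f ms" % average_time_ms(dummy_forward))
```

When timing a PyTorch model on a GPU, CUDA kernels are launched asynchronously, so it is usual to call torch.cuda.synchronize() before reading the clock at the start and end of the timed loop.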

Model	Parameters	FLOPs	CPU time	GPU time	mIoU
UNet_MobileNetV2 (alpha=1.0, expansion=6)	4.7M	1.3G	167ms	17ms	91.37%
UNet_ResNet18	16.6M	9.1G	165ms	21ms	90.09%
DeepLab3+_ResNet18	16.6M	9.1G	133ms	28ms	91.21%
BiSeNet_ResNet18	11.9M	4.7G	88ms	10ms	87.02%
PSPNet_ResNet18	12.6M	20.7G	235ms	666ms	---
ICNet_ResNet18	11.6M	2.0G	48ms	55ms	86.27%
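
For reference, one common definition of mIoU for a two-class (human/background) mask is the per-class intersection over union, averaged over the classes. The pure-Python sketch below follows that definition on small label masks; it is an assumption that the repository's metric matches it exactly, and mean_iou is a hypothetical helper name.

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes for two equally sized label masks (lists of rows)."""
    ious = []
    for cls in range(num_classes):
        inter = union = 0
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                in_pred, in_target = (p == cls), (t == cls)
                inter += in_pred and in_target  # pixel has this class in both masks
                union += in_pred or in_target   # pixel has this class in either mask
        if union:  # skip classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# 1 = human, 0 = background
pred = [[1, 1, 0, 0],
        [1, 1, 0, 0]]
target = [[1, 0, 0, 0],
          [1, 1, 0, 0]]
print(mean_iou(pred, target))
```

In this toy example the background IoU is 4/5 and the human IoU is 3/4, so the mean is about 0.775.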

Owner

Thuy Ng