PixelPick This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

Overview

PixelPick

This is an official implementation of the paper "All you need are a few pixels: semantic segmentation with PixelPick."

[Project page] [Paper]

Table of contents

Abstract

A central challenge for the task of semantic segmentation is the prohibitive cost of obtaining dense pixel-level annotations to supervise model training. In this work, we show that in order to achieve a good level of segmentation performance, all you need are a few well-chosen pixel labels. We make the following contributions: (i) We investigate the novel semantic segmentation setting in which labels are supplied only at sparse pixel locations, and show that deep neural networks can use a handful of such labels to good effect; (ii) We demonstrate how to exploit this phenomena within an active learning framework, termed PixelPick, to radically reduce labelling cost, and propose an efficient “mouse-free” annotation strategy to implement our approach; (iii) We conduct extensive experiments to study the influence of annotation diversity under a fixed budget, model pretraining, model capacity and the sampling mechanism for picking pixels in this low annotation regime; (iv) We provide comparisons to the existing state of the art in semantic segmentation with active learning, and demonstrate comparable performance with up to two orders of magnitude fewer pixel annotations on the CamVid, Cityscapes and PASCAL VOC 2012 benchmarks; (v) Finally, we evaluate the efficiency of our annotation pipeline and its sensitivity to annotator error to demonstrate its practicality. Our code, models and annotation tool will be made publicly available.

Installation

Prerequisites

Our code is based on Python 3.8 and uses the following Python packages.

torch>=1.8.1
torchvision>=0.9.1
tqdm>=4.59.0
cv2>=4.5.1.48
Clone this repository
git clone https://github.com/NoelShin/PixelPick.git
cd PixelPick
Download dataset

Follow one of the instructions below to download a dataset you are interest in. Then, set the dir_dataset variable in args.py to the directory path which contains the downloaded dataset.

  • For CamVid, you need to download SegNet-Tutorial codebase as a zip file and use CamVid directory which contains images/annotations for training and test after unzipping it. You don't need to change the directory structure. [CamVid]

  • For Cityscapes, first visit the link and login to download. Once downloaded, you need to unzip it. You don't need to change the directory structure. It is worth noting that, if you set downsample variable in args.py (4 by default), it will first downsample train and val images of Cityscapes and store them within {dir_dataset}_d{downsample} folder which will be located in the same directory of dir_dataset. This is to enable a faster dataloading during training. [Cityscapes]

  • For PASCAL VOC 2012, the dataset will be automatically downloaded via torchvision.datasets.VOCSegmentation. You just need to specify which directory you want to download it with dir_dataset variable. If the automatic download fails, you can manually download through the following page (you don't need to untar VOCtrainval_11-May-2012.tar file which will be downloaded). [PASCAL VOC 2012 segmentation]

For more details about the data we used to train/validate our model, please visit datasets directory and find {camvid, cityscapes, voc}_{train, val}.txt file.

Train and validate

By default, the current code validates the model every epoch while training. To train a MobileNetv2-based DeepLabv3+ network, follow the below lines. (The pretrained MobileNetv2 will be loaded automatically.)

cd scripts
sh pixelpick-dl-cv.sh

Benchmark results

For CamVid and Cityscapes, we report the average of 5 different runs and 3 different runs for PASCAL VOC 2012. Please refer to our paper for details. ± one std of mean IoU is denoted.

CamVid
model backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%)
PixelPick MobileNetv2 20 (0.012) 50.8 ± 0.2
PixelPick MobileNetv2 40 (0.023) 53.9 ± 0.7
PixelPick MobileNetv2 60 (0.035) 55.3 ± 0.5
PixelPick MobileNetv2 80 (0.046) 55.2 ± 0.7
PixelPick MobileNetv2 100 (0.058) 55.9 ± 0.1
Fully-supervised MobileNetv2 360x480 (100) 58.2 ± 0.6
PixelPick ResNet50 20 (0.012) 59.7 ± 0.9
PixelPick ResNet50 40 (0.023) 62.3 ± 0.5
PixelPick ResNet50 60 (0.035) 64.0 ± 0.3
PixelPick ResNet50 80 (0.046) 64.4 ± 0.6
PixelPick ResNet50 100 (0.058) 65.1 ± 0.3
Fully-supervised ResNet50 360x480 (100) 67.8 ± 0.3
Cityscapes

Note that to make training time manageable, we train on the quarter resolution (256x512) of the original Cityscapes images (1024x2048).

model backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%)
PixelPick MobileNetv2 20 (0.015) 52.0 ± 0.6
PixelPick MobileNetv2 40 (0.031) 54.7 ± 0.4
PixelPick MobileNetv2 60 (0.046) 55.5 ± 0.6
PixelPick MobileNetv2 80 (0.061) 56.1 ± 0.3
PixelPick MobileNetv2 100 (0.076) 56.5 ± 0.3
Fully-supervised MobileNetv2 256x512 (100) 61.4 ± 0.5
PixelPick ResNet50 20 (0.015) 56.1 ± 0.4
PixelPick ResNet50 40 (0.031) 60.0 ± 0.3
PixelPick ResNet50 60 (0.046) 61.6 ± 0.4
PixelPick ResNet50 80 (0.061) 62.3 ± 0.4
PixelPick ResNet50 100 (0.076) 62.8 ± 0.4
Fully-supervised ResNet50 256x512 (100) 68.5 ± 0.3
PASCAL VOC 2012
model backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%)
PixelPick MobileNetv2 10 (0.009) 51.7 ± 0.2
PixelPick MobileNetv2 20 (0.017) 53.9 ± 0.8
PixelPick MobileNetv2 30 (0.026) 56.7 ± 0.3
PixelPick MobileNetv2 40 (0.034) 56.9 ± 0.7
PixelPick MobileNetv2 50 (0.043) 57.2 ± 0.3
Fully-supervised MobileNetv2 N/A (100) 57.9 ± 0.5
PixelPick ResNet50 10 (0.009) 59.7 ± 0.8
PixelPick ResNet50 20 (0.017) 65.6 ± 0.5
PixelPick ResNet50 30 (0.026) 66.4 ± 0.2
PixelPick ResNet50 40 (0.034) 67.2 ± 0.1
PixelPick ResNet50 50 (0.043) 67.4 ± 0.5
Fully-supervised ResNet50 N/A (100) 69.4 ± 0.3

Models

model dataset backbone (encoder) # labelled pixels per img (% annotation) mean IoU (%) Download
PixelPick CamVid MobileNetv2 100 (0.058) 56.1 Link
PixelPick CamVid ResNet50 100 (0.058) TBU TBU
PixelPick Cityscapes MobileNetv2 100 (0.076) 56.8 Link
PixelPick Cityscapes ResNet50 100 (0.076) 63.3 Link
PixelPick VOC 2012 MobileNetv2 50 (0.043) 57.4 Link
PixelPick VOC 2012 ResNet50 50 (0.043) 68.0 Link

PixelPick mouse-free annotation tool

Code for the annotation tool will be made available.

Citation

To be updated.

Acknowledgements

We borrowed code for the MobileNetv2-based DeepLabv3+ network from https://github.com/Shuai-Xie/DEAL.

If you have any questions, please contact us at {gyungin, weidi, samuel}@robots.ox.ac.uk.

Owner
Gyungin Shin
Serving others
Gyungin Shin
PyTorch implementation of Value Iteration Networks (VIN): Clean, Simple and Modular. Visualization in Visdom.

VIN: Value Iteration Networks This is an implementation of Value Iteration Networks (VIN) in PyTorch to reproduce the results.(TensorFlow version) Key

Xingdong Zuo 215 Dec 07, 2022
(CVPR 2022) A minimalistic mapless end-to-end stack for joint perception, prediction, planning and control for self driving.

LAV Learning from All Vehicles Dian Chen, Philipp Krähenbühl CVPR 2022 (also arXiV 2203.11934) This repo contains code for paper Learning from all veh

Dian Chen 300 Dec 15, 2022
A proof of concept ai-powered Recaptcha v2 solver

Recaptcha Fullauto I've decided to open source my old Recaptcha v2 solver. My latest version will be opened sourced this summer. I am hoping this proj

Nate 60 Dec 20, 2022
The code written during my Bachelor Thesis "Classification of Human Whole-Body Motion using Hidden Markov Models".

This code was written during the course of my Bachelor thesis Classification of Human Whole-Body Motion using Hidden Markov Models. Some things might

Matthias Plappert 14 Dec 06, 2022
Deep Q Learning with OpenAI Gym and Pokemon Showdown

pokemon-deep-learning An openAI gym project for pokemon involving deep q learning. Made by myself, Sam Little, and Layton Webber. This code captures g

2 Dec 22, 2021
YoHa - A practical hand tracking engine.

YoHa - A practical hand tracking engine.

2k Jan 06, 2023
a project for 3D multi-object tracking

a project for 3D multi-object tracking

155 Jan 04, 2023
Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization Official PyTorch implementation for our URST (Ultra-Resolution Sty

czczup 148 Dec 27, 2022
Accurate Phylogenetic Inference with Symmetry-Preserving Neural Networks

Accurate Phylogenetic Inference with a Symmetry-preserving Neural Network Model Claudia Solis-Lemus Shengwen Yang Leonardo Zepeda-Núñez This repositor

Leonardo Zepeda-Núñez 2 Feb 11, 2022
Deep Learning ❤️ OneFlow

Deep Learning with OneFlow made easy 🚀 ! Carefree? carefree-learn aims to provide CAREFREE usages for both users and developers. User Side Computer V

21 Oct 27, 2022
Repository for the electrical and ICT benchmark model developed in the ERIGrid 2.0 project.

Benchmark Model Electrical and ICT System This repository contains the documentation, code, and models for the electrical and ICT benchmark model deve

ERIGrid 2.0 1 Nov 29, 2021
Do Neural Networks for Segmentation Understand Insideness?

This is part of the code to reproduce the results of the paper Do Neural Networks for Segmentation Understand Insideness? [pdf] by K. Villalobos (*),

biolins 0 Mar 20, 2021
Convenient tool for speeding up the intern/officer review process.

icpc-app-screen Convenient tool for speeding up the intern/officer applicant review process. Eliminates the pain from reading application responses of

1 Oct 30, 2021
An NVDA add-on to split screen reader and audio from other programs to different sound channels

An NVDA add-on to split screen reader and audio from other programs to different sound channels (add-on idea credit: Tony Malykh)

Joseph Lee 7 Dec 25, 2022
Romanian Automatic Speech Recognition from the ROBIN project

RobinASR This repository contains Robin's Automatic Speech Recognition (RobinASR) for the Romanian language based on the DeepSpeech2 architecture, tog

RACAI 10 Jan 01, 2023
Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator

DRL-robot-navigation Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gra

87 Jan 07, 2023
Learning Skeletal Articulations with Neural Blend Shapes

This repository provides an end-to-end library for automatic character rigging and blend shapes generation as well as a visualization tool. It is based on our work Learning Skeletal Articulations wit

Peizhuo 504 Dec 30, 2022
[NeurIPS'21] Shape As Points: A Differentiable Poisson Solver

Shape As Points (SAP) Paper | Project Page | Short Video (6 min) | Long Video (12 min) This repository contains the implementation of the paper: Shape

394 Dec 30, 2022
Brain Tumor Detection with Tensorflow Neural Networks.

Brain-Tumor-Detection A convolutional neural network model built with Tensorflow & Keras to detect brain tumor and its different variants. Data of the

404ErrorNotFound 5 Aug 23, 2022
an implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 985 Jan 08, 2023