Weakly Supervised Segmentation with TensorFlow. Implements instance segmentation as described in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

Overview

This repo contains a TensorFlow implementation of weakly supervised instance segmentation as described in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

The idea behind weakly supervised segmentation is to train a model using cheap-to-generate label approximations (e.g., bounding boxes) as substitute/guiding labels for computer vision classification tasks that usually require very detailed labels. In semantic labelling, each image pixel is assigned to a specific class (e.g., boat, car, background, etc.). In instance segmentation, all the pixels belonging to the same object instance are given the same instance ID.
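
To make the distinction concrete, here is a minimal NumPy sketch of a toy scene with two cars. The class IDs, instance IDs, and the helper name `instance_masks` are made up for illustration and are not part of this repo.

```python
import numpy as np

# Toy 4x6 scene: two cars on a background.
# Semantic labelling: every car pixel gets the same class ID (1 = car, 0 = background).
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation: each car gets its own instance ID (0 = no instance).
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])


def instance_masks(instance_map):
    """Split an instance ID map into one boolean mask per object instance."""
    ids = [int(i) for i in np.unique(instance_map) if i != 0]
    return {i: instance_map == i for i in ids}


masks = instance_masks(instance)
```

Both cars share one class in `semantic`, while `masks` holds a separate binary mask for each of them.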

Per [2014a], pixel-wise mask annotations are far more expensive to generate than object bounding box annotations (requiring up to 15x more time). Some models, like Simple Does It (SDI) [2016a], claim that a weak supervision approach can reach 95% of the quality of the fully supervised model, both for semantic labelling and instance segmentation.

Simple Does It (SDI)

Experimental Setup for Instance Segmentation

In weakly supervised instance segmentation, there are no pixel-wise annotations (i.e., no segmentation masks) that can be used to train a model. Yet, we aim to train a model that can still predict segmentation masks by only being given an input image and bounding boxes for the objects of interest in that image.

The masks used for training are generated from individual object bounding boxes. For each annotated bounding box, we generate a segmentation mask using the GrabCut method (any other method could be used), and train a convnet to regress from the image and bounding box information to the instance segmentation mask.

Note that in the original paper, a more sophisticated segmenter is used (M∩G+).

Network

SDI validates its approach by repurposing two existing instance segmentation architectures (DeepMask [2015a] and DeepLab2 VGG-16 [2016b]). Here we use the OSVOS FCN (see Section 3.1 of [2016c]).

Setup

The code in this repo was developed and tested using Anaconda3 v.4.4.0. To reproduce our conda environment, please use the following files:

On Ubuntu:

On Windows:

Jupyter Notebooks

The recommended way to test this implementation is to use the following Jupyter notebooks:

  • VGG16 Net Surgery: The weakly supervised segmentation techniques presented in the "Simple Does It" paper use a backbone convnet (either a DeepLab or a VGG16 network) pre-trained on ImageNet. This pre-trained network takes RGB images as input (W x H x 3). Remember that the weakly supervised version is trained using 4-channel inputs: RGB + a binary mask with a filled bounding box of the object instance. Therefore, we need to perform net surgery and create a 4-channel input version of the VGG16 net, initialized with the 3-channel parameter values except for the additional convolutional filters (we use Gaussian initialization for them).
  • "Simple Does It" Grabcut Training for Instance Segmentation: This notebook trains the SDI Grabcut weakly supervised model for instance segmentation. Following the instructions provided in Section "6. Instance Segmentation Results" of the "Simple Does It" paper, we use the Berkeley-augmented Pascal VOC segmentation dataset, which provides per-instance segmentation masks for VOC2012 data. The Berkeley-augmented dataset can be downloaded from here. Again, the SDI Grabcut training is done using a 4-channel input VGG16 network pre-trained on ImageNet, so make sure to run the VGG16 Net Surgery notebook first!
  • "Simple Does It" Weakly Supervised Instance Segmentation (Testing): The sample results shown in the notebook come from running our trained model on the validation split of the Berkeley-augmented dataset.
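
The net surgery described in the first notebook can be sketched in plain NumPy. This is a minimal illustration, not the notebook's code: it assumes the first convolution's kernel is stored in TensorFlow's (height, width, in_channels, out_channels) layout, and the function name `expand_first_conv_to_4_channels`, the standard deviation, and the seed are hypothetical choices.

```python
import numpy as np


def expand_first_conv_to_4_channels(w_rgb, stddev=0.01, seed=0):
    """Turn a 3-channel first-layer conv kernel into a 4-channel one.

    w_rgb: array of shape (kh, kw, 3, out_channels) with pre-trained weights.
    The RGB filters are copied unchanged; the filters for the extra
    (bounding box mask) channel are drawn from a zero-mean Gaussian.
    stddev and seed are assumptions made for this sketch.
    """
    kh, kw, in_channels, out_channels = w_rgb.shape
    if in_channels != 3:
        raise ValueError("expected a 3-channel (RGB) kernel")
    rng = np.random.default_rng(seed)
    w_extra = rng.normal(0.0, stddev, size=(kh, kw, 1, out_channels))
    # Append the new input channel after R, G, B along the in_channels axis.
    return np.concatenate([w_rgb, w_extra.astype(w_rgb.dtype)], axis=2)


# Example with random stand-in weights shaped like VGG16's first 3x3 conv.
w3 = np.random.default_rng(1).normal(size=(3, 3, 3, 64)).astype(np.float32)
w4 = expand_first_conv_to_4_channels(w3)
```

Only the kernel of the first layer changes shape; its biases and all later layers can be reused as they are, since the number of output channels is unchanged.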

Link to Pre-trained model and BK-VOC data files

The pre-processed BK-VOC dataset, "grabcut" segmentations, and results as well as pre-trained models (vgg_16_4chan_weak.ckpt-50000) can be found here:

If you'd rather download the Berkeley-augmented Pascal VOC segmentation dataset (which provides per-instance segmentation masks for VOC2012 data) from its origin, click here. Then run code similar to the following lines from dataset.py to generate the intermediary files used by this project:

if __name__ == '__main__':
    dataset = BKVOCDataset()
    dataset.prepare()

Make sure to set the paths at the top of dataset.py to the correct location:

import sys

# Location of the Berkeley-augmented VOC dataset; adjust to your machine.
if sys.platform.startswith("win"):
    _BK_VOC_DATASET = "E:/datasets/bk-voc/benchmark_RELEASE/dataset"
else:
    _BK_VOC_DATASET = "/media/EDrive/datasets/bk-voc/benchmark_RELEASE/dataset"

Training

The fully supervised version of the instance segmentation network whose performance we're trying to match is trained using the RGB images as inputs. The weakly supervised version is trained using 4-channel inputs: RGB + a binary mask with a filled bounding box of the object instance. In the latter case, the same RGB image may appear in several input samples (as many times as there are object instances associated with that RGB image).
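
The 4-channel input construction described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions: boxes are given as `(x, y, w, h)`, the filled box is encoded as 255 in a uint8 channel, and the function name `make_four_channel_samples` is hypothetical. The repo's dataset code may use different conventions.

```python
import numpy as np


def make_four_channel_samples(image_rgb, bboxes, fill_value=255):
    """Build one RGB + box-mask input per object instance.

    image_rgb: (H, W, 3) uint8 image.
    bboxes: list of (x, y, w, h) boxes, one per object instance.
    Returns a list of (H, W, 4) arrays; the same RGB data is repeated
    in every sample, only the fourth channel differs.
    """
    height, width = image_rgb.shape[:2]
    samples = []
    for x, y, w, h in bboxes:
        box_mask = np.zeros((height, width, 1), dtype=image_rgb.dtype)
        box_mask[y:y + h, x:x + w] = fill_value  # filled bounding box
        samples.append(np.concatenate([image_rgb, box_mask], axis=2))
    return samples


# An image with two annotated instances yields two input samples.
image = np.zeros((32, 48, 3), dtype=np.uint8)
samples = make_four_channel_samples(image, [(2, 3, 10, 8), (20, 5, 12, 20)])
```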

To be clear, the output labels used for training are NOT user-provided detailed ground-truth annotations. There are no such ground truths in the weakly supervised scenario. Instead, the labels are the segmentation masks generated using the GrabCut+ method. The weakly supervised model is trained to regress from an image and bounding box information to a generated segmentation mask.

Testing

The sample results shown here come from running our trained model on the validation split of the Berkeley-augmented dataset (see the testing notebook). Below, we (very) subjectively categorize them as "pretty good" and "not so great".

Pretty good

Not so great

References

2016

  • [2016a] Khoreva et al. 2016. Simple Does It: Weakly Supervised Instance and Semantic Segmentation. [arXiv] [web]
  • [2016b] Chen et al. 2016. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. [arXiv]
  • [2016c] Caelles et al. 2016. OSVOS: One-Shot Video Object Segmentation. [arXiv]

2015

  • [2015a] Pinheiro et al. 2015. DeepMask: Learning to Segment Object Candidates. [arXiv]

2014

  • [2014a] Lin et al. 2014. Microsoft COCO: Common Objects in Context. [arXiv] [web]
Owner
Phil Ferriere
Former Microsoft Development Lead passionate about Deep Learning with a focus on Computer Vision.