Weakly Supervised Segmentation with TensorFlow. Implements instance segmentation as described in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

Overview

This repo contains a TensorFlow implementation of weakly supervised instance segmentation as described in Simple Does It: Weakly Supervised Instance and Semantic Segmentation, by Khoreva et al. (CVPR 2017).

The idea behind weakly supervised segmentation is to train a model using cheap-to-generate label approximations (e.g., bounding boxes) as substitute/guiding labels for computer vision classification tasks that usually require very detailed labels. In semantic labelling, each image pixel is assigned to a specific class (e.g., boat, car, background, etc.). In instance segmentation, all the pixels belonging to the same object instance are given the same instance ID.
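To make the distinction concrete, here is a minimal sketch using a toy label map. It shows a semantic map, the matching instance map, and how a cheap bounding box label can be read off each instance. The arrays and the helper name `instance_bboxes` are illustrative assumptions and are not taken from this repo.

```python
import numpy as np

# Toy 4x6 scene with two "car" objects (class 1) on background (class 0).
# Semantic labelling: both cars share the same class value.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation: each object gets its own instance ID.
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
])


def instance_bboxes(instance_map):
    """Return {instance_id: (x_min, y_min, x_max, y_max)} with inclusive
    coordinates, ignoring ID 0 (background). Hypothetical helper."""
    boxes = {}
    for inst_id in np.unique(instance_map):
        if inst_id == 0:
            continue
        ys, xs = np.nonzero(instance_map == inst_id)
        boxes[int(inst_id)] = (int(xs.min()), int(ys.min()),
                               int(xs.max()), int(ys.max()))
    return boxes


boxes = instance_bboxes(instance)
print(boxes)
```

The bounding boxes are the kind of weak label the rest of this document relies on: they say where an object is, but not which pixels belong to it.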

Per [2014a], pixelwise mask annotations are far more expensive to generate than object bounding box annotations (requiring up to 15x more time). Some models, like Simple Does It (SDI) [2016a], claim that a weakly supervised approach can reach 95% of the quality of the fully supervised model, for both semantic labelling and instance segmentation.

Simple Does It (SDI)

Experimental Setup for Instance Segmentation

In weakly supervised instance segmentation, there are no pixel-wise annotations (i.e., no segmentation masks) that can be used to train a model. Yet, we aim to train a model that can still predict segmentation masks by only being given an input image and bounding boxes for the objects of interest in that image.

The masks used for training are generated starting from individual object bounding boxes. For each annotated bounding box, we generate a segmentation mask using the GrabCut method (any other method could be used) and train a convnet to regress from the image and bounding box information to the instance segmentation mask.

Note that in the original paper, a more sophisticated segmenter is used (M∩G+).

Network

SDI validates its work by repurposing two different instance segmentation architectures (DeepMask [2015a] and DeepLab2 VGG-16 [2016b]). Here, we use the OSVOS FCN (see Section 3.1 of [2016c]).

Setup

The code in this repo was developed and tested using Anaconda3 v.4.4.0. To reproduce our conda environment, please use the following files:

On Ubuntu:

On Windows:

Jupyter Notebooks

The recommended way to test this implementation is to use the following Jupyter notebooks:

  • VGG16 Net Surgery: The weakly supervised segmentation techniques presented in the "Simple Does It" paper use a backbone convnet (either a DeepLab or a VGG16 network) pre-trained on ImageNet. This pre-trained network takes RGB images as input (W x H x 3). Remember that the weakly supervised version is trained using 4-channel inputs: RGB + a binary mask with a filled bounding box of the object instance. Therefore, we need to perform net surgery and create a 4-channel input version of the VGG16 net, initialized with the 3-channel parameter values except for the additional convolutional filters (we use Gaussian initialization for them).
  • "Simple Does It" Grabcut Training for Instance Segmentation: This notebook trains the SDI GrabCut weakly supervised model for instance segmentation. Following the instructions provided in Section "6. Instance Segmentation Results" of the "Simple Does It" paper, we use the Berkeley-augmented Pascal VOC segmentation dataset, which provides per-instance segmentation masks for VOC2012 data. The Berkeley-augmented dataset can be downloaded from here. Again, the SDI GrabCut training is done using a 4-channel input VGG16 network pre-trained on ImageNet, so make sure to run the VGG16 Net Surgery notebook first!
  • "Simple Does It" Weakly Supervised Instance Segmentation (Testing): The sample results shown in the notebook come from running our trained model on the validation split of the Berkeley-augmented dataset.
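The net surgery described in the first notebook can be illustrated on the first convolutional kernel alone. The sketch below assumes TensorFlow's `[height, width, in_channels, out_channels]` kernel layout and uses NumPy only. The helper name `expand_first_conv`, the standard deviation, and the stand-in weights are assumptions rather than the notebook's actual code.

```python
import numpy as np


def expand_first_conv(kernel_rgb, stddev=0.01, seed=0):
    """Expand an (h, w, 3, out) conv kernel to (h, w, 4, out).

    The three RGB input slices keep their pre-trained values; the slice for
    the extra (bounding box mask) channel is drawn from a Gaussian.
    Hypothetical helper; stddev is an assumed value.
    """
    h, w, in_ch, out_ch = kernel_rgb.shape
    if in_ch != 3:
        raise ValueError("expected a 3-channel (RGB) input kernel")
    rng = np.random.default_rng(seed)
    extra = rng.normal(0.0, stddev, size=(h, w, 1, out_ch))
    extra = extra.astype(kernel_rgb.dtype)
    # Concatenate along the input-channel axis.
    return np.concatenate([kernel_rgb, extra], axis=2)


# Stand-in for the pre-trained first-layer kernel of VGG16 (3x3, 64 filters).
pretrained = np.ones((3, 3, 3, 64), dtype=np.float32)
kernel_4ch = expand_first_conv(pretrained)
print(kernel_4ch.shape)
```

Only the first layer changes shape; every later layer sees the same number of feature maps as before, so its pre-trained weights can be reused unchanged.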

Link to Pre-trained model and BK-VOC data files

The pre-processed BK-VOC dataset, "grabcut" segmentations, and results as well as pre-trained models (vgg_16_4chan_weak.ckpt-50000) can be found here:

If you'd rather download the Berkeley-augmented Pascal VOC segmentation dataset (which provides per-instance segmentation masks for VOC2012 data) from its origin, click here. Then, run code similar to the following lines from dataset.py to generate the intermediary files used by this project:

if __name__ == '__main__':
    dataset = BKVOCDataset()
    dataset.prepare()

Make sure to set the paths at the top of dataset.py to the correct location:

import sys

# Root folder of the extracted Berkeley-augmented VOC dataset
if sys.platform.startswith("win"):
    _BK_VOC_DATASET = "E:/datasets/bk-voc/benchmark_RELEASE/dataset"
else:
    _BK_VOC_DATASET = "/media/EDrive/datasets/bk-voc/benchmark_RELEASE/dataset"

Training

The fully supervised version of the instance segmentation network whose performance we're trying to match is trained using the RGB images as inputs. The weakly supervised version is trained using 4-channel inputs: RGB + a binary mask with a filled bounding box of the object instance. In the latter case, the same RGB image may appear in several input samples (as many times as there are object instances associated with that RGB image).
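As a rough illustration of the 4-channel inputs described above, the following NumPy sketch builds one sample per object instance from a single RGB image. The helper name `make_four_channel_samples`, the inclusive `(x_min, y_min, x_max, y_max)` box convention, and the fill value of 255 are assumptions and may differ from what this repo's data pipeline does.

```python
import numpy as np


def make_four_channel_samples(image_rgb, bboxes, fill_value=255):
    """Return one (H, W, 4) array per bounding box.

    Channels 0-2 are the RGB image; channel 3 is a binary mask in which the
    bounding box of a single object instance is filled.
    Hypothetical helper; boxes are (x_min, y_min, x_max, y_max), inclusive.
    """
    h, w, _ = image_rgb.shape
    samples = []
    for x_min, y_min, x_max, y_max in bboxes:
        box_mask = np.zeros((h, w, 1), dtype=image_rgb.dtype)
        box_mask[y_min:y_max + 1, x_min:x_max + 1, 0] = fill_value
        samples.append(np.concatenate([image_rgb, box_mask], axis=-1))
    return samples


# One image with two annotated instances yields two training inputs.
image = np.zeros((8, 8, 3), dtype=np.uint8)
bboxes = [(1, 1, 3, 3), (4, 2, 6, 5)]
samples = make_four_channel_samples(image, bboxes)
print(len(samples), samples[0].shape)
```

The RGB channels are identical across the samples produced from one image; only the fourth channel tells the network which instance it is expected to segment.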

To be clear, the output labels used for training are NOT user-provided detailed ground-truth annotations. There are no such ground truths in the weakly supervised scenario. Instead, the labels are the segmentation masks generated using the GrabCut+ method. The weakly supervised model is trained to regress from an image and bounding box information to a generated segmentation mask.

Testing

The sample results shown here come from running our trained model on the validation split of the Berkeley-augmented dataset (see the testing notebook). Below, we (very) subjectively categorize them as "pretty good" and "not so great".

Pretty good

Not so great

References

2016

  • [2016a] Khoreva et al. 2016. Simple Does It: Weakly Supervised Instance and Semantic Segmentation. [arXiv] [web]
  • [2016b] Chen et al. 2016. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. [arXiv]
  • [2016c] Caelles et al. 2016. OSVOS: One-Shot Video Object Segmentation. [arXiv]

2015

  • [2015a] Pinheiro et al. 2015. DeepMask: Learning to Segment Object Candidates. [arXiv]

2014

  • [2014a] Lin et al. 2014. Microsoft COCO: Common Objects in Context. [arXiv] [web]
Owner
Phil Ferriere
Former Microsoft Development Lead passionate about Deep Learning with a focus on Computer Vision.