This repository contains the code used in the paper "Prompt-Based Multi-Modal Image Segmentation".

Last update: Dec 30, 2022

Prompt-Based Multi-Modal Image Segmentation

The system allows creating segmentation models without training, based on:

An arbitrary text query, or
An image with a mask highlighting stuff or an object.

Quick Start

The Quickstart.ipynb notebook provides code for using a pre-trained CLIPSeg model. The notebook can also be run interactively on MyBinder (note that the VM has no GPU, so inference takes a few seconds).

Dependencies

This code base depends on PyTorch, torchvision and CLIP (pip install git+https://github.com/openai/CLIP.git). Additional dependencies are hidden for double-blind review.
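Before running anything, it can help to confirm that the packages named above are importable. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and it assumes the import names are `torch`, `torchvision` and `clip`.

```python
import importlib.util

# Import names of the packages listed above (assumed). "clip" is the module
# installed by `pip install git+https://github.com/openai/CLIP.git`.
REQUIRED_MODULES = ("torch", "torchvision", "clip")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing dependencies:", ", ".join(absent))
    else:
        print("All listed dependencies were found.")
```

The check only looks up the modules and does not import them, so it stays fast even when PyTorch is installed.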

Datasets

PhraseCut and PhraseCutPlus: Referring expression dataset
PFEPascalWrapper: Wrapper class for PFENet's Pascal-5i implementation
PascalZeroShot: Wrapper class for PascalZeroShot
COCOWrapper: Wrapper class for COCO

Models

CLIPDensePredT: CLIPSeg model with transformer-based decoder.
ViTDensePredT: CLIPSeg model with transformer-based decoder.

Third Party Dependencies

For some of the datasets third party dependencies are required. Run the following commands in the third_party folder.

git clone https://github.com/cvlab-yonsei/JoEm
git clone https://github.com/Jia-Research-Lab/PFENet.git
git clone https://github.com/ChenyunWu/PhraseCutDataset.git
git clone https://github.com/juhongm999/hsnet.git
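The clone commands above can also be scripted so that repositories already present are skipped. This is a hedged sketch rather than repository code: the function names are hypothetical, the URLs are copied from the commands above, and the target folder is assumed to be `third_party` relative to the current working directory.

```python
import subprocess
from pathlib import Path

# Repository URLs copied from the commands above.
THIRD_PARTY_REPOS = [
    "https://github.com/cvlab-yonsei/JoEm",
    "https://github.com/Jia-Research-Lab/PFENet.git",
    "https://github.com/ChenyunWu/PhraseCutDataset.git",
    "https://github.com/juhongm999/hsnet.git",
]


def repo_dir_name(url):
    """Directory name git would create for the URL (last path part, no '.git')."""
    name = url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_commands(target_dir="third_party", repos=THIRD_PARTY_REPOS):
    """Build one `git clone` command per repository that is not present yet."""
    target = Path(target_dir)
    commands = []
    for url in repos:
        destination = target / repo_dir_name(url)
        if not destination.exists():
            commands.append(["git", "clone", url, str(destination)])
    return commands


def clone_all(target_dir="third_party"):
    """Create the target folder and run the pending clone commands."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    for command in clone_commands(target_dir):
        # check=True raises CalledProcessError if a clone fails.
        subprocess.run(command, check=True)
```

Calling `clone_all()` from the repository root performs the same clones as the commands above. `clone_commands()` alone only reports what would be run.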

Weights

CLIPSeg-D64 (4.1MB, without CLIP weights)
CLIPSeg-D16 (1.1MB, without CLIP weights)

Training

See the experiment folder for yaml definitions of the training configurations. The training code is in experiment_setup.py.
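To see which training configurations are available, the YAML files can be listed with a few lines of Python. This is an illustrative sketch only: the helper `list_experiment_configs` is hypothetical, and the folder name `experiment` is taken from the sentence above, so adjust it if the folder is named differently in your checkout.

```python
from pathlib import Path


def list_experiment_configs(folder="experiment"):
    """Return the YAML files found directly in the given folder, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder yields an empty list instead of an exception.
        return []
    return sorted(p for p in root.iterdir() if p.suffix in {".yaml", ".yml"})


if __name__ == "__main__":
    for config in list_experiment_configs():
        print(config.name)
```

The sketch only discovers the files. How experiment_setup.py consumes them is not shown here.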

Usage of PFENet Wrappers

In order to use the dataset and model wrappers for PFENet, the PFENet repository needs to be cloned to the root folder.

git clone https://github.com/Jia-Research-Lab/PFENet.git

Citation

@article{lueddecke21,
    title={Prompt-Based Multi-Modal Image Segmentation},
    author={Timo Lüddecke and Alexander Ecker},
    journal={arXiv preprint arXiv:2112.10003},
    year={2021}
}

Owner

Timo Lüddecke
