SPEAR: Semi suPErvised dAta progRamming

Last update: Dec 06, 2022

Overview

Semi-Supervised Data Programming for Data Efficient Machine Learning

SPEAR is a library for data programming with semi-supervision. The package implements several recent data programming approaches, including facilities to programmatically label and build training data.

Pipeline

Design labeling functions (LFs)
Generate a pickle file containing labels by passing raw data to the LFs
Use one of the label aggregators (LAs) to get the final labels
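
The three pipeline steps above can be sketched in plain Python. This is only an illustration and does not use SPEAR's own API: the LF names, the `ABSTAIN` marker, and the `apply_lfs` and `majority_vote` helpers are all hypothetical, and majority voting is a simple stand-in for a real label aggregator.

```python
import pickle
from collections import Counter

ABSTAIN = None       # hypothetical marker: the LF assigns no label
SPAM, HAM = 1, 0     # hypothetical class ids for an SMS spam task

# Step 1: design labeling functions (simple keyword heuristics here).
def lf_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN

def lf_win(text):
    return SPAM if "win" in text.lower() else ABSTAIN

def lf_greeting(text):
    return HAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

LFS = [lf_free, lf_win, lf_greeting]

# Step 2: pass raw data through the LFs to get one row of labels per example.
def apply_lfs(texts, lfs):
    return [[lf(t) for lf in lfs] for t in texts]

# Step 3: aggregate each row. Majority vote is a stand-in, not a SPEAR aggregator.
def majority_vote(row):
    votes = [v for v in row if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

texts = [
    "Win a FREE prize now",
    "hello, are we still on for lunch?",
    "see you at 5",
]
label_matrix = apply_lfs(texts, LFS)
payload = pickle.dumps(label_matrix)   # in practice, written to a pickle file
final_labels = [majority_vote(row) for row in pickle.loads(payload)]
print(final_labels)
```

Examples on which every LF abstains stay unlabeled in this sketch; how such examples are handled depends on the aggregator actually used.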

SPEAR provides functionality such as

developing LFs/rules/heuristics for quick labeling
comparing against several data programming approaches
comparing against semi-supervised data programming approaches
using subset selection to make the best use of annotation effort

Labelling Functions (LFs)

discrete LFs - Users can define LFs that return discrete labels
continuous LFs - Users can define LFs that return a continuous score/confidence along with the assigned label
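
The difference between the two kinds of LF can be sketched as follows. This is plain Python under stated assumptions, not SPEAR's decorator-based API: the `TRIGGERS` set, the `ABSTAIN` marker, and both function names are hypothetical, and the score used here (fraction of trigger words found) is just one possible choice.

```python
ABSTAIN = None   # hypothetical marker: the LF assigns no label
SPAM = 1         # hypothetical class id

TRIGGERS = {"free", "prize", "winner"}

def discrete_lf(text):
    """Return a label, or ABSTAIN when no trigger word is present."""
    words = set(text.lower().split())
    return SPAM if words & TRIGGERS else ABSTAIN

def continuous_lf(text):
    """Return (label, score), where score is the fraction of trigger words found."""
    words = set(text.lower().split())
    score = len(words & TRIGGERS) / len(TRIGGERS)
    label = SPAM if score > 0 else ABSTAIN
    return label, score

print(discrete_lf("claim your free prize"))
print(continuous_lf("claim your free prize"))
print(continuous_lf("see you soon"))
```

The discrete LF only says whether it fires; the continuous LF also reports how strongly, which an aggregator can use to weight its vote.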

Approaches Implemented

You can read this paper to learn about the approaches below.

Only-L
Learning to Reweight
Posterior Regularization
Imply Loss
CAGE
Joint Learning

The data folder for SMS can be found here. To run the notebooks or examples, place this folder in the same directory as the notebooks folder.

Installation

Method 1

To install the latest version of the SPEAR package from PyPI:

pip install decile-spear

Method 2

SPEAR requires Python 3.6 or later. First install submodlib. Then install SPEAR:

git clone https://github.com/decile-team/spear.git
cd spear
pip install -r requirements/requirements.txt

Citation

@misc{abhishek2021spear,
      title={SPEAR : Semi-supervised Data Programming in Python}, 
      author={Guttu Sai Abhishek and Harshad Ingole and Parth Laturia and Vineeth Dorna and Ayush Maheshwari and Ganesh Ramakrishnan and Rishabh Iyer},
      year={2021},
      eprint={2108.00373},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgment

SPEAR takes inspiration from, builds upon, and uses pieces of code from several open source codebases, including Snorkel, Snuba, and Imply Loss. SPEAR also uses SUBMODLIB, another DECILE library, for subset selection.

Team

SPEAR is created and maintained by Ayush, Abhishek, Vineeth, Harshad, Parth, Pankaj, Rishabh Iyer, and Ganesh Ramakrishnan. We look forward to making SPEAR more community driven. Please use it and contribute to it for your research, and feel free to use it for your commercial projects. We will add the major contributors here.

Publications

[1] Maheshwari, Ayush, et al. Data Programming using Semi-Supervision and Subset Selection, In Findings of ACL (Long Paper) 2021.

[2] Chatterjee, Oishik, Ganesh Ramakrishnan, and Sunita Sarawagi. Data Programming using Continuous and Quality-Guided Labeling Functions, In AAAI 2020.

[3] Sahay, Atul, et al. Rule augmented unsupervised constituency parsing, In Findings of ACL (Short Paper) 2021.

Comments

Updated condition for Gold Label check and parameter name passing
The current version of SPEAR fails when LF analysis is run without gold labels: the gold labels are passed as None, and because this case is not checked, it causes the following error:

Y = np.array([self.mapping[v] for v in Y])
TypeError: 'NoneType' object is not iterable

There is also a call to confusion_matrix in the lf_summary method that must pass its labels argument by name; otherwise it fails with the following argument error:

confusion_matrix(Y, self.L[:, i], labels)[1:, 1:] for i in range(m)
TypeError: confusion_matrix() takes 2 positional arguments but 3 were given

The current code change fixes these two issues.
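
The two failure modes described above can be reproduced and guarded against in a small standalone sketch. It does not show SPEAR's actual code or the actual patch: `map_gold_labels` and `confusion_counts` are hypothetical helpers, and `confusion_counts` is only a stand-in whose `labels` parameter is keyword-only, matching the call shape implied by the error message.

```python
def map_gold_labels(Y, mapping):
    """Map gold labels through `mapping`, or return None when none were given."""
    if Y is None:            # the check that was missing: no gold labels supplied
        return None
    return [mapping[v] for v in Y]

def confusion_counts(y_true, y_pred, *, labels=None):
    """Stand-in confusion matrix with a keyword-only `labels` parameter."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1   # rows: true label, columns: predicted
    return matrix

# Without gold labels, the guard returns None instead of raising TypeError.
print(map_gold_labels(None, {"ham": 0, "spam": 1}))

# Passing labels by name works; passing it positionally raises TypeError.
print(confusion_counts([0, 1, 1], [0, 1, 0], labels=[0, 1]))
try:
    confusion_counts([0, 1, 1], [0, 1, 0], [0, 1])
except TypeError as err:
    print("TypeError:", err)
```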
opened by kasuba-badri-vishal 1
sms_jl.ipynb ISSUE with "Some Labelling Functions" code snippet

I changed the directory of glove_w2v.txt, installed all the required libraries, and ran the notebook on my local PC, but it fails with: invalid literal for int() with base 10: 'import'

I think it's an issue with gensim, but I can't seem to resolve it.

I'm attaching a picture below:

https://cdn.discordapp.com/attachments/754057588714373325/989172192078098442/unknown.png

opened by Brshank 1

Releases

v1.0.0 (Nov 3, 2021)

Owner

decile-team
