Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets

Overview

Crowd-Kit: Computational Quality Control for Crowdsourcing

GitHub Tests Codecov

Documentation

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses
  • metrics of uncertainty, consistency, and agreement with aggregate
  • loaders for popular crowdsourced datasets

The library is currently in a heavy development state, and interfaces are subject to change.

Installing

Installing Crowd-Kit is as easy as pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, you need to read your annotations into Pandas DataFrame with columns task, performer, label. Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, performer, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the performer responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available () and in progress ( 🟡 ).

Categorical Responses

Method Status
Majority Vote
Dawid-Skene
Gold Majority Vote
M-MSR
Wawa
Zero-Based Skill
GLAD
BCC 🟡

Textual Responses

Method Status
RASA
HRRASA
ROVER

Image Segmentation

Method Status
Segmentation MV
Segmentation RASA
Segmentation EM

Pairwise Comparisons

Method Status
Bradley-Terry
Noisy Bradley-Terry

Citation

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

License

© YANDEX LLC, 2020-2021. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Comments
  • Crowd-Kit Learning

    Crowd-Kit Learning

    This is just an example of what this subpackage will contain.

    We need to configure setup.cfg and add new tests. Here I suggest to discuss the concept.

    opened by pilot7747 10
  • Fix the documentation generation issues

    Fix the documentation generation issues

    Stick to YAML files hosted in https://github.com/Toloka/docs and use the proper includes.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [ ] All new and existing tests passed.
    documentation enhancement 
    opened by dustalov 9
  • Add MACE

    Add MACE

    Is it possible that you add MACE ? It is often used in my field but there is only a Java implementation that is hard to integrate into Python projects.

    enhancement good first issue 
    opened by jcklie 4
  • Add MACE aggregation model

    Add MACE aggregation model

    I have added the MACE aggregation model. https://www.cs.cmu.edu/~hovy/papers/13HLT-MACE.pdf

    Description

    Based on the original VB inference implementation, I wrote it in Python.

    Connected issues (if any)

    https://github.com/Toloka/crowd-kit/issues/5

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by pilot7747 3
  • Documentation updates

    Documentation updates

    Updated index.md and the Classification section:

    1. added extra information to the models descriptions;
    2. added descriptions for parameters;
    3. fixed error and typos in descriptions.
    opened by Natalyl3 2
  • Binary Relevance aggregation

    Binary Relevance aggregation

    Description

    I have added code for Binary Relevance aggregation - simple method for multi-label classification. This approach treats each label as a class in binary classification task and aggregates it separately.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by denaxen 2
  • Use mypy --strict

    Use mypy --strict

    Description

    This pull request enforces a stricter set of mypy type checks by enabling the strict mode. It also fixes several type inconsistencies. As the NumPy type annotations were introduced in version 1.20 (January 2021), some Crowd-Kit installations might broke, but I believe it is a worthy contribution.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement 
    opened by dustalov 2
  • Run Jupyter notebooks with tests

    Run Jupyter notebooks with tests

    Description

    This pull request runs the Jupyter notebooks with examples on the current version of Crowd-Kit with the rest of the test suite on GitHub Actions.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement good first issue 
    opened by dustalov 2
  • Dramatically improve the code maintainability

    Dramatically improve the code maintainability

    This pull request is probably the best thing that could happen to Crowd-Kit code maintainability.

    Description

    In this pull request, we switch from unnecessarily verbose Python stub files to more convenient inline type annotations. During this, many type annotations were fixed. We also removed the manage_docstring decorator and the corresponding utility functions.

    I think this change might break the documentation generation process. We will release a new version of Crowd-Kit only after this is fixed.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    bug documentation enhancement 
    opened by dustalov 2
  • Add header and LM-based aggregation item

    Add header and LM-based aggregation item

    Description

    This pull request makes README.md nicer. It adds the missing language model-based textual aggregation method.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    documentation 
    opened by dustalov 2
  • Renamed columns?

    Renamed columns?

    Hi, the guide says

    df = pd.read_csv('results.csv') # should contain columns: task, performer, label

    but when I load this file, then the second column is worker and not performer. I had used crowdkit with dataframes that had columns: task, performer, label, but after an update, it broke.

    opened by jcklie 2
  • Ordinal Labels

    Ordinal Labels

    Is it possible to support aggregation of ordinal labels as a part of this toolkit via this reduction algorithm.

    • Labels are categorical but have an ordering defined 1 < ... < K.
    • The K class ordinal labels are transformed into K−1 binary class label data.
    • Each of the binary task is then aggregated via crowdkit to estimate Pr[yi > c] for c = 1,...,K −1.
    • The probability of the actual class values can then be obtained as Pr[yi = c] = Pr[yi > c−1 and yi ≤ c] = Pr[yi > c−1]−Pr[yi > c].
    • The class with the maximum probability is assigned to the instance
    enhancement 
    opened by vikasraykar 2
Releases(v1.2.0)
Owner
Toloka
Data labeling platform for ML
Toloka
Advanced yabai wooting scripts

Yabai Wooting scripts Installation requirements Both https://github.com/xiamaz/python-yabai-client and https://github.com/xiamaz/python-wooting-rgb ne

Max Zhao 3 Dec 31, 2021
Machine learning library for fast and efficient Gaussian mixture models

This repository contains code which implements the Stochastic Gaussian Mixture Model (S-GMM) for event-based datasets Dependencies CMake Premake4 Blaz

Omar Oubari 1 Dec 19, 2022
Lightweight mmm - Lightweight (Bayesian) Media Mix Model

Lightweight (Bayesian) Media Mix Model This is not an official Google product. L

Google 342 Jan 03, 2023
image scene graph generation benchmark

Scene Graph Benchmark in PyTorch 1.7 This project is based on maskrcnn-benchmark Highlights Upgrad to pytorch 1.7 Multi-GPU training and inference Bat

Microsoft 303 Dec 27, 2022
Pytorch implementation of AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

AngularGrad Optimizer This repository contains the oficial implementation for AngularGrad: A New Optimization Technique for Angular Convergence of Con

mario 124 Sep 16, 2022
Simple-System-Convert--C--F - Simple System Convert With Python

Simple-System-Convert--C--F REQUIREMENTS Python version : 3 HOW TO USE Run the c

Jonathan Santos 2 Feb 16, 2022
PyTorch Implementation of our paper Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

PyTorch Implementation of our paper Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation

Zechen Bai 12 Jul 08, 2022
PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection.

Introduction This repo contains the official PyTorch implementation of our ICCV paper DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection. Up

133 Dec 29, 2022
SigOpt wrappers for scikit-learn methods

SigOpt + scikit-learn Interfacing This package implements useful interfaces and wrappers for using SigOpt and scikit-learn together Getting Started In

SigOpt 73 Sep 30, 2022
Official PyTorch Implementation of Unsupervised Learning of Scene Flow Estimation Fusing with Local Rigidity

UnRigidFlow This is the official PyTorch implementation of UnRigidFlow (IJCAI2019). Here are two sample results (~10MB gif for each) of our unsupervis

Liang Liu 28 Nov 16, 2022
Learning-based agent for Google Research Football

TiKick 1.Introduction Learning-based agent for Google Research Football Code accompanying the paper "TiKick: Towards Playing Multi-agent Football Full

Tsinghua AI Research Team for Reinforcement Learning 90 Dec 26, 2022
VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations

VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations 3D-aware Image Synthesis via Learning Structural and Textura

GenForce: May Generative Force Be with You 116 Dec 26, 2022
Code for Massive-scale Decoding for Text Generation using Lattices

Massive-scale Decoding for Text Generation using Lattices Jiacheng Xu, Greg Durrett TL;DR: a new search algorithm to construct lattices encoding many

Jiacheng Xu 37 Dec 18, 2022
An end-to-end machine learning library to directly optimize AUC loss

LibAUC An end-to-end machine learning library for AUC optimization. Why LibAUC? Deep AUC Maximization (DAM) is a paradigm for learning a deep neural n

Andrew 75 Dec 12, 2022
Trax — Deep Learning with Clear Code and Speed

Trax — Deep Learning with Clear Code and Speed Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively us

Google 7.3k Dec 26, 2022
A list of awesome PyTorch scholarship articles, guides, blogs, courses and other resources.

Awesome PyTorch Scholarship Resources A collection of awesome PyTorch and Python learning resources. Contributions are always welcome! Course Informat

Arnas Gečas 302 Dec 03, 2022
An experiment to bait a generalized frontrunning MEV bot

Honeypot 🍯 A simple experiment that: Creates a honeypot contract Baits a generalized fronturnning bot with a unique transaction Analyze bot behaviour

0x1355 14 Nov 24, 2022
[ACL-IJCNLP 2021] Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning

CLNER The code is for our ACL-IJCNLP 2021 paper: Improving Named Entity Recognition by External Context Retrieving and Cooperative Learning CLNER is a

71 Dec 08, 2022
[Pedestron] Generalizable Pedestrian Detection: The Elephant In The Room. @ CVPR2021

Pedestron Pedestron is a MMdetection based repository, that focuses on the advancement of research on pedestrian detection. We provide a list of detec

Irtiza Hasan 594 Jan 05, 2023
Face Mask Detection is a project to determine whether someone is wearing mask or not, using deep neural network.

face-mask-detection Face Mask Detection is a project to determine whether someone is wearing mask or not, using deep neural network. It contains 3 scr

amirsalar 13 Jan 18, 2022