Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets

Overview

Crowd-Kit: Computational Quality Control for Crowdsourcing

GitHub Tests Codecov

Documentation

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses
  • metrics of uncertainty, consistency, and agreement with aggregate
  • loaders for popular crowdsourced datasets

The library is currently in a heavy development state, and interfaces are subject to change.

Installing

Installing Crowd-Kit is as easy as pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, you need to read your annotations into Pandas DataFrame with columns task, performer, label. Alternatively, you can download an example dataset.

df = pd.read_csv('results.csv')  # should contain columns: task, performer, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the performer responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available ( ✅ ) and in progress ( 🟡 ).

Categorical Responses

Method Status
Majority Vote ✅
Dawid-Skene ✅
Gold Majority Vote ✅
M-MSR ✅
Wawa ✅
Zero-Based Skill ✅
GLAD ✅
BCC 🟡

Textual Responses

Method Status
RASA ✅
HRRASA ✅
ROVER ✅

Image Segmentation

Method Status
Segmentation MV ✅
Segmentation RASA ✅
Segmentation EM ✅

Pairwise Comparisons

Method Status
Bradley-Terry ✅
Noisy Bradley-Terry ✅

Citation

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

License

© YANDEX LLC, 2020-2021. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.

Comments
  • Crowd-Kit Learning

    Crowd-Kit Learning

    This is just an example of what this subpackage will contain.

    We need to configure setup.cfg and add new tests. Here I suggest to discuss the concept.

    opened by pilot7747 10
  • Fix the documentation generation issues

    Fix the documentation generation issues

    Stick to YAML files hosted in https://github.com/Toloka/docs and use the proper includes.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [ ] All new and existing tests passed.
    documentation enhancement 
    opened by dustalov 9
  • Add MACE

    Add MACE

    Is it possible that you add MACE ? It is often used in my field but there is only a Java implementation that is hard to integrate into Python projects.

    enhancement good first issue 
    opened by jcklie 4
  • Add MACE aggregation model

    Add MACE aggregation model

    I have added the MACE aggregation model. https://www.cs.cmu.edu/~hovy/papers/13HLT-MACE.pdf

    Description

    Based on the original VB inference implementation, I wrote it in Python.

    Connected issues (if any)

    https://github.com/Toloka/crowd-kit/issues/5

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by pilot7747 3
  • Documentation updates

    Documentation updates

    Updated index.md and the Classification section:

    1. added extra information to the models descriptions;
    2. added descriptions for parameters;
    3. fixed error and typos in descriptions.
    opened by Natalyl3 2
  • Binary Relevance aggregation

    Binary Relevance aggregation

    Description

    I have added code for Binary Relevance aggregation - simple method for multi-label classification. This approach treats each label as a class in binary classification task and aggregates it separately.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by denaxen 2
  • Use mypy --strict

    Use mypy --strict

    Description

    This pull request enforces a stricter set of mypy type checks by enabling the strict mode. It also fixes several type inconsistencies. As the NumPy type annotations were introduced in version 1.20 (January 2021), some Crowd-Kit installations might broke, but I believe it is a worthy contribution.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement 
    opened by dustalov 2
  • Run Jupyter notebooks with tests

    Run Jupyter notebooks with tests

    Description

    This pull request runs the Jupyter notebooks with examples on the current version of Crowd-Kit with the rest of the test suite on GitHub Actions.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement good first issue 
    opened by dustalov 2
  • Dramatically improve the code maintainability

    Dramatically improve the code maintainability

    This pull request is probably the best thing that could happen to Crowd-Kit code maintainability.

    Description

    In this pull request, we switch from unnecessarily verbose Python stub files to more convenient inline type annotations. During this, many type annotations were fixed. We also removed the manage_docstring decorator and the corresponding utility functions.

    I think this change might break the documentation generation process. We will release a new version of Crowd-Kit only after this is fixed.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    bug documentation enhancement 
    opened by dustalov 2
  • Add header and LM-based aggregation item

    Add header and LM-based aggregation item

    Description

    This pull request makes README.md nicer. It adds the missing language model-based textual aggregation method.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    documentation 
    opened by dustalov 2
  • Renamed columns?

    Renamed columns?

    Hi, the guide says

    df = pd.read_csv('results.csv') # should contain columns: task, performer, label

    but when I load this file, then the second column is worker and not performer. I had used crowdkit with dataframes that had columns: task, performer, label, but after an update, it broke.

    opened by jcklie 2
  • Ordinal Labels

    Ordinal Labels

    Is it possible to support aggregation of ordinal labels as a part of this toolkit via this reduction algorithm.

    • Labels are categorical but have an ordering defined 1 < ... < K.
    • The K class ordinal labels are transformed into K−1 binary class label data.
    • Each of the binary task is then aggregated via crowdkit to estimate Pr[yi > c] for c = 1,...,K −1.
    • The probability of the actual class values can then be obtained as Pr[yi = c] = Pr[yi > c−1 and yi ≤ c] = Pr[yi > c−1]−Pr[yi > c].
    • The class with the maximum probability is assigned to the instance
    enhancement 
    opened by vikasraykar 2
Releases(v1.2.0)
Owner
Toloka
Data labeling platform for ML
Toloka
Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images"

GANInversion_with_ConsecutiveImgs Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images" https://a

QingyangXu 38 Dec 07, 2022
Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh

generate_cloud_points Given a 2D triangle mesh, we could randomly generate cloud points that fill in the triangle mesh. Run python disp_mesh.py Or you

Peng Yu 2 Dec 24, 2021
Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation

Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation Experiment Setting: CIFAR10 (downloaded and saved in ./DATA

John Seon Keun Yi 38 Dec 27, 2022
This repo contains the code for the paper "Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroimaging" that has been accepted to NeurIPS 2021.

Dugh-NeurIPS-2021 This repo contains the code for the paper "Efficient hierarchical Bayesian inference for spatio-temporal regression models in neuroi

Ali Hashemi 5 Jul 12, 2022
Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

Cheng Zhang 66 Nov 16, 2022
PyGAD, a Python 3 library for building the genetic algorithm and training machine learning algorithms (Keras & PyTorch).

PyGAD: Genetic Algorithm in Python PyGAD is an open-source easy-to-use Python 3 library for building the genetic algorithm and optimizing machine lear

Ahmed Gad 1.1k Dec 26, 2022
Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Single Optical Path

Keyhole Imaging Code & Dataset Code associated with the paper "Keyhole Imaging: Non-Line-of-Sight Imaging and Tracking of Moving Objects Along a Singl

Stanford Computational Imaging Lab 20 Feb 03, 2022
Neural Articulated Radiance Field

Neural Articulated Radiance Field NARF Neural Articulated Radiance Field Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada ICCV 2021 [Paper] [Co

Atsuhiro Noguchi 144 Jan 03, 2023
A simple Python library for stochastic graphical ecological models

What is Viridicle? Viridicle is a library for simulating stochastic graphical ecological models. It implements the continuous time models described in

Theorem Engine 0 Dec 04, 2021
PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization''.

Background Activation Suppression for Weakly Supervised Object Localization PyTorch implementation of ''Background Activation Suppression for Weakly S

35 Jan 06, 2023
Python scripts for performing 3D human pose estimation using the Mobile Human Pose model in ONNX.

Python scripts for performing 3D human pose estimation using the Mobile Human Pose model in ONNX.

Ibai Gorordo 99 Dec 31, 2022
Ros2-voiceroid2 - ROS2 wrapper package of VOICEROID2

ros2_voiceroid2 ROS2 wrapper package of VOICEROID2 Windows Only Installation Ins

Nkyoku 1 Jan 23, 2022
A modular, research-friendly framework for high-performance and inference of sequence models at many scales

T5X T5X is a modular, composable, research-friendly framework for high-performance, configurable, self-service training, evaluation, and inference of

Google Research 1.1k Jan 08, 2023
PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility

PatchMatch-RL: Deep MVS with Pixelwise Depth, Normal, and Visibility Jae Yong Lee, Joseph DeGol, Chuhang Zou, Derek Hoiem Installation To install nece

31 Apr 19, 2022
Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

Time-stretch audio clips quickly with PyTorch (CUDA supported)! Additional utilities for searching efficient transformations are included.

Kento Nishi 22 Jul 07, 2022
Model-based 3D Hand Reconstruction via Self-Supervised Learning, CVPR2021

S2HAND: Model-based 3D Hand Reconstruction via Self-Supervised Learning S2HAND presents a self-supervised 3D hand reconstruction network that can join

Yujin Chen 72 Dec 12, 2022
Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Semantic Segmentation Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch Features Applicable to followin

sithu3 530 Jan 05, 2023
FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation

FIRA is a learning-based commit message generation approach, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically.

Van 21 Dec 30, 2022
Classification Modeling: Probability of Default

Credit Risk Modeling in Python Introduction: If you've ever applied for a credit card or loan, you know that financial firms process your information

Aktham Momani 2 Nov 07, 2022
PyTorch implementation of the paper Deep Networks from the Principle of Rate Reduction

Deep Networks from the Principle of Rate Reduction This repository is the official PyTorch implementation of the paper Deep Networks from the Principl

459 Dec 27, 2022