Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses
  • metrics of uncertainty, consistency, and agreement with the aggregate
  • loaders for popular crowdsourced datasets
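As an illustration of what an uncertainty metric measures, here is a minimal sketch that scores each task by the Shannon entropy of its label distribution. The function name `label_entropy` and the (task, performer, label) tuple input are hypothetical stand-ins for the DataFrame rows; this is not Crowd-Kit's metrics API.

```python
import math
from collections import Counter, defaultdict


def label_entropy(responses):
    """Return {task: Shannon entropy in nats} for (task, performer, label) tuples.

    Hypothetical helper: 0.0 means every performer gave the same label;
    larger values mean the labels for that task are more spread out.
    """
    by_task = defaultdict(Counter)
    for task, _performer, label in responses:
        by_task[task][label] += 1

    entropies = {}
    for task, counts in by_task.items():
        total = sum(counts.values())
        # H = sum_k p_k * log(1 / p_k), written as log(total / n) to avoid -0.0
        entropies[task] = sum((n / total) * math.log(total / n) for n in counts.values())
    return entropies


responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "cat"),
    ("t2", "w1", "cat"), ("t2", "w2", "dog"),
]
print(label_entropy(responses))  # t1 -> 0.0, t2 -> log(2), about 0.693
```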

The library is currently under heavy development, and its interfaces are subject to change.

Installing

Installing Crowd-Kit is as easy as running:

pip install crowd-kit

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, you need to read your annotations into a pandas DataFrame with the columns task, performer, and label. Alternatively, you can download an example dataset. (Later Crowd-Kit releases renamed the performer column to worker.)

df = pd.read_csv('results.csv')  # should contain columns: task, performer, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then you can aggregate the performer responses as easily as in scikit-learn:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
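For intuition about what DawidSkene estimates, here is a compact, from-scratch sketch of the classical Dawid-Skene expectation-maximization loop in pure Python. It is not Crowd-Kit's implementation: the function name `dawid_skene`, the tuple input, the majority-vote initialization, and the `smoothing` constant are illustrative assumptions.

```python
from collections import Counter, defaultdict


def dawid_skene(responses, n_iter=50, smoothing=0.01):
    """Aggregate (task, performer, label) tuples with a Dawid-Skene EM sketch.

    Each performer gets a confusion matrix P(observed | true); each task gets
    a posterior over true labels. Returns {task: most probable label}.
    """
    labels = sorted({label for _, _, label in responses})
    tasks = sorted({task for task, _, _ in responses})
    by_task = defaultdict(list)
    for task, performer, label in responses:
        by_task[task].append((performer, label))

    # Initialise task posteriors with soft majority vote (label frequencies).
    post = {}
    for task in tasks:
        counts = Counter(label for _, label in by_task[task])
        post[task] = {c: counts[c] / len(by_task[task]) for c in labels}

    for _ in range(n_iter):
        # M-step: class priors and smoothed per-performer confusion matrices.
        priors = {c: sum(post[t][c] for t in tasks) / len(tasks) for c in labels}
        confusion = defaultdict(
            lambda: {c: {k: smoothing for k in labels} for c in labels}
        )
        for task in tasks:
            for performer, observed in by_task[task]:
                for c in labels:
                    confusion[performer][c][observed] += post[task][c]
        for matrix in confusion.values():
            for row in matrix.values():
                row_total = sum(row.values())
                for k in row:
                    row[k] /= row_total

        # E-step: posterior over the true label of each task. Plain products
        # are fine for toy data; larger datasets should work in log space.
        for task in tasks:
            scores = {}
            for c in labels:
                score = priors[c]
                for performer, observed in by_task[task]:
                    score *= confusion[performer][c][observed]
                scores[c] = score
            norm = sum(scores.values())
            post[task] = {c: s / norm for c, s in scores.items()}

    return {task: max(post[task], key=post[task].get) for task in tasks}


responses = [
    ("t1", "w1", "a"), ("t1", "w2", "a"), ("t1", "w3", "b"),
    ("t2", "w1", "b"), ("t2", "w2", "b"), ("t2", "w3", "b"),
    ("t3", "w1", "a"), ("t3", "w2", "a"), ("t3", "w3", "a"),
]
print(dawid_skene(responses))  # {'t1': 'a', 't2': 'b', 't3': 'a'}
```

Unlike plain majority vote, each performer's vote is weighted by how reliable their confusion matrix says they are for each true class.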

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, marked as either already available (✅) or in progress (🟡).

Categorical Responses

Method              Status
Majority Vote       ✅
Dawid-Skene         ✅
Gold Majority Vote  ✅
M-MSR               ✅
Wawa                ✅
Zero-Based Skill    ✅
GLAD                ✅
BCC                 🟡
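To show how Majority Vote and Wawa (agreement with the aggregate) relate, here is a minimal sketch, assuming Wawa's usual description: compute majority vote, score each performer by their share of answers that agree with it, then rerun a vote weighted by those scores. The names `majority_vote` and `wawa` and the tuple input are hypothetical; the tie-breaking rule is an arbitrary choice for this sketch.

```python
from collections import Counter, defaultdict


def majority_vote(responses, weights=None):
    """Return {task: label} by (optionally weighted) majority vote.

    `weights` maps performer -> weight; unweighted voting uses 1.0 each.
    Ties are broken by the lexicographically smallest label.
    """
    scores = defaultdict(Counter)
    for task, performer, label in responses:
        scores[task][label] += 1.0 if weights is None else weights[performer]
    return {
        task: min(counts, key=lambda label: (-counts[label], label))
        for task, counts in scores.items()
    }


def wawa(responses):
    """Weight each performer by agreement with majority vote, then revote."""
    mv = majority_vote(responses)
    agree, total = Counter(), Counter()
    for task, performer, label in responses:
        total[performer] += 1
        agree[performer] += label == mv[task]
    skills = {p: agree[p] / total[p] for p in total}
    return majority_vote(responses, weights=skills), skills


# r1-r3 usually agree with the majority; u1 and u2 usually do not.
responses = [
    ("t1", "r1", "a"), ("t1", "r2", "a"), ("t1", "r3", "a"), ("t1", "u1", "b"), ("t1", "u2", "b"),
    ("t2", "r1", "b"), ("t2", "r2", "b"), ("t2", "r3", "b"), ("t2", "u1", "a"), ("t2", "u2", "a"),
    ("t3", "r1", "a"), ("t3", "u1", "b"), ("t3", "u2", "b"),
    ("t4", "r1", "a"), ("t4", "r2", "a"), ("t4", "r3", "a"), ("t4", "u1", "b"), ("t4", "u2", "b"),
]
print(majority_vote(responses)["t3"])  # b: two votes beat one
print(wawa(responses)[0]["t3"])        # a: r1 (skill 0.75) outweighs u1 + u2 (0.25 each)
```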

Textual Responses

Method              Status
RASA                ✅
HRRASA              ✅
ROVER               ✅

Image Segmentation

Method              Status
Segmentation MV     ✅
Segmentation RASA   ✅
Segmentation EM     ✅
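The idea behind Segmentation MV can be sketched as a pixel-wise majority vote over binary masks. This is a hypothetical illustration on nested lists, not Crowd-Kit's implementation; the strict-majority rule (a pixel is set only when more than half of the masks mark it, so a 50/50 split stays off) is an assumption of this sketch.

```python
def segmentation_majority_vote(masks):
    """Pixel-wise majority vote over binary masks given as lists of 0/1 rows.

    A pixel is set in the result when strictly more than half of the masks
    mark it. All masks must share the same shape.
    """
    if not masks:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    if any(len(m) != height or any(len(row) != width for row in m) for m in masks):
        raise ValueError("all masks must have the same shape")
    n = len(masks)
    return [
        [int(2 * sum(m[y][x] for m in masks) > n) for x in range(width)]
        for y in range(height)
    ]


masks = [
    [[1, 1, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
    [[1, 1, 0], [0, 0, 0]],
]
print(segmentation_majority_vote(masks))  # [[1, 1, 0], [0, 1, 0]]
```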

Pairwise Comparisons

Method               Status
Bradley-Terry        ✅
Noisy Bradley-Terry  ✅
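To make the Bradley-Terry model concrete, here is a minimal sketch that fits item scores from (winner, loser) pairs with the standard minorization-maximization (MM) update. The function name and the pair-based input are hypothetical; Crowd-Kit's own class takes a DataFrame of comparisons and may use a different optimizer.

```python
from collections import Counter, defaultdict


def bradley_terry(comparisons, n_iter=100):
    """Estimate Bradley-Terry scores from (winner, loser) pairs.

    Applies the MM update  p_i <- W_i / sum_j n_ij / (p_i + p_j),
    where W_i is the number of wins of item i and n_ij the number of
    i-vs-j comparisons, then rescales the scores to sum to 1.
    """
    wins = Counter(winner for winner, _ in comparisons)
    games = defaultdict(Counter)  # games[i][j] = number of i-vs-j comparisons
    for winner, loser in comparisons:
        games[winner][loser] += 1
        games[loser][winner] += 1
    items = sorted(games)
    if any(wins[i] == 0 for i in items):
        # Scores of never-winning items collapse to 0; this sketch does not handle that.
        raise ValueError("every item must win at least one comparison")

    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        updated = {
            i: wins[i] / sum(n / (scores[i] + scores[j]) for j, n in games[i].items())
            for i in items
        }
        total = sum(updated.values())
        scores = {i: s / total for i, s in updated.items()}
    return scores


comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 3 + [("c", "a")]
)
scores = bradley_terry(comparisons)
print(sorted(scores, key=scores.get, reverse=True))  # ['a', 'b', 'c']
```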

Citation

@inproceedings{HCOMP2021/CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Losev, Vladimir and Giliazev, Iulian and Tulin, Evgeny},
  title     = {{A General-Purpose Crowdsourcing Computational Quality Control Toolkit for Python}},
  year      = {2021},
  booktitle = {The Ninth AAAI Conference on Human Computation and Crowdsourcing: Works-in-Progress and Demonstration Track},
  series    = {HCOMP~2021},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://www.humancomputation.com/assets/wips_demos/HCOMP_2021_paper_85.pdf},
  language  = {english},
}

Questions and Bug Reports

License

© YANDEX LLC, 2020-2021. Licensed under the Apache License, Version 2.0. See the LICENSE file for more details.

Comments
  • Crowd-Kit Learning

    This is just an example of what this subpackage will contain.

    We need to configure setup.cfg and add new tests. Here, I suggest we discuss the concept.

    opened by pilot7747 10
  • Fix the documentation generation issues

    Stick to YAML files hosted in https://github.com/Toloka/docs and use the proper includes.

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [x] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [ ] All new and existing tests passed.
    documentation enhancement 
    opened by dustalov 9
  • Add MACE

    Would it be possible to add MACE? It is often used in my field, but there is only a Java implementation, which is hard to integrate into Python projects.

    enhancement good first issue 
    opened by jcklie 4
  • Add MACE aggregation model

    I have added the MACE aggregation model. https://www.cs.cmu.edu/~hovy/papers/13HLT-MACE.pdf

    Description

    I wrote it in Python, following the original VB (variational Bayes) inference implementation.
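To make MACE (Multi-Annotator Competence Estimation) concrete, here is a compact expectation-maximization sketch of its generative story: each annotator either copies the true label (with probability theta) or "spams" a label drawn from their own distribution xi. This is a hypothetical illustration, not the code from this pull request: it uses EM rather than variational Bayes, drops the priors, assumes a uniform prior over true labels, and the names `mace_em` and `smoothing` are invented for the sketch.

```python
from collections import defaultdict


def mace_em(responses, n_iter=50, smoothing=0.01):
    """EM sketch of MACE on (item, annotator, label) tuples.

    theta[a]: probability that annotator a copies the true label.
    xi[a][k]: probability that annotator a emits label k when spamming.
    Returns ({item: most probable label}, theta).
    """
    labels = sorted({label for _, _, label in responses})
    annotators = sorted({a for _, a, _ in responses})
    by_item = defaultdict(list)
    for item, annotator, label in responses:
        by_item[item].append((annotator, label))

    theta = {a: 0.8 for a in annotators}  # assumed initial competence
    xi = {a: {k: 1.0 / len(labels) for k in labels} for a in annotators}
    post = {}

    for _ in range(n_iter):
        # E-step: posterior over each item's true label (uniform prior).
        for item, answers in by_item.items():
            scores = {}
            for t in labels:
                score = 1.0
                for a, k in answers:
                    score *= theta[a] * (k == t) + (1 - theta[a]) * xi[a][k]
                scores[t] = score
            norm = sum(scores.values())
            post[item] = {t: s / norm for t, s in scores.items()}

        # Expected counts: probability that each answer was a copy of the truth.
        copied = defaultdict(float)
        spammed = {a: {k: smoothing for k in labels} for a in annotators}
        answered = defaultdict(int)
        for item, answers in by_item.items():
            for a, k in answers:
                # Only the true label t == k can be copied to produce answer k.
                p_copy = post[item][k] * theta[a] / (theta[a] + (1 - theta[a]) * xi[a][k])
                copied[a] += p_copy
                spammed[a][k] += 1 - p_copy
                answered[a] += 1

        # M-step: smoothed re-estimates keep theta strictly inside (0, 1).
        for a in annotators:
            theta[a] = (copied[a] + smoothing) / (answered[a] + 2 * smoothing)
            total = sum(spammed[a].values())
            xi[a] = {k: v / total for k, v in spammed[a].items()}

    return {item: max(p, key=p.get) for item, p in post.items()}, theta


# a3 answers "x" no matter what, so MACE should learn that a3 is a spammer.
responses = [
    ("i1", "a1", "x"), ("i1", "a2", "x"), ("i1", "a3", "x"),
    ("i2", "a1", "y"), ("i2", "a2", "y"), ("i2", "a3", "x"),
    ("i3", "a1", "y"), ("i3", "a2", "y"), ("i3", "a3", "x"),
    ("i4", "a1", "x"), ("i4", "a2", "x"), ("i4", "a3", "x"),
]
item_labels, theta = mace_em(responses)
print(item_labels)
print({a: round(t, 2) for a, t in theta.items()})
```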

    Connected issues (if any)

    https://github.com/Toloka/crowd-kit/issues/5

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by pilot7747 3
  • Documentation updates

    Updated index.md and the Classification section:

    1. added extra information to the model descriptions;
    2. added descriptions for parameters;
    3. fixed errors and typos in descriptions.
    opened by Natalyl3 2
  • Binary Relevance aggregation

    Description

    I have added code for Binary Relevance aggregation, a simple method for multi-label classification. This approach treats each label as a separate binary classification task and aggregates each one independently.
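The Binary Relevance reduction described above can be sketched as follows: every label becomes its own "is this label present?" vote, aggregated independently per task. Using strict majority vote as the per-label aggregator is an assumption of this sketch (the pull request does not say which base method it wraps), and the function name `binary_relevance_mv` is hypothetical.

```python
from collections import Counter, defaultdict


def binary_relevance_mv(responses):
    """Binary Relevance with majority vote on (task, performer, label_set) tuples.

    Each label is aggregated as an independent binary task; a label is kept
    when strictly more than half of the performers on that task selected it.
    """
    performers_per_task = Counter()
    votes = defaultdict(Counter)  # votes[task][label] = number of "yes" votes
    for task, _performer, label_set in responses:
        performers_per_task[task] += 1
        for label in label_set:
            votes[task][label] += 1
    return {
        task: {label for label, v in votes[task].items() if 2 * v > n}
        for task, n in performers_per_task.items()
    }


responses = [
    ("t1", "w1", {"cat", "outdoor"}), ("t1", "w2", {"cat"}), ("t1", "w3", {"cat", "outdoor"}),
    ("t2", "w1", {"dog"}), ("t2", "w2", set()), ("t2", "w3", {"dog", "cat"}),
]
print(binary_relevance_mv(responses))  # t1 -> {'cat', 'outdoor'}, t2 -> {'dog'}
```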

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [x] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    opened by denaxen 2
  • Use mypy --strict

    Description

    This pull request enforces a stricter set of mypy type checks by enabling the strict mode. It also fixes several type inconsistencies. As NumPy type annotations were introduced in version 1.20 (January 2021), some Crowd-Kit installations with older NumPy versions might break, but I believe it is a worthwhile contribution.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [ ] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement 
    opened by dustalov 2
  • Run Jupyter notebooks with tests

    Description

    This pull request runs the Jupyter notebooks with examples on the current version of Crowd-Kit with the rest of the test suite on GitHub Actions.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    enhancement good first issue 
    opened by dustalov 2
  • Dramatically improve the code maintainability

    This pull request is probably the best thing that could happen to Crowd-Kit code maintainability.

    Description

    In this pull request, we switch from unnecessarily verbose Python stub files to more convenient inline type annotations. Along the way, many type annotations were fixed. We also removed the manage_docstring decorator and the corresponding utility functions.

    I think this change might break the documentation generation process. We will release a new version of Crowd-Kit only after this is fixed.

    Connected issues (if any)

    Types of changes

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [x] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [x] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [x] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    bug documentation enhancement 
    opened by dustalov 2
  • Add header and LM-based aggregation item

    Description

    This pull request makes README.md nicer. It adds the missing language model-based textual aggregation method.

    Connected issues (if any)

    Types of changes

    • [ ] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to change)
    • [x] Documentation and examples improvement (changes affected documentation and/or examples)

    Checklist:

    • [x] I have read the CONTRIBUTING document.
    • [x] I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
    • [ ] My change requires a change to the documentation.
    • [ ] I have updated the documentation accordingly.
    • [ ] I have added tests to cover my changes.
    • [x] All new and existing tests passed.
    documentation 
    opened by dustalov 2
  • Renamed columns?

    Hi, the guide says

    df = pd.read_csv('results.csv') # should contain columns: task, performer, label

    but when I load this file, the second column is worker, not performer. I had used Crowd-Kit with DataFrames that had the columns task, performer, label, but it broke after an update.

    opened by jcklie 2
  • Ordinal Labels

    Would it be possible to support aggregation of ordinal labels in this toolkit via the following reduction algorithm?

    • Labels are categorical but have an ordering defined: $1 < \dots < K$.
    • The $K$-class ordinal labels are transformed into $K-1$ binary-class label datasets.
    • Each binary task is then aggregated via Crowd-Kit to estimate $\Pr[y_i > c]$ for $c = 1, \dots, K-1$.
    • The probability of each actual class value can then be obtained as $\Pr[y_i = c] = \Pr[y_i > c-1 \wedge y_i \le c] = \Pr[y_i > c-1] - \Pr[y_i > c]$, with $\Pr[y_i > 0] = 1$ and $\Pr[y_i > K] = 0$.
    • The class with the maximum probability is assigned to the instance.
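The reduction proposed above can be sketched end to end. All names here (`ordinal_to_binary`, `vote_share`, `ordinal_aggregate`) are hypothetical; `vote_share` is a deliberately simple stand-in for "aggregate via Crowd-Kit" that estimates $\Pr[y_i > c]$ as the share of yes-votes, so any binary aggregator returning per-task probabilities could be passed instead.

```python
from collections import defaultdict


def ordinal_to_binary(responses, k):
    """Split (task, performer, label) tuples with labels 1..k into k - 1
    binary datasets; dataset c holds the answers to "is label > c?"."""
    return {c: [(t, p, int(label > c)) for t, p, label in responses]
            for c in range(1, k)}


def vote_share(binary_responses):
    """Stand-in aggregator: estimate Pr[y > c] per task as the share of 1-votes."""
    yes, total = defaultdict(int), defaultdict(int)
    for task, _performer, vote in binary_responses:
        yes[task] += vote
        total[task] += 1
    return {task: yes[task] / total[task] for task in total}


def ordinal_aggregate(responses, k, aggregate=vote_share):
    """Combine per-threshold estimates into Pr[y = c] and pick the argmax class."""
    p_greater = {c: aggregate(data) for c, data in ordinal_to_binary(responses, k).items()}
    result = {}
    for task in p_greater[1]:
        # Pr[y > 0] = 1 and Pr[y > k] = 0 close the telescoping differences.
        bounds = [1.0] + [p_greater[c][task] for c in range(1, k)] + [0.0]
        probs = [bounds[c - 1] - bounds[c] for c in range(1, k + 1)]
        result[task] = max(range(1, k + 1), key=lambda c: probs[c - 1])
    return result


responses = [
    ("t1", "w1", 3), ("t1", "w2", 3), ("t1", "w3", 2),
    ("t2", "w1", 1), ("t2", "w2", 2), ("t2", "w3", 1),
]
print(ordinal_aggregate(responses, k=3))  # {'t1': 3, 't2': 1}
```

With the vote-share stand-in, the result coincides with plain majority vote; the reduction only pays off when a stronger binary aggregator is plugged in. If independently aggregated thresholds are not monotone in $c$, some differences can come out negative and would need correcting.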
    enhancement 
    opened by vikasraykar 2
Releases(v1.2.0)
Owner
Toloka
Data labeling platform for ML
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms This repo contains the source code to reproduce the results in the paper A Close

Costa Huang 73 Dec 24, 2022
Rot-Pro: Modeling Transitivity by Projection in Knowledge Graph Embedding

Rot-Pro : Modeling Transitivity by Projection in Knowledge Graph Embedding This repository contains the source code for the Rot-Pro model, presented a

Tewi 9 Sep 28, 2022
RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

The first comprehensive Robustness investigation benchmark on large-scale dataset ImageNet regarding ARchitecture design and Training techniques towards diverse noises.

132 Dec 23, 2022
[NeurIPS'20] Multiscale Deep Equilibrium Models

Multiscale Deep Equilibrium Models 💥 💥 💥 💥 This repo is deprecated and we will soon stop actively maintaining it, as a more up-to-date (and simple

CMU Locus Lab 221 Dec 26, 2022
This repository consists of Blender python scripts and corresponding assets to generate variants of the CANDLE dataset

candle-simulator This repository consists of Blender python scripts and corresponding assets to generate variants of the IITH-CANDLE dataset. The rend

1 Dec 15, 2021
[AAAI22] Reliable Propagation-Correction Modulation for Video Object Segmentation

Reliable Propagation-Correction Modulation for Video Object Segmentation (AAAI22) Preview version paper of this work is available at: https://arxiv.or

Xiaohao Xu 70 Dec 04, 2022
[ICML 2021] Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data

Break-It-Fix-It: Learning to Repair Programs from Unlabeled Data This repo provides the source code & data of our paper: Break-It-Fix-It: Unsupervised

Michihiro Yasunaga 86 Nov 30, 2022
[NeurIPS2021] Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks

Exploring Architectural Ingredients of Adversarially Robust Deep Neural Networks Code for NeurIPS 2021 Paper "Exploring Architectural Ingredients of A

Hanxun Huang 26 Dec 01, 2022
Retinal vessel segmentation based on GT-UNet

Retinal vessel segmentation based on GT-UNet Introduction This project is a retinal blood vessel segmentation code based on UNet-like Group Transforme

Kent0n 27 Dec 18, 2022
A program that uses computer vision to detect hand gestures, used for controlling movie players.

HandGestureDetection This program uses a Haar Cascade algorithm to detect the presence of your hand, and then passes it on to a self-created and self-

2 Nov 22, 2022
Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Voronoi Multi_Robot Collaborate Exploration Introduction In the unknown environment, the cooperative exploration of multiple robots is completed by Vo

PeaceWord 6 Nov 22, 2022
XViT - Space-time Mixing Attention for Video Transformer

XViT - Space-time Mixing Attention for Video Transformer This is the official implementation of the XViT paper: @inproceedings{bulat2021space, title

Adrian Bulat 33 Dec 23, 2022
Implementation of "Semi-supervised Domain Adaptive Structure Learning"

Semi-supervised Domain Adaptive Structure Learning - ASDA This repo contains the source code and dataset for our ASDA paper. Illustration of the propo

3 Dec 13, 2021
Code and data for "TURL: Table Understanding through Representation Learning"

TURL This Repo contains code and data for "TURL: Table Understanding through Representation Learning". Environment and Setup Data Pretraining Finetuni

SunLab-OSU 63 Nov 23, 2022
PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020).

Scaffold-Federated-Learning PyTorch implementation of SCAFFOLD (Stochastic Controlled Averaging for Federated Learning, ICML 2020). Environment numpy=

KI 30 Dec 29, 2022
🎓Automatically Update CV Papers Daily using Github Actions (Update at 12:00 UTC Every Day)

🎓Automatically Update CV Papers Daily using Github Actions (Update at 12:00 UTC Every Day)

Realcat 270 Jan 07, 2023
A library for using chemistry in your applications

Chemistry in python Resources Used The following items are not made by me! Click the words to go to the original source Periodic Tab Json - Used in -

Tech Penguin 28 Dec 17, 2021
Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX.

ONNX Object Localization Network Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX. Ori

Ibai Gorordo 15 Oct 14, 2022
Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication"

NFFT4ANOVA Source code for our Paper "Learning in High-Dimensional Feature Spaces Using ANOVA-Based Matrix-Vector Multiplication" This package uses th

Theresa Wagner 1 Aug 10, 2022
PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

PyTorch implementations of Top-N recommendation, collaborative filtering recommenders.

Yoonki Jeong 129 Dec 22, 2022