A framework for Quantification written in Python


QuaPy

QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify) written in Python.

QuaPy is based on the concept of a "data sample", and provides implementations of the most important aspects of the quantification workflow, such as (baseline and advanced) quantification methods, quantification-oriented model selection mechanisms, evaluation measures, and evaluation protocols for assessing quantification methods. QuaPy also makes commonly used datasets available, and offers visualization tools that facilitate the analysis and interpretation of experimental results.

Installation

pip install quapy

A quick example:

The following script fetches a dataset of tweets, then trains, applies, and evaluates a quantifier based on the Adjusted Classify & Count quantification method, using as the evaluation measure the Mean Absolute Error (MAE) between the predicted and the true class prevalence values of the test set.

import quapy as qp
from sklearn.linear_model import LogisticRegression

dataset = qp.datasets.fetch_twitter('semeval16')

# create an "Adjusted Classify & Count" quantifier
model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)

estim_prevalence = model.quantify(dataset.test.instances)
true_prevalence  = dataset.test.prevalence()

error = qp.error.mae(true_prevalence, estim_prevalence)

print(f'Mean Absolute Error (MAE)={error:.3f}')

Quantification is useful in scenarios characterized by prior probability shift. In other words, there would be little interest in estimating the class prevalence values of the test set if the IID assumption held, since that prevalence would be roughly the same as the class prevalence of the training set. For this reason, any quantification model should be tested across many samples, including ones whose class prevalence values differ, even markedly, from those found in the training set. QuaPy implements sampling procedures and evaluation protocols that automate this workflow. See the Wiki for detailed examples.
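
For illustration, the following sketch trains an ACC quantifier on a binary reviews dataset and evaluates it over test samples drawn at controlled prevalence values. It assumes the fetch_reviews loader and a LabelledCollection.sampling(size, *prevs) method that returns a sample exhibiting the requested prevalence; treat the exact signatures as assumptions that may vary across versions.

import numpy as np
import quapy as qp
from sklearn.linear_model import LogisticRegression

# binary sentiment dataset of product reviews, tfidf-vectorized
dataset = qp.datasets.fetch_reviews('kindle', tfidf=True)

model = qp.method.aggregative.ACC(LogisticRegression())
model.fit(dataset.training)

# draw test samples whose prevalence ranges far from the training prevalence;
# sampling(500, prev) is assumed to draw 500 instances in which the first
# class has prevalence approximately equal to prev
errors = []
for prev in np.linspace(0.1, 0.9, 9):
    sample = dataset.test.sampling(500, prev)
    estim_prevalence = model.quantify(sample.instances)
    errors.append(qp.error.mae(sample.prevalence(), estim_prevalence))

print(f'MAE across prevalence levels: {np.mean(errors):.3f}')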

Features

  • Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization, quantification methods based on structured output learning, HDy, QuaNet, and quantification ensembles).
  • Versatile functionality for performing evaluation based on artificial sampling protocols.
  • Implementation of the most commonly used evaluation metrics (AE, RAE, SE, KLD, NKLD, etc.).
  • Datasets frequently used in quantification (textual and numeric), including:
    • 32 UCI Machine Learning datasets.
    • 11 Twitter quantification-by-sentiment datasets.
    • 3 product reviews quantification-by-sentiment datasets.
  • Native support for binary and single-label multiclass quantification scenarios.
  • Model selection functionality that minimizes quantification-oriented loss functions (see the sketch after this list).
  • Visualization tools for analysing the experimental results.
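
A minimal model-selection sketch follows, assuming qp.model_selection.GridSearchQ, the split_stratified method, and the fetch_reviews loader behave as shown; the exact signatures may vary across versions.

import quapy as qp
from sklearn.linear_model import LogisticRegression

# binary sentiment dataset of product reviews, tfidf-vectorized
dataset = qp.datasets.fetch_reviews('kindle', tfidf=True)

# hold out part of the training set for quantification-oriented validation
train, val = dataset.training.split_stratified(train_prop=0.6)

# GridSearchQ evaluates each configuration over many validation samples
# drawn at varying prevalence values and keeps the one minimizing MAE
model = qp.model_selection.GridSearchQ(
    qp.method.aggregative.ACC(LogisticRegression()),
    param_grid={'C': [0.1, 1, 10], 'class_weight': ['balanced', None]},
    sample_size=500,
    error='mae',
).fit(train, val)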

Requirements

  • scikit-learn, numpy, scipy
  • pytorch (for QuaNet)
  • svmperf patched for quantification (see below)
  • joblib
  • tqdm
  • pandas, xlrd
  • matplotlib

SVM-perf with quantification-oriented losses

In order to run experiments involving SVM(Q), SVM(KLD), SVM(NKLD), SVM(AE), or SVM(RAE), you first have to download the svmperf package, apply the patch svm-perf-quantification-ext.patch, and compile the sources. The script prepare_svmperf.sh does all of this. Simply run:

./prepare_svmperf.sh

The resulting directory svm_perf_quantification contains the patched version of svmperf with quantification-oriented losses.

The svm-perf-quantification-ext.patch is an extension of the patch made available by Esuli et al. 2015 that allows SVMperf to optimize for the Q measure as proposed by Barranquero et al. 2015 and for the KLD and NKLD measures as proposed by Esuli et al. 2015. This patch extends the above one by also allowing SVMperf to optimize for AE and RAE.
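
Once compiled, the patched version can be used from QuaPy roughly as follows. This is a minimal sketch: the SVMPERF_HOME configuration key and the SVMQ class name are assumptions about this version of the API.

import quapy as qp

# point QuaPy to the patched SVMperf build (assumed configuration key)
qp.environ['SVMPERF_HOME'] = './svm_perf_quantification'

# quantifier whose underlying SVMperf directly optimizes the Q measure
model = qp.method.aggregative.SVMQ()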

Wiki

Check out our Wiki, in which many examples are provided.

Comments
  • Couldn't train QuaNet on multiclass data

Hi, I am having trouble training a QuaNet quantifier on multiclass (20) data. Everything works fine when my dataset only has 2 classes. It looks like the ACC quantifier is not able to aggregate over more than 2 classes?

The classifier is built and trained with the code below

    classifier = LSTMnet(dataset.vocabulary_size, dataset.n_classes)
    learner = NeuralClassifierTrainer(classifier)
    learner.fit(*dataset.training.Xy)
    

where it uses all the default configurations:

    {'embedding_size': 100, 'hidden_size': 256, 'repr_size': 100, 'lstm_class_nlayers': 1, 'drop_p': 0.5}

Then I tried to train QuaNet with the following code

    model = QuaNetTrainer(learner, qp.environ['SAMPLE_SIZE'])
    model.fit(dataset.training, fit_learner=False)
    

and it showed that QuaNet was built as

QuaNetModule(
  (lstm): LSTM(120, 64, batch_first=True, dropout=0.5, bidirectional=True)
  (dropout): Dropout(p=0.5, inplace=False)
  (ff_layers): ModuleList(
    (0): Linear(in_features=208, out_features=1024, bias=True)
    (1): Linear(in_features=1024, out_features=512, bias=True)
  )
  (output): Linear(in_features=512, out_features=20, bias=True)
)

And then the error occurred in model.fit().

    Attached is the error I get.

Traceback (most recent call last):
  File "quanet-test.py", line 181, in <module>
    model.fit(dataset.training, fit_learner=False)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 126, in fit
    self.epoch(train_data_embed, train_posteriors, self.tr_iter, epoch_i, early_stop, train=True)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 182, in epoch
    quant_estims = self.get_aggregative_estims(sample_posteriors)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/neural.py", line 145, in get_aggregative_estims
    prevs_estim.extend(quantifier.aggregate(predictions))
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 238, in aggregate
    return ACC.solve_adjustment(self.Pte_cond_estim_, prevs_estim)
  File "/home/vickys/.local/lib/python3.6/site-packages/quapy/method/aggregative.py", line 246, in solve_adjustment
    adjusted_prevs = np.linalg.solve(A, B)
  File "<__array_function__ internals>", line 6, in solve
  File "/usr/local/lib64/python3.6/site-packages/numpy/linalg/linalg.py", line 394, in solve
    r = gufunc(a, b, signature=signature, extobj=extobj)
ValueError: solve1: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (m,m),(m)->(m) (size 2 is different from 20)

    Thank you!

    opened by vickysvicky 4
  • Parameter fit_learner in QuaNetTrainer (fit method)

The parameter fit_learner is not used in the function:

    def fit(self, data: LabelledCollection, fit_learner=True):

    and the learner is fitted every time:

    self.learner.fit(*classifier_data.Xy)

    opened by pglez82 1
  • Wiki correction

    In the last part of the Methods wiki page, where it says:

    from classification.neural import NeuralClassifierTrainer, CNNnet

    I think it should say:

    from quapy.classification.neural import NeuralClassifierTrainer, LSTMnet

    opened by pglez82 1
  • Error in LSTMnet

I think there is an error in the function init_hidden:

def init_hidden(self, set_size):
    opt = self.hyperparams
    var_hidden = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
    var_cell = torch.zeros(opt['lstm_nlayers'], set_size, opt['lstm_hidden_size'])
    if next(self.lstm.parameters()).is_cuda:
        var_hidden, var_cell = var_hidden.cuda(), var_cell.cuda()
    return var_hidden, var_cell
    

Where it says opt['lstm_hidden_size'], it should be opt['hidden_size'].

    opened by pglez82 1
  • EMQ can be instantiated with a transformation function

    This transformation function is applied to each intermediate estimate.

Why would someone want to transform the prior between two iterations? A transformation of the prior is a heuristic yet effective way of promoting desired properties of the solution. For instance,

    • small values could be enhanced if the data is extremely imbalanced
    • small values could be reduced if the user is looking for a sparse solution
    • neighboring values could be averaged if the user is looking for a smooth solution
    • the function could also leave the prior unaltered and just be used as a callback for logging the progress of the method

    I hope this feature is useful. Let me know what you think!
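
As an illustration, a smoothing transformation might look like the following (a hypothetical sketch, not necessarily the exact signature the feature expects):

import numpy as np

# hypothetical prior transformation for EMQ: averages each class prevalence
# with its neighbors and renormalizes, promoting a smooth solution
def smooth_prior(prior):
    padded = np.pad(prior, 1, mode='edge')
    smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3
    return smoothed / smoothed.sum()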

    opened by mirkobunse 0
  • fixing two problems with parameters: hidden_size and lstm_nlayers

I found another problem with a parameter: when using LSTMnet with QuaNet, two parameters overlap (lstm_nlayers). I have renamed the one in LSTMnet to lstm_class_nlayers.

    opened by pglez82 0
  • Using a different gpu than cuda:0

The code seems to be tied to using only 'cuda', which by default uses the first GPU in the system ('cuda:0'). It would be handy to be able to tell the library which CUDA GPU to train on ('cuda:0', 'cuda:1', etc.).

    opened by pglez82 0
Releases (0.1.6)
Owner
The Human Language Technologies group of ISTI-CNR