DuBE: Duple-balanced Ensemble Learning from Skewed Data

Overview


"Towards Inter-class and Intra-class Imbalance in Class-imbalanced Learning"
(IEEE ICDE 2022 Submission)

DuBE is an ensemble learning framework for (multi-)class-imbalanced classification. It is an easy-to-use solution to imbalanced learning problems, featuring good performance, computational efficiency, and wide compatibility with different learning models. Documentation and examples are available at https://duplebalance.readthedocs.io.


Background

Imbalanced learning (IL) is an important problem that arises widely in data mining applications. Typical IL methods use intuitive class-wise resampling or reweighting to directly balance the training set. However, recent research in specific domains shows that class-imbalanced learning can also be achieved without class-wise manipulation. This prompts us to think about the relationship between these two IL strategies and the nature of class imbalance. Fundamentally, they correspond to two essential imbalances in IL: the difference in quantity between examples of different classes, and between easy and hard examples within a single class, i.e., inter-class and intra-class imbalance.
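To make the two notions concrete, the following sketch measures both on a labelled dataset: it counts examples per class (inter-class imbalance) and, within each class, splits examples into easy and hard ones by their prediction error (intra-class imbalance). The function name, the error definition (for instance, 1 minus the predicted probability of the true class) and the 0.5 hardness threshold are illustrative assumptions, not part of DuBE.

```python
from collections import Counter


def imbalance_summary(y, errors, threshold=0.5):
    """Measure inter-class and intra-class imbalance (illustrative sketch).

    y: class labels; errors: per-example prediction errors, assumed in [0, 1].
    Returns (class counts, largest/smallest class-size ratio,
             {label: (n_easy, n_hard)}).
    """
    class_counts = Counter(y)
    # Inter-class imbalance: largest class size divided by smallest class size.
    ratio = max(class_counts.values()) / min(class_counts.values())
    # Intra-class imbalance: easy (error < threshold) vs hard examples per class.
    easy_hard = {label: [0, 0] for label in class_counts}
    for label, err in zip(y, errors):
        easy_hard[label][1 if err >= threshold else 0] += 1
    return dict(class_counts), ratio, {k: tuple(v) for k, v in easy_hard.items()}
```

For example, labels ['a'] * 6 + ['b'] * 2 with errors [0.1, 0.2, 0.1, 0.05, 0.9, 0.1, 0.7, 0.8] give an imbalance ratio of 3.0, with class 'a' split into 5 easy and 1 hard example and class 'b' into 0 easy and 2 hard examples.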


Existing methods do not explicitly account for both imbalances and thus achieve suboptimal performance. In light of this, we present Duple-Balanced Ensemble (DuBE), a versatile ensemble learning framework. Unlike prevailing methods, DuBE directly performs inter-class and intra-class balancing without relying on heavy distance-based computation, which allows it to achieve competitive performance while remaining computationally efficient.


Install

Our DuBE implementation requires the following dependencies:

You can install DuBE by cloning this repository:

git clone https://github.com/ICDE2022Sub/duplebalance.git
cd duplebalance
pip install .

Usage

For more detailed usage examples, please see Examples.

A minimal working example:

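# load dataset & prepare environment
from duplebalance import DupleBalanceClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# synthetic 3-class dataset with class proportions of roughly 20% / 30% / 50%
X, y = make_classification(n_samples=1000, n_classes=3,
                           n_informative=4, weights=[0.2, 0.3, 0.5],
                           random_state=0)

# stratified split keeps the class proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42)

# ensemble training
clf = DupleBalanceClassifier(
    n_estimators=10,
    random_state=42,
    ).fit(X_train, y_train)

# predict class probabilities on the test set
y_pred_proba_test = clf.predict_proba(X_test)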
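On skewed data, plain accuracy can look good even when minority classes are mostly misclassified, so a class-balanced metric is a better check of the probabilities returned by predict_proba. The sketch below (the function name is hypothetical) takes the argmax of each probability row and averages the per-class recalls, i.e., balanced accuracy. It assumes the probability columns follow the order of the given classes list; scikit-learn-style classifiers usually expose that order as the fitted classes_ attribute. sklearn.metrics.balanced_accuracy_score computes the same metric from predicted labels.

```python
def balanced_accuracy_from_proba(y_true, proba, classes):
    """Balanced accuracy (mean per-class recall) from class probabilities.

    proba: one row of class probabilities per example, with columns in the
    same order as `classes` (an assumption about the caller's data).
    """
    # Predicted label = class whose column holds the highest probability.
    y_pred = [classes[max(range(len(row)), key=row.__getitem__)] for row in proba]
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        if idx:  # skip classes that do not occur in y_true
            hits = sum(y_pred[i] == c for i in idx)
            recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

With the minimal example above, one would call balanced_accuracy_from_proba(y_test, clf.predict_proba(X_test), list(clf.classes_)), assuming the classifier exposes classes_.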

Documentation

For detailed API documentation, please see the API reference.

Our DupleBalance implementation can be used in much the same way as the ensemble classifiers in sklearn.ensemble. The DupleBalanceClassifier class inherits from the sklearn.ensemble.BaseEnsemble base class.

Main parameters are listed below:

Parameters Description
base_estimator object, optional (default=sklearn.tree.DecisionTreeClassifier())
The base estimator to fit on the resampled subsets of the dataset. It does not need to support sample weighting, but it must implement fit(), predict(), and predict_proba().
n_estimators int, optional (default=10)
The number of base estimators in the ensemble.
resampling_target {'hybrid', 'under', 'over', 'raw'}, optional (default='hybrid')
Determines the number of instances to be sampled from each class (inter-class balancing).
- If 'under', perform under-sampling. The class containing the fewest samples is considered the minority class c_min. All other classes are under-sampled until they are the same size as c_min.
- If 'over', perform over-sampling. The class containing the largest number of samples is considered the majority class c_maj. All other classes are over-sampled until they are the same size as c_maj.
- If 'hybrid', perform hybrid sampling. All classes are under- or over-sampled to the average class size (the mean number of instances per class).
- If 'raw', keep the original size of every class when resampling.
resampling_strategy {'hem', 'shem', 'uniform'}, optional (default='shem')
Decides how resampling probabilities are assigned to instances during ensemble training (intra-class balancing).
- If 'hem', perform hard-example mining: assign probabilities according to each instance's latest prediction error.
- If 'shem', perform soft hard-example mining: assign probabilities by inverting the classification error density.
- If 'uniform', assign uniform probabilities, i.e., random resampling.
perturb_alpha float or str, optional (default='auto')
The multiplier of the calibrated Gaussian noise added to the sampled data. It determines the intensity of the perturbation-based augmentation. If 'auto', perturb_alpha is tuned automatically using a subset of the given training data.
k_bins int, optional (default=5)
The number of error bins used to approximate the error distribution. The recommended value is 5; a larger value can be tried when the smallest class in the dataset has a sufficient number of samples (say, more than 1000).
estimator_params list of str, optional (default=tuple())
The list of attributes to use as parameters when instantiating a new base estimator. If none are given, default parameters are used.
n_jobs int, optional (default=None)
The number of jobs to run in parallel for predict. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors. See the n_jobs entry in the scikit-learn Glossary for more details.
random_state int / RandomState instance / None, optional (default=None)
If an integer, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by numpy.random.
verbose int, optional (default=0)
Controls the verbosity when fitting and predicting.
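The resampling_target, resampling_strategy, k_bins and perturb_alpha parameters describe three steps of building each training subset: choosing how many instances to draw from each class, choosing which instances to draw within a class, and perturbing the drawn data. The following is a minimal plain-Python sketch of those steps, written only to illustrate the parameter descriptions above; it is not DuBE's implementation. The function names are hypothetical, and several details are assumptions: 'hybrid' rounds the average class size to the nearest integer, 'shem' uses k_bins equal-width error bins on [0, 1] and weights each instance by the inverse of its bin's population, and the "calibrated" noise on each feature has standard deviation perturb_alpha times that feature's standard deviation in the sampled data.

```python
import random
import statistics
from collections import Counter


def target_class_sizes(class_counts, resampling_target="hybrid"):
    """Inter-class balancing: number of instances to draw per class (sketch)."""
    counts = list(class_counts.values())
    if resampling_target == "raw":
        return dict(class_counts)          # keep every class at its original size
    if resampling_target == "under":
        target = min(counts)               # shrink all classes to |c_min|
    elif resampling_target == "over":
        target = max(counts)               # grow all classes to |c_maj|
    elif resampling_target == "hybrid":
        target = round(sum(counts) / len(counts))  # average class size (rounding assumed)
    else:
        raise ValueError(f"unknown resampling_target: {resampling_target!r}")
    return {label: target for label in class_counts}


def sampling_probabilities(errors, resampling_strategy="shem", k_bins=5):
    """Intra-class balancing: per-instance sampling probabilities (sketch).

    errors: latest prediction errors of one class's instances, assumed in [0, 1].
    """
    n = len(errors)
    if n == 0:
        return []
    if resampling_strategy == "uniform":
        weights = [1.0] * n                # plain random resampling
    elif resampling_strategy == "hem":
        weights = list(errors)             # probability proportional to the error
    elif resampling_strategy == "shem":
        # Assumed form: histogram errors into k_bins equal-width bins and weight
        # each instance by 1 / (its bin's population), i.e. the inverse density,
        # so every non-empty bin receives the same total probability.
        bins = [min(int(e * k_bins), k_bins - 1) for e in errors]
        population = Counter(bins)
        weights = [1.0 / population[b] for b in bins]
    else:
        raise ValueError(f"unknown resampling_strategy: {resampling_strategy!r}")
    total = sum(weights)
    if total == 0:                         # e.g. all errors are 0 under 'hem'
        return [1.0 / n] * n
    return [w / total for w in weights]


def perturb(samples, perturb_alpha, rng=None):
    """Perturbation-based augmentation with Gaussian noise (sketch).

    Assumption: noise on feature j has std perturb_alpha * std_j, where std_j
    is the population std of feature j over `samples`.
    """
    rng = rng or random.Random()
    stds = [statistics.pstdev([row[j] for row in samples])
            for j in range(len(samples[0]))]
    return [[x + rng.gauss(0.0, perturb_alpha * s) for x, s in zip(row, stds)]
            for row in samples]
```

With class sizes {0: 200, 1: 300, 2: 500}, 'under', 'over' and 'hybrid' yield 200, 500 and 333 instances per class respectively. For errors [0.05, 0.1, 0.15, 0.9], 'hem' gives the hardest instance probability 0.75, whereas 'shem' gives it 0.5 and splits the other half evenly among the three easy instances, which is what makes it "soft". A class's subset could then be drawn with random.choices(indices, weights=probs, k=size).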