FEMDA: Robust classification with Flexible Discriminant Analysis in heterogeneous data

Last update: Sep 06, 2022

Overview

Flexible EM-Inspired Discriminant Analysis is a robust supervised classification algorithm that performs well in noisy and contaminated datasets.

Authors

Andrew Wang, University of Cambridge, Cambridge, UK

Pierre Houdouin, CentraleSupélec, Paris, France

Installation

pip install -i https://test.pypi.org/simple/ femda

Get started

>>> from sklearn.datasets import load_iris
>>> from femda import FEMDA
>>> X, y = load_iris(return_X_y=True)
>>> clf = FEMDA()
>>> clf.fit(X, y)
FEMDA()
>>> clf.score(X, y)
0.9666666666666667

Using a specific dataset...

>>> import femda.experiments.preprocessing as pre
>>> X_train, y_train, X_test, y_test = pre.statlog(r"root\datasets\\")
>>> FEMDA().fit(X_train, y_train).score(X_test, y_test)
...

Using a sklearn.pipeline.Pipeline...

>>> from sklearn.datasets import load_digits
>>> from sklearn.pipeline import make_pipeline
>>> from sklearn.decomposition import PCA
>>> X, y = load_digits(return_X_y=True)
>>> pipe = make_pipeline(PCA(n_components=5), FEMDA()).fit(X, y)
>>> pipe.predict(X)
...

Run all experiments presented in the paper

>>> from femda.experiments import run_experiments
>>> run_experiments()
...

See for more.

Abstract

Linear and Quadratic Discriminant Analysis are well-known classical methods, but they degrade severely under non-Gaussian class distributions and are not robust to contaminated datasets. In this paper, we present a new discriminant-analysis-style classification algorithm that directly models noise and diverse distribution shapes, allowing it to handle a wide range of datasets.

Each data point is modelled by its own arbitrary Elliptically Symmetrical (ES) distribution and its own arbitrary scale parameter, which directly models highly heterogeneous, non-i.i.d. datasets. We show that maximum-likelihood parameter estimation and classification are simple and fast under this model.
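The per-point scale idea can be illustrated with a small NumPy sketch. It is not the femda implementation: the helper names (`simulate_class`, `scale_free_score`, `classify`) are hypothetical, the class means and scatter matrices are assumed known, and the data are drawn from a Gaussian scale mixture, which is only one member of the ES family. The score is what results from taking a Gaussian log-likelihood with covariance tau * Sigma and maximising over the point's own scale tau, which gives tau = d^2 / m and leaves 0.5 * log|Sigma| + 0.5 * m * log(d^2) to minimise over classes.

```python
import numpy as np


def simulate_class(rng, mean, cov, n, scale_low=0.1, scale_high=10.0):
    """Draw n points around `mean`, each with its own random scale tau_i.

    Hypothetical helper: x_i = mean + sqrt(tau_i) * L z_i, with L L^T = cov,
    z_i standard normal and tau_i log-uniform on [scale_low, scale_high].
    """
    m = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((n, m))
    tau = np.exp(rng.uniform(np.log(scale_low), np.log(scale_high), size=n))
    return mean + np.sqrt(tau)[:, None] * (z @ chol.T)


def scale_free_score(X, mean, cov):
    """Negative profile log-likelihood (up to a constant) for each row of X.

    d2 is the squared Mahalanobis distance; the per-point scale has been
    maximised out, so only log|cov| and log(d2) remain.
    """
    m = mean.shape[0]
    diff = X - mean
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * logdet + 0.5 * m * np.log(d2)


def classify(X, means, covs):
    """Assign each row of X to the class with the smallest score."""
    scores = np.stack(
        [scale_free_score(X, mu, S) for mu, S in zip(means, covs)], axis=1
    )
    return scores.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = [np.zeros(2), np.array([4.0, 4.0])]
    covs = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
    X = np.vstack([simulate_class(rng, mu, S, 100) for mu, S in zip(means, covs)])
    y = np.repeat([0, 1], 100)
    print("accuracy with known parameters:", np.mean(classify(X, means, covs) == y))
```

Rescaling a point's deviation from a class mean by a factor c shifts that class's score by the constant m * log(c), so the score depends on the point's scale only through an additive term. In practice the means and scatter matrices have to be estimated from training data, which this sketch does not attempt.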

We illustrate the model's flexibility across a wide range of Elliptically Symmetrical distribution shapes and varying levels of contamination in synthetic datasets. Then, we show that our algorithm outperforms other robust methods on contaminated datasets from Computer Vision and NLP.
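To experiment with contamination levels on your own data, one simple option is to replace a fraction of the rows with noise before fitting. The sketch below is an assumption for illustration only: `contaminate` is a hypothetical helper, and uniform noise within each feature's observed range is just one possible scheme, not necessarily the protocol used in the paper.

```python
import numpy as np


def contaminate(X, rate, rng):
    """Return a copy of X with a fraction `rate` of rows replaced by noise.

    Hypothetical helper: replaced rows are drawn uniformly between the
    per-feature minimum and maximum of X. Also returns a boolean mask
    marking which rows were replaced.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    n_bad = int(round(rate * n))
    idx = rng.choice(n, size=n_bad, replace=False)
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_noisy = X.copy()
    X_noisy[idx] = rng.uniform(lo, hi, size=(n_bad, d))
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return X_noisy, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))
    X_noisy, mask = contaminate(X, rate=0.25, rng=rng)
    print("rows replaced:", int(mask.sum()))
```

The contaminated array can then be passed to any scikit-learn style classifier, for example `FEMDA().fit(X_noisy, y)`, while scoring on clean held-out data to see how accuracy changes with `rate`.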