Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning

Last update: Nov 10, 2022

Overview

This repository provides an implementation of the paper "Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning", accepted at AISTATS 2022 as an oral presentation. We propose Beta Shapley, a noise-reduced data valuation method that captures the importance of individual data points.

Quick start

We provide a notebook using the Covertype dataset. It shows how to compute Beta Shapley values and apply them to several downstream ML tasks.

--> Beta Shapley can identify noisy samples by putting more weight on marginal contributions computed at small cardinalities (small training subsets).
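To make the cardinality weighting concrete, here is a minimal sketch of the normalized Beta(alpha, beta) weights, written from our reading of the paper's definition. It is not the code in `betashap/ShapEngine.py`; the function names `log_beta`, `beta_shapley_weights`, and `beta_shapley_value` are hypothetical and introduced only for illustration. The assumed weight for cardinality j in a dataset of size n is n * C(n-1, j-1) * B(j + beta - 1, n - j + alpha) / B(alpha, beta), where B is the Beta function.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_shapley_weights(n, alpha, beta):
    """Normalized weight for each cardinality j = 1..n (assumed formula, see text)."""
    log_norm = log_beta(alpha, beta)
    weights = []
    for j in range(1, n + 1):
        # log of the binomial coefficient C(n-1, j-1)
        log_binom = math.lgamma(n) - math.lgamma(j) - math.lgamma(n - j + 1)
        log_w = (math.log(n) + log_binom
                 + log_beta(j + beta - 1, n - j + alpha) - log_norm)
        weights.append(math.exp(log_w))
    return weights


def beta_shapley_value(marginals, alpha, beta):
    """Weighted mean of per-cardinality marginal contributions for one point.

    marginals[j - 1] is the (estimated) average marginal contribution of the
    point when it is added to subsets of size j - 1.
    """
    n = len(marginals)
    weights = beta_shapley_weights(n, alpha, beta)
    return sum(w * m for w, m in zip(weights, marginals)) / n
```

With `alpha = beta = 1` every weight equals 1, which recovers the standard data Shapley average. With `alpha > 1` and `beta = 1` (for example `beta_shapley_weights(5, 4.0, 1.0)`) the weights decrease as j grows, so small cardinalities dominate the value.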

--> Beta Shapley on the CIFAR100 test dataset. Mislabeled data points have negative Beta Shapley values, meaning they harm model performance, so Beta Shapley can be used to detect them.
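As a rough illustration of how negative values could be used for detection, the sketch below flags every point with a negative value and orders the flagged points from most to least harmful. The helper `flag_suspected_mislabeled` and the numbers in `example_values` are hypothetical; they are not outputs of this repository.

```python
def flag_suspected_mislabeled(values):
    """Return indices of points with a negative value, most negative first."""
    negative = [i for i, v in enumerate(values) if v < 0]
    return sorted(negative, key=lambda i: values[i])


# Hypothetical Beta Shapley values for five data points (illustrative only).
example_values = [0.012, -0.004, 0.007, -0.020, 0.001]
suspects = flag_suspected_mislabeled(example_values)
```

In practice a negative value is a signal for manual inspection rather than proof of a wrong label, and a threshold other than zero may be more appropriate for a given dataset.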

Files

betashap/ShapEngine.py: main class for computing Beta Shapley values.

betashap/data.py: handles loading and preprocessing datasets.

Owner

Yongchan Kwon
