Awesome Long-Tailed Learning

Overview

This repo focuses on long-tailed distributions, where labels follow a long-tailed or power-law distribution in the training and/or test dataset. It summarizes related papers, covering applications in computer vision (in particular image classification) and in extreme multi-label learning (XML) (in particular text categorization).

🔆 Updated 2021-09-27

Long-tailed Learning in Computer Vision

Type of Long-Tailed Learning Methods

Type  Meaning
TST   Two-Stage Training
IS    Instance Sampling
CBS   Class-Balanced Sampling
CLW   Class-Level Weighting
NC    Normalized Classifier
ENS   Ensemble
DA    Data Augmentation
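
To make the IS, CBS and CLW abbreviations concrete, here is a minimal sketch of their textbook forms on a toy label set. It is not taken from any paper listed below; the function names, the toy class sizes, and the inverse-frequency weighting are assumptions chosen for illustration only.

```python
from collections import Counter


def sampling_probabilities(labels):
    """Per-example sampling probabilities under IS and CBS (generic forms)."""
    counts = Counter(labels)
    n = len(labels)
    num_classes = len(counts)
    # Instance sampling: every example is equally likely, so each class
    # is drawn in proportion to its frequency.
    instance = [1.0 / n for _ in labels]
    # Class-balanced sampling: every class is equally likely, and the
    # examples inside a class share that class's probability mass.
    balanced = [1.0 / (num_classes * counts[y]) for y in labels]
    return instance, balanced


def class_weights(labels):
    """Inverse-frequency class weights (one common form of CLW).

    Normalised so that the average weight over all examples equals 1.
    """
    counts = Counter(labels)
    n = len(labels)
    num_classes = len(counts)
    return {c: n / (num_classes * m) for c, m in counts.items()}


def class_mass(labels, probs):
    """Total sampling probability assigned to each class."""
    mass = Counter()
    for y, p in zip(labels, probs):
        mass[y] += p
    return dict(mass)


# Hypothetical long-tailed label set: class sizes 100, 10 and 1.
labels = ["head"] * 100 + ["mid"] * 10 + ["tail"] * 1
instance, balanced = sampling_probabilities(labels)

print(class_mass(labels, instance))  # dominated by the head class
print(class_mass(labels, balanced))  # equal mass per class
print(class_weights(labels))         # tail class gets the largest weight
```

Under instance sampling the head class receives 100/111 of the probability mass, while class-balanced sampling gives each of the three classes one third. The weights play the same role inside a loss function instead of a sampler.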

Long-Tailed Learning Workshops

Year Venue Title Remark
2021 CVPR Open World Vision long-tail, open-set, streaming labels
2021 CVPR Learning from Limited and Imperfect Data (L2ID) label noise, SSL, long-tail

Long-Tailed Learning Papers

Year Venue Title Remark
2021 Arxiv LEARNING FROM LONG-TAILED DATA WITH NOISY LABELS
2021 ICCV Self Supervision to Distillation for Long-Tailed Visual Recognition
2021 ICCV Distilling Virtual Examples for Long-tailed Recognition
2021 CVPR Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification
2021 CVPR MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition
2021 CVPR Disentangling Label Distribution for Long-tailed Visual Recognition
2021 CVPR Long-Tailed Multi-Label Visual Recognition by Collaborative Training on Uniform and Re-Balanced Samplings
2021 CVPR Seesaw Loss for Long-Tailed Instance Segmentation
2021 ICLR IS LABEL SMOOTHING TRULY INCOMPATIBLE WITH KNOWLEDGE DISTILLATION: AN EMPIRICAL STUDY
2021 Arxiv Improving Long-Tailed Classification from Instance Level
2021 Arxiv DISTRIBUTION-AWARE SEMANTICS-ORIENTED PSEUDO-LABEL FOR IMBALANCED SEMI-SUPERVISED LEARNING SSL, Code
2021 Arxiv ResLT: Residual Learning for Long-tailed Recognition
2021 Arxiv Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces by Google
2021 Arxiv Breadcrumbs: Adversarial Class-Balanced Sampling for Long-tailed Recognition
2021 Arxiv Procrustean Training for Imbalanced Deep Learning
2021 Arxiv Balanced Knowledge Distillation for Long-tailed Learning CBS+IS, Code
2021 Arxiv Class-Balanced Distillation for Long-Tailed Visual Recognition ENS+DA+IS, by Google Research
2021 Arxiv Distributional Robustness Loss for Long-tail Learning TST+CBS
2021 CVPR Improving Calibration for Long-Tailed Recognition DA+TST, Code
2021 CVPR Distribution Alignment: A Unified Framework for Long-tail Visual Recognition TST
2021 CVPR Adversarial Robustness under Long-Tailed Distribution
2021 CVPR CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning by Google, Code, Tensorflow
2021 ICLR HETEROSKEDASTIC AND IMBALANCED DEEP LEARNING WITH ADAPTIVE REGULARIZATION Code
2021 ICLR LONG-TAILED RECOGNITION BY ROUTING DIVERSE DISTRIBUTION-AWARE EXPERTS ENS+NC, Code, by Zi-Wei Liu
2021 ICLR Long-Tail Learning via Logit Adjustment by Google
2021 AAAI Bag of Tricks for Long-Tailed Visual Recognition with Deep Convolutional Neural Networks
2021 Arxiv Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
2020 Arxiv ELF: An Early-Exiting Framework for Long-Tailed Classification
2020 CVPR Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from a Domain Adaptation Perspective
2020 CVPR Equalization Loss for Long-Tailed Object Recognition
2020 CVPR Deep Representation Learning on Long-tailed Data: A Learnable Embedding Augmentation Perspective
2020 ICLR Decoupling representation and classifier for long-tailed recognition Code
2020 NeurIPS Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning Code
2020 NeurIPS Rethinking the Value of Labels for Improving Class-Imbalanced Learning Code
2020 CVPR BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition Code
2019 NeurIPS Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss Code
2019 CVPR Large-Scale Long-Tailed Recognition in an Open World Code, bibtex, by CUHK
2018 - iNaturalist. The iNaturalist 2018 competition dataset long-tailed dataset
2017 Arxiv The Devil is in the Tails: Fine-grained Classification in the Wild
2017 NeurIPS Learning to model the tail

eXtreme Multi-label Learning for Information Retrieval

Binary Relevance

Year Venue Title Remark
2019 Machine learning Data Scarcity, Robustness and Extreme Multi-label Classification
2019 WSDM Slice: Scalable linear extreme classifiers trained on 100 million labels for related searches
2017 KDD PPDSparse: A Parallel Primal-Dual Sparse Method for Extreme Classification
2017 AISTATS Label Filters for Large Scale Multilabel Classification
2016 WSDM DiSMEC - Distributed Sparse Machines for Extreme Multi-label Classification
2016 ICML PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification

Tree-based Methods

Year Venue Title Remark
2021 KDD Extreme Multi-label Learning for Semantic Matching in Product Search by Amazon, code
2020 arXiv Probabilistic Label Trees for Extreme Multi-label Classification PLT survey, code
2020 arXiv Online probabilistic label trees
2020 AISTATS LdSM: Logarithm-depth Streaming Multi-label Decision Trees Instance tree, C++ code
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks Label tree
2019 arXiv Bonsai - Diverse and Shallow Trees for Extreme Multi-label Classification Label tree
2018 ICML CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning Instance tree
2018 WWW Parabel: Partitioned Label Trees for Extreme Classification with Application to Dynamic Search Advertising Label tree, by Manik Varma
2016 ICML Extreme F-Measure Maximization using Sparse Probability Estimates Label tree
2016 KDD Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications Instance tree
2014 KDD A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning Instance tree, python implementation
2013 ICML Label Partitioning For Sublinear Ranking Label tree
2013 WWW Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages Instance tree, Random Forest, Gini Index
2011 NeurIPS Efficient label tree learning for large scale object recognition Label tree, multi-class
2010 NeurIPS Label embedding trees for large multi-class tasks Label tree, multi-class
2008 ECML Workshop Effective and Efficient Multilabel Classification in Domains with Large Number of Labels Label tree

Embedding-based Methods

Year Venue Title Remark
2019 AAAI Distributional Semantics Meets Multi-Label Learning bibtex
2019 arXiv Ranking-Based Autoencoder for Extreme Multi-label Classification
2019 NeurIPS Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces by Google Research
2017 KDD AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification
2015 NeurIPS Sparse Local Embeddings for Extreme Multi-label Classification
2014 ICML Large-scale Multi-label Learning with Missing Labels
2014 ICML Multi-label Classification via Feature-aware Implicit Label Space Encoding
2013 ICML Efficient Multi-label Classification with Many Labels
2012 NeurIPS Feature-aware Label Space Dimension Reduction for Multi-label Classification
2011 IJCAI WSABIE: Scaling Up To Large Vocabulary Image Annotation bibtex
2009 NeurIPS Multi-Label Prediction via Compressed Sensing
2008 KDD Extracting Shared Subspaces for Multi-label Classification

Speed-up and Compression

Year Venue Title Remark
2020 KDD Large-Scale Training System for 100-Million Classification at Alibaba Applied Data Science Track
2020 arXiv SOLAR: Sparse Orthogonal Learned and Random Embeddings
2020 ICLR EXTREME CLASSIFICATION VIA ADVERSARIAL SOFTMAX APPROXIMATION
2019 AISTATS Stochastic Negative Mining for Learning with Large Output Spaces by Google
2019 NeurIPS Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products Rice University, bibtex
2019 arXiv An Embarrassingly Simple Baseline for eXtreme Multi-label Prediction
2019 arXiv Accelerating Extreme Classification via Adaptive Feature Agglomeration bibtex, authors from IIT
2019 SDM Fast Training for Large-Scale One-versus-All Linear Classifiers using Tree-Structured Initialization code bibtex

Novel XML Settings

Year Venue Title Remark
2020 arXiv Extreme Multi-label Classification from Aggregated Labels by Inderjit Dhillon. This paper considers multi-instance learning in XML
2020 arXiv Unbiased Loss Functions for Extreme Classification With Missing Labels by Rohit Babbar. Missing labels
2020 ICML Deep Streaming Label Learning code, by Dacheng Tao, streaming multi-label learning
2016 arXiv Streaming Label Learning for Modeling Labels on the Fly by Dacheng Tao, streaming multi-label learning

Theoretical Studies

Year Venue Title Remark
2019 ICML Sparse Extreme Multi-label Learning with Oracle Property Code, by Weiwei Liu
2019 NeurIPS Multilabel reductions: what is my loss optimising? bibtex, by Google

Text Classification

Year Venue Title Remark
2021 ICML SiameseXML: Siamese Networks meet Extreme Classifiers with 100M Labels
2020 KDD Correlation Networks for Extreme Multi-label Text Classification code
2020 arXiv GNN-XML: Graph Neural Networks for Extreme Multi-label Text Classification
2020 ICML Pretrained Generalized Autoregressive Model with Adaptive Probabilistic Label Clusters for Extreme Multi-label Text Classification code
2019 ACL Large-Scale Multi-Label Text Classification on EU Legislation Eur-Lex 4.3K, bibtex
2019 arXiv X-BERT: eXtreme Multi-label Text Classification with BERT code by Yiming Yang, Inderjit Dhillon
2019 NeurIPS AttentionXML: Extreme Multi-Label Text Classification with Multi-Label Attention Based Recurrent Neural Networks
2018 EMNLP Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces few-shot, zero-shot, evaluation metric
2018 NeurIPS A no-regret generalization of hierarchical softmax to extreme multi-label classification code, PLT code
2017 SIGIR Deep Learning for Extreme Multi-label Text Classification by Yiming Yang at CMU, bibtex

Others

Label Correlation

Year Venue Title Remark
2019 ICML DL2: Training and Querying Neural Networks with Logic
2015 KDD Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning
2010 KDD Multi-Label Learning by Exploiting Label Dependency

Long-tailed Continual Learning

Year Venue Title Remark
2020 ECCV Imbalanced Continual Learning with Partitioning Reservoir Sampling

Train/Test Split

Year Venue Title Remark
2021 Arxiv Stratified Sampling for Extreme Multi-Label Data

XML Seminar

Year Venue Title Remark
2019 Dagstuhl Seminar 18291 Extreme Classification

Survey References:

  1. https://arxiv.org/pdf/1901.00248.pdf
  2. http://www.iith.ac.in/~saketha/research/AkshatMTP2018.pdf
  3. http://manikvarma.org/pubs/bengio19.pdf
  4. The Emerging Trends of Multi-Label Learning

XML Datasets link

Extreme Classification Workshops link
