Code for MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

Overview

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

This is the code for the paper:

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia Li, Li Fei-Fei
Presented at ICML 2018

Please note that this is not an officially supported Google product.

If you find this code useful in your research then please cite

@inproceedings{jiang2018mentornet,
  title={MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels},
  author={Jiang, Lu and Zhou, Zhengyuan and Leung, Thomas and Li, Li-Jia and Fei-Fei, Li},
  booktitle={ICML},
  year={2018}
}

Introduction

We are interested in training a deep network using curriculum learning (Bengio et al., 2009), i.e. learning examples with focus. Each curriculum is implemented as a network (called MentorNet).

  • During training, MentorNet supervises the training of the base network (called StudentNet).
  • At the test time, StudentNet makes prediction alone without MentorNet.

Training Overview

Setups

All code was developed and tested on Nvidia V100/P100 (16GB) the following environment.

  • Ubuntu 18.04
  • Python 2.7.15
  • TensorFlow 1.8.0
  • numpy 1.13.3
  • imageio 2.3.0

Download Cloud SDK to get data and models. Next we need to download the dataset and pre-trained MentorNet models. Put them into the same directory as the code directory.

gsutil -m cp -r gs://mentornet_project/data .
gsutil -m cp -r gs://mentornet_project/mentornet_models .

Alternatively, you may download the zip files: data and models.

Running MentorNet on CIFAR

export PYTHONPATH="$PYTHONPATH:$PWD/code/"

python code/cifar_train_mentornet.py \
  --dataset_name=cifar10   \
  --trained_mentornet_dir=mentornet_models/models/mentornet_pd1_g_1/mentornet_pd \
  --loss_p_precentile=0.75  \
  --nofixed_epoch_after_burn_in  \
  --burn_in_epoch=0  \
  --example_dropout_rates="0.5,17,0.05,83" \
  --data_dir=data/cifar10/0.2 \
  --train_log_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \
  --studentnet=resnet101 \
  --max_number_of_steps=39000

A full list of commands can be found in this file. The training script has a number of command-line flags that you can use to configure the model architecture, hyperparameters, and input / output settings:

  • --trained_mentornet_dir: Directory where to find the trained MentorNet model, created by mentornet_learning/train.py.
  • --loss_p_percentile: p-percentile used to compute the loss moving average. Default is 0.7.
  • --burn_in_epoch: Number of first epochs to perform burn-in. In the burn-in period, every sample has a fixed 1.0 weight. Default is 0.
  • --fixed_epoch_after_burn_in: Whether to use the fixed epoch as the MentorNet input feature after the burn-in period. Set True for MentorNet DD. Default is False.
  • --loss_moving_average_decay: Decay factor used in moving average. Default is 0.5.
  • --example_dropout_rates: Comma-separated list indicating the example drop-out rate for the total of 100 epochs. The format is [dropout rate, epoch_num]+, the piecewise drop-out rate from boundaries and values. The sum of epoch_num is 100. Drop-out means the probability of setting sample weights to zeros proposed (Liang et al., 2016). Default is 0.5, 17, 0.05, 78, 1.0, 5.

To evaluate a model, run the evaluation job in parallel with the training job (on a different GPU).

python cifar/cifar_eval.py \
 --dataset_name=cifar10 \
 --data_dir=cifar/data/cifar10/val/ \
 --checkpoint_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1/train \
 --eval_dir=cifar_models/cifar10/resnet/0.2/mentornet_pd1_g_1//eval_val \
 --studentnet=resnet101 \
 --device_id=1

A complete list of commands of running experiments can be found at commands/train_studentnet_resnet.sh and commands/train_studentnet_inception.sh.

MentorNet Framework

MentorNet is a general framework for curriculum learning, where various curriculums can be learned by the same MentorNet structure of different parameters.

It is flexible as we can switch curriculums by attaching different MentorNets without modifying the pipeline.

We train a few MentorNets listed below. We can think of a MentorNet as a hyper-parameter and will be tuned for different problems.

Curriculum Visualization Intuition Model Name
No curriculum image Assign uniform weight to every sample uniform. baseline_mentornet
Self-paced
(Kuma et al. 2010)
image Favor samples of smaller loss. self_paced_mentornet
SPCL linear
(Jiang et al. 2015)
image Discount the weight by loss linearly. spcl_linear_mentornet
Hard example mining
(Felzenszwalb et al., 2008)
image Favor samples of greater loss. hard_example_mining_mentornet
Focal loss
(Lin et al., 2017)
image Increase the weight by loss by the exponential CDF. focal_loss_mentornet
Predefined Mixture image Mixture of SPL and SPCL changing by epoch. mentornet_pd
MentorNet Data-driven image Learned on a small subset of the CIFAR data. mentornet_dd

Note there are many more curriculums can be trained by MentorNet, for example, prediction variance (Chang et al., 2017), implicit regularizer (Fan et al. 2017), self-paced with diversity (Jiang et al. 2014), sample re-weighting (Dehghani et al., 2018, Ren et al., 2018), etc.

Performance

The numbers are slightly different from the ones reported in the paper due to the re-implementation on the third party library.

CIFAR-10 ResNet

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.796 0.822 0.797 0.910 0.914
0.4 0.568 0.802 0.634 0.776 0.887
0.8 0.238 0.297 0.25 0.283 0.463

CIFAR-100 ResNet

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.624 0.652 0.613 0.733 0.726
0.4 0.448 0.509 0.467 0.567 0.675
0.8 0.084 0.089 0.079 0.193 0.301

CIFAR-10 Inception

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.775 0.784 0.747 0.798 0.800
0.4 0.72 0.733 0.695 0.731 0.763
0.8 0.29 0.272 0.309 0.312 0.461

CIFAR-100 Inception

noise_fraction baseline self_paced focal_loss mentornet_pd mentornet_dd
0.2 0.42 0.408 0.391 0.451 0.466
0.4 0.346 0.32 0.313 0.386 0.411
0.8 0.108 0.091 0.107 0.125 0.203

Algorithm

We propose an algorithm to optimize the StudentNet model parameter w jointly with a

given MentorNet. Unlike the alternating minimization, it minimizes w (StudentNet parameter) and v (sample weight) stochastically over mini-batches.

The curriculum can change during training, and MentorNet is updated a few times in the algorithm.

Algorithm

To learn new curriculums (Step 6), see this page.

We found specific MentorNet architectures do not matter that much.

References

  • Bengio, Yoshua, et al. "Curriculum learning". In ICML, 2009.
  • Kumar M. Pawan, Packer Benjamin, and Koller Daphne "Self-paced learning for latent variable models". In NIPS, 2010.
  • Jiang, Lu et al. "Self-paced Learning with Diversity", In NIPS 2014
  • Jiang, Lu, et al. "Self-Paced Curriculum Learning." In AAAI. 2015.
  • Liang, Junwei et al. Learning to Detect Concepts from Webly-Labeled Video Data, In IJCAI 2016.
  • Lin, Tsung-Yi, et al. "Focal loss for dense object detection." In ICCV. 2017.
  • Fan, Yanbo, et al. "Self-Paced Learning: an Implicit Regularization Perspective." In AAAI 2017.
  • Felzenszwalb, Pedro, et al. "A discriminatively trained, multiscale, deformable part model." In CVPR 2008.
  • Dehghani, Mostafa, et al. "Fidelity-Weighted Learning." In ICLR 2018.
  • Ren, Mengye, et al. "Learning to reweight examples for robust deep learning." In ICML 2018.
  • Fan, Yang, et al. "Learning to Teach." In ICLR 2018.
  • Chang, Haw-Shiuan, et al. "Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples." In NIPS 2017.
Owner
Google
Google ❤️ Open Source
Google
This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Process Priors" ([email protected])

GP-VAE This repository provides datasets and code for preprocessing, training and testing models for the paper: Diverse Text Generation via Variationa

Wanyu Du 18 Dec 29, 2022
Exploring Relational Context for Multi-Task Dense Prediction [ICCV 2021]

Adaptive Task-Relational Context (ATRC) This repository provides source code for the ICCV 2021 paper Exploring Relational Context for Multi-Task Dense

David Brüggemann 35 Dec 05, 2022
Tackling the Class Imbalance Problem of Deep Learning Based Head and Neck Organ Segmentation

Info This is the code repository of the work Tackling the Class Imbalance Problem of Deep Learning Based Head and Neck Organ Segmentation from Elias T

2 Apr 20, 2022
Flower - A Friendly Federated Learning Framework

Flower - A Friendly Federated Learning Framework Flower (flwr) is a framework for building federated learning systems. The design of Flower is based o

Adap 1.8k Jan 01, 2023
Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation. Intel iHD GPU (iGPU) support. NVIDIA GPU (dGPU) support.

mtomo Multiple types of NN model optimization environments. It is possible to directly access the host PC GUI and the camera to verify the operation.

Katsuya Hyodo 24 Mar 02, 2022
Decensoring Hentai with Deep Neural Networks. Formerly named DeepMindBreak.

DeepCreamPy Decensoring Hentai with Deep Neural Networks. Formerly named DeepMindBreak. A deep learning-based tool to automatically replace censored a

616 Jan 06, 2023
DeepDiffusion: Unsupervised Learning of Retrieval-adapted Representations via Diffusion-based Ranking on Latent Feature Manifold

DeepDiffusion Introduction This repository provides the code of the DeepDiffusion algorithm for unsupervised learning of retrieval-adapted representat

4 Nov 15, 2022
Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"

JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design This repository contains code for the paper: JA

Aspuru-Guzik group repo 55 Nov 29, 2022
EGNN - Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch

EGNN - Pytorch Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch. May be eventually used for Alphafold2 replication. This

Phil Wang 259 Jan 04, 2023
Tom-the-AI - A compound artificial intelligence software for Linux systems.

Tom the AI (version 0.82) WARNING: This software is not yet ready to use, I'm still setting up the GitHub repository. Should be ready in a few days. T

2 Apr 28, 2022
This Jupyter notebook shows one way to implement a simple first-order low-pass filter on sampled data in discrete time.

How to Implement a First-Order Low-Pass Filter in Discrete Time We often teach or learn about filters in continuous time, but then need to implement t

Joshua Marshall 4 Aug 24, 2022
Bytedance Inc. 2.5k Jan 06, 2023
Supplemental Code for "ImpressionNet :A Multi view Approach to Predict Socio Facial Impressions"

Supplemental Code for "ImpressionNet :A Multi view Approach to Predict Socio Facial Impressions" Environment requirement This code is based on Python

Rohan Kumar Gupta 1 Dec 19, 2021
The devkit of the nuPlan dataset.

The devkit of the nuPlan dataset.

Motional 264 Jan 03, 2023
git《Tangent Space Backpropogation for 3D Transformation Groups》(CVPR 2021) GitHub:1]

LieTorch: Tangent Space Backpropagation Introduction The LieTorch library generalizes PyTorch to 3D transformation groups. Just as torch.Tensor is a m

Princeton Vision & Learning Lab 482 Jan 06, 2023
Source Code For Template-Based Named Entity Recognition Using BART

Template-Based NER Source Code For Template-Based Named Entity Recognition Using BART Training Training train.py Inference inference.py Corpus ATIS (h

174 Dec 19, 2022
NDE: Climate Modeling with Neural Diffusion Equation, ICDM'21

Climate Modeling with Neural Diffusion Equation Introduction This is the repository of our accepted ICDM 2021 paper "Climate Modeling with Neural Diff

Jeehyun Hwang 5 Dec 18, 2022
Pytorch implementation of set transformer

set_transformer Official PyTorch implementation of the paper Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks .

Juho Lee 410 Jan 06, 2023
MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

MicRank: Learning to Rank Microphones for Distant Speech Recognition Application Scenario Many applications nowadays envision the presence of multiple

Samuele Cornell 20 Nov 10, 2022
[ICML 2020] DrRepair: Learning to Repair Programs from Error Messages

DrRepair: Learning to Repair Programs from Error Messages This repo provides the source code & data of our paper: Graph-based, Self-Supervised Program

Michihiro Yasunaga 155 Jan 08, 2023