Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks

Overview

TestRank in Pytorch

Code for the paper TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks by Yu Li, Min Li, Qiuxia Lai, Yannan Liu, and Qiang Xu.

If you find this repository useful for your work, please consider citing it as follows:

@article{yu2021testrank,
  title={TestRank: Bringing Order into Unlabeled Test Instances for Deep Learning Tasks},
  author={Yu Li, Min Li, Qiuxia Lai, Yannan Liu, and Qiang Xu},
  journal={NeurIPS},
  year={2021}
}

1. Setup

Install dependencies

conda env create -f environment.yml

Please run the code on GPU.

2. Runing

There are mainly three steps involved:

  • Prepare the DL models to be tested
  • Prepare the unsupervised BYOL feature extractor
  • Launch a specific test input prioritization technique

We illustrate these steps as the following.

2.1. Download the Pre-trained DL model under test

Please download the classifiers to corresponding folder ./checkpoint/{dataset}/ckpt_bias/

If you want to train your own classifiers, please refer to the Training part.

2.2. Download the Feature extractor

We papare pretrained feature extractor for the each (e.g. CIFAR-10, SVHN, STL10) dataset. Please put the downloaded file in the "./ckpt_byol/" folder.

If you want to train your own classifiers, please refer to the Training part.

2.3. Perform Test Selection

Call the 'run.sh' file with argument 'selection':

  ./run.sh selection

Configure your run.sh follow the discription below

  python selection.py \
              --dataset $DATASET \                   # specify the dataset to use
              --manualSeed ${RANDOM_SEED} \          # random seed
              --model2test_arch $MODEL2TEST \        # architecture of the model under test (e.g. resnet18)
              --model2test_path $MODEL2TESTPATH \    # the path storing the model weights 
              --model_number $MODEL_NO \             # which model to test, model 0, 1, or 2?
              --save_path ${save_path} \             # The result will be stored in here
              --data_path ${DATA_ROOT} \             # Dataset root path
              --graph_nn \                           # use graph neural network in testrank
              --feature_extractor_id ${feature_extractor_id} \ # type of feature extractor, 0: BYOL model, 1: the model under test
              --no_neighbors ${no_neighbors} \       # number of neighbors in to constract graph
              --learn_mixed                          # use mlp to combine intrinsic and contextual attributes; otherwise they are brute force combined (multiplication two scores)
              --baseline_gini                        # Use certain baseline method to perform selection, otherwise leave it blank
  • The result is stored in '{save_path}/{date}/{dataset}_{model}/xxx_result.csv' in where xxx stands for the selection method used (e.g. for testrank, the file would be gnn_result.csv)

  • The TRC value is in the last column, and the forth column shows the corresponding budget in percent.

  • To compare with baselines, please specify the corresponding baseline method (e.g. baseline_gini, baseline_uncertainty, baseline_dsa, baseline_mcp):

  • To evaluate different models, change the MODEL_NO to the corresponding model: [0, 1, 2]

3. Training

3.1. Train classifier

If you want to train your own DL model instead of using the pretrained ones, run this command:

./run.sh trainm
  • The trained model will be stored in path './checkpoint/dataset/ckpt_bias/*'.

  • Each model will be assigned with a unique ID (e.g. 0, 1, 2).

  • The code used to train the model are resides in the train_classifier.py file. If you want to change the dataset or model architecture, please modify 'DATASET=dataset_name' or 'MODEL=name'with the desired ones in the run.sh file.

3.2 Train BYOL Feature Extractor

Please refer to this code.

4. Contact

If there are any questions, feel free to send a message to [email protected]

List of GSoC organisations with number of times they have been selected.

Welcome to GSoC Organisation Frequency And Details 👋 List of GSoC organisations with number of times they have been selected, techonologies, topics,

Shivam Kumar Jha 41 Oct 01, 2022
Count the frequency of letters or words in a text file and show a graph.

Word Counter By EBUS Coding Club Count the frequency of letters or words in a text file and show a graph. Requirements Python 3.9 or higher matplotlib

EBUS Coding Club 0 Apr 09, 2022
Arabic speech recognition, classification and text-to-speech.

klaam Arabic speech recognition, classification and text-to-speech using many advanced models like wave2vec and fastspeech2. This repository allows tr

ARBML 177 Dec 27, 2022
Pre-Training with Whole Word Masking for Chinese BERT

Pre-Training with Whole Word Masking for Chinese BERT

Yiming Cui 7.7k Dec 31, 2022
justCTF [*] 2020 challenges sources

justCTF [*] 2020 This repo contains sources for justCTF [*] 2020 challenges hosted by justCatTheFish. TLDR: Run a challenge with ./run.sh (requires Do

justCatTheFish 25 Dec 27, 2022
Perform sentiment analysis and keyword extraction on Craigslist listings

craiglist-helper synopsis Perform sentiment analysis and keyword extraction on Craigslist listings Background I love Craigslist. I've found most of my

Mark Musil 1 Nov 08, 2021
A Python module made to simplify the usage of Text To Speech and Speech Recognition.

Nav Module The solution for voice related stuff in Python Nav is a Python module which simplifies voice related stuff in Python. Just import the Modul

Snm Logic 1 Dec 20, 2021
MASS: Masked Sequence to Sequence Pre-training for Language Generation

MASS: Masked Sequence to Sequence Pre-training for Language Generation

Microsoft 1.1k Dec 17, 2022
基于GRU网络的句子判断程序/A program based on GRU network for judging sentences

SentencesJudger SentencesJudger 是一个基于GRU神经网络的句子判断程序,基本的功能是判断文章中的某一句话是否为一个优美的句子。 English 如何使用SentencesJudger 确认Python运行环境 安装pyTorch与LTP python3 -m pip

8 Mar 24, 2022
Extract rooms type, door, neibour rooms, rooms corners nad bounding boxes, and generate graph from rplan dataset

Housegan-data-reader House-GAN++ (data-reader) Code and instructions for converting rplan dataset (raster images) to housegan++ data format. House-GAN

Sepid Hosseini 13 Nov 24, 2022
Few-shot Natural Language Generation for Task-Oriented Dialog

Few-shot Natural Language Generation for Task-Oriented Dialog This repository contains the dataset, source code and trained model for the following pa

172 Dec 13, 2022
Python generation script for BitBirds

BitBirds generation script Intro This is published under MIT license, which means you can do whatever you want with it - entirely at your own risk. Pl

286 Dec 06, 2022
Fastseq 基于ONNXRUNTIME的文本生成加速框架

Fastseq 基于ONNXRUNTIME的文本生成加速框架

Jun Gao 9 Nov 09, 2021
profile tools for pytorch nn models

nnprof Introduction nnprof is a profile tool for pytorch neural networks. Features multi profile mode: nnprof support 4 profile mode: Layer level, Ope

Feng Wang 42 Jul 09, 2022
🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt

475 Jan 04, 2023
Weird Sort-and-Compress Thing

Weird Sort-and-Compress Thing A weird integer sorting + compression algorithm inspired by a conversation with Luthingx (it probably already exists by

Douglas 1 Jan 03, 2022
IndoBERTweet is the first large-scale pretrained model for Indonesian Twitter. Published at EMNLP 2021 (main conference)

IndoBERTweet 🐦 🇮🇩 1. Paper Fajri Koto, Jey Han Lau, and Timothy Baldwin. IndoBERTweet: A Pretrained Language Model for Indonesian Twitter with Effe

IndoLEM 40 Nov 30, 2022
Text preprocessing, representation and visualization from zero to hero.

Text preprocessing, representation and visualization from zero to hero. From zero to hero • Installation • Getting Started • Examples • API • FAQ • Co

Jonathan Besomi 2.7k Jan 08, 2023
多语言降噪预训练模型MBart的中文生成任务

mbart-chinese 基于mbart-large-cc25 的中文生成任务 Input source input: text + /s + lang_code target input: lang_code + text + /s Usage token_ids_mapping.jso

11 Sep 19, 2022
A raytrace framework using taichi language

ti-raytrace The code use Taichi programming language Current implement acceleration lvbh disney brdf How to run First config your anaconda workspace,

蕉太狼 73 Dec 11, 2022