Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Overview

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Prerequisites

This repo is built upon a local copy of transformers==2.1.1. This repo has been tested on torch==1.4.0 with python 3.7 and CUDA 10.1.

To start, create a new environment and install:

conda create -n grad2task python=3.7
conda activate grad2task
cd Grad2Task
pip install -e .

We use wandb for logging. Please set it up following this doc and specify your project name on wandb in run_meta_training.sh:

export WANDB=[YOUR PROJECT NAME]

Download the dataset and unzip it under the main folder: https://drive.google.com/file/d/1uAdgZFYv9epk6tQVQ3SwboxFpSlkC_ZW/view?usp=sharing

If need to place it somewhere else, specify its path in path.sh.

Train & Evaluation

To train/evaluate models:

bash meta_learn.sh [MODEL_NAME] [MODE] [EXP_ID]

where [MODEL_NAME] refers to model name, [MODE] is experiment model and [EXP_ID] is an optional experiment id used for mark different runs using the same model. Options for [MODEL_NAM] and MODE are listed as follow:

[MODE] Description
train Training models.
test_best Test the model with the best validation performance.
test_latest Test the latest checkpoint.
test Test model without meta-training. Only applicable to the fine-tune-baseline model.
[MODEL_NAME] Description
fine-tune-baseline Fine-tuning BERT for each task separately.
bert-protonet-euc ProtoNet with BERT as encoder, using Euclidean distance as distance metric.
bert-protonet-euc-bn ProtoNet with BERT+Bottleneck Adapters as encoder, using Euclidean distance as distance metric.
bert-protonet ProtoNet with BERT as encoder, using cosine distance as distance metric.
bert-protonet-bn ProtoNet with BERT+Bottleneck Adapters as encoder, using cosine distance as distance metric.
bert-leopard Leopard with pretrained BERT [1].
bert-leopard-fixlr Leopard but with fixed learning rates.
bert-cnap-bn-euc-context-cls-shift-scale-ar Our proposed approach using gradients as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-X Our proposed approach using average input encoding as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-XGrad Our proposed approach using both gradients and input encoding as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-XY Our proposed approach using input and textual label encoding as task representation.
bert-cnap-bn-euc-context-shift-scale-ar Same with our proposed approach except adapting all tokens instead of just the [CLS] token as we do.
bert-cnap-bn-pretrained-taskemb Our proposed approach with pretrained task embedding model.
bert-cnap-bn-hyper A hypernetwork based approach.

To run a model with different hyperparameters, first name this run by [EXP_ID] and then specify the new hyperparameters in run/meta_learn.sh. For example, if one wants to run bert-protonet-euc with a smaller learning rate, they could modify run/meta_learn.sh as:

...
elif [ $1 == "bert-protonet-bn" ]; then # ProtoNet with cosince distance
    export LEARNING_RATE=2e-5
    export CHECKPOINT_FREQ=1000
    if [ ${EXP_ID} == *"lr1e-5" ]; then
        export LEARNING_RATE=1e-5
        export CHECKPOINT_FREQ=2000
        # modify other hyperparameters here
    fi
...

and then run:

bash meta_learn.sh bert-protonet-bn train lr1e-5

Reference

[1] T. Bansal, R. Jha, and A. McCallum. Learning to few-shot learn across diverse natural language classification tasks. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5108–5123, 2020.

Owner
Jixuan Wang
Computer Science PhD student at University of Toronto. Research interests include deep learning and machine learning, and their applications in healthcare.
Jixuan Wang
Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

CLIORA This is the official codebase for ICLR oral paper: Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling. We introduce

Bo Wan 32 Dec 23, 2022
Supplementary code for the AISTATS 2021 paper "Matern Gaussian Processes on Graphs".

Matern Gaussian Processes on Graphs This repo provides an extension for gpflow with Matérn kernels, inducing variables and trainable models implemente

41 Dec 17, 2022
LaneDetectionAndLaneKeeping - Lane Detection And Lane Keeping

LaneDetectionAndLaneKeeping This project is part of my bachelor's thesis. The go

5 Jun 27, 2022
Elevation Mapping on GPU.

Elevation Mapping cupy Overview This is a ros package of elevation mapping on GPU. Code are written in python and uses cupy for GPU calculation. * pla

Robotic Systems Lab - Legged Robotics at ETH Zürich 183 Dec 19, 2022
official code for dynamic convolution decomposition

Revisiting Dynamic Convolution via Matrix Decomposition (ICLR 2021) A pytorch implementation of DCD. If you use this code in your research please cons

Yunsheng Li 110 Nov 23, 2022
Repository providing a wide range of self-supervised pretrained models for computer vision tasks.

Hierarchical Pretraining: Research Repository This is a research repository for reproducing the results from the project "Self-supervised pretraining

Colorado Reed 53 Nov 09, 2022
Unsupervised Representation Learning by Invariance Propagation

Unsupervised Learning by Invariance Propagation This repository is the official implementation of Unsupervised Learning by Invariance Propagation. Pre

FengWang 15 Jul 06, 2022
A PyTorch re-implementation of Neural Radiance Fields

nerf-pytorch A PyTorch re-implementation Project | Video | Paper NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis Ben Mildenhall

Krishna Murthy 709 Jan 09, 2023
A project which aims to protect your privacy using inexpensive hardware and easily modifiable software

Protecting your privacy using an ESP32, an IR sensor and a python script This project, which I personally call the "never-gonna-catch-me-in-the-act-ev

8 Oct 10, 2022
Docker containers of baseline agents for the Crafter environment

Crafter Baselines This repository contains Docker containers for running various baselines on the Crafter environment. Reward Agents DreamerV2 based o

Danijar Hafner 17 Sep 25, 2022
An All-MLP solution for Vision, from Google AI

MLP Mixer - Pytorch An All-MLP solution for Vision, from Google AI, in Pytorch. No convolutions nor attention needed! Yannic Kilcher video Install $ p

Phil Wang 784 Jan 06, 2023
LRBoost is a scikit-learn compatible approach to performing linear residual based stacking/boosting.

LRBoost is a sckit-learn compatible package for linear residual boosting. LRBoost combines a linear estimator and a non-linear estimator to leverage t

Andrew Patton 5 Nov 23, 2022
Start-to-finish tutorial for interactive music co-creation in PyTorch and Tensorflow.js

Start-to-finish tutorial for interactive music co-creation in PyTorch and Tensorflow.js

Chris Donahue 98 Dec 14, 2022
UV matrix decompostion using movielens dataset

UV-matrix-decompostion-with-kfold UV matrix decompostion using movielens dataset upload the 'ratings.dat' file install the following python libraries

2 Oct 18, 2022
Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models"

Introduction Official implementation of "Membership Inference Attacks Against Self-supervised Speech Models". In this work, we demonstrate that existi

Wei-Cheng Tseng 7 Nov 01, 2022
Efficient-GlobalPointer - Pytorch Efficient GlobalPointer

引言 感谢苏神带来的模型,原文地址:https://spaces.ac.cn/archives/8877 如何运行 对应模型EfficientGlobalPoi

powerycy 40 Dec 14, 2022
Learning Representational Invariances for Data-Efficient Action Recognition

Learning Representational Invariances for Data-Efficient Action Recognition Official PyTorch implementation for Learning Representational Invariances

Virginia Tech Vision and Learning Lab 27 Nov 22, 2022
Augmented CLIP - Training simple models to predict CLIP image embeddings from text embeddings, and vice versa.

Train aug_clip against laion400m-embeddings found here: https://laion.ai/laion-400-open-dataset/ - note that this used the base ViT-B/32 CLIP model. S

Peter Baylies 55 Sep 13, 2022
IA for recognising Traffic Signs using Keras [Tensorflow]

Traffic Signs Recognition ⚠️ 🚦 Fundamentals of Intelligent Systems Introduction 📄 Development of a neural network capable of recognizing nine differ

Sebastián Fernández García 2 Dec 19, 2022
Dynamic vae - Dynamic VAE algorithm is used for anomaly detection of battery data

Dynamic VAE frame Automatic feature extraction can be achieved by probability di

10 Oct 07, 2022