Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Last update: Sep 28, 2022

Prerequisites

This repo is built on a local copy of transformers==2.1.1 and has been tested with torch==1.4.0, Python 3.7, and CUDA 10.1.

To start, create a new environment and install:

conda create -n grad2task python=3.7
conda activate grad2task
cd Grad2Task
pip install -e .

We use wandb for logging. Please set it up following the wandb documentation, then specify your wandb project name in run_meta_training.sh:

export WANDB=[YOUR PROJECT NAME]
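Because the project name is read from the WANDB variable, a small guard can catch a missing or placeholder value before a long run starts. The following is only a sketch: the helper name require_wandb_project is hypothetical and is not part of this repo.

```shell
# Hypothetical helper (not part of the repo): fail early when the
# WANDB project name is empty or still the placeholder text.
require_wandb_project() {
    if [ -z "${WANDB:-}" ] || [ "${WANDB}" = "[YOUR PROJECT NAME]" ]; then
        echo "WANDB is not set to a project name" >&2
        return 1
    fi
    echo "Logging to wandb project: ${WANDB}"
}
```

Calling `require_wandb_project || exit 1` near the top of a launch script would stop the run before any training begins.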

Download the dataset and unzip it under the main folder: https://drive.google.com/file/d/1uAdgZFYv9epk6tQVQ3SwboxFpSlkC_ZW/view?usp=sharing

If you need to place it somewhere else, specify its path in path.sh.

Train & Evaluation

To train/evaluate models:

bash meta_learn.sh [MODEL_NAME] [MODE] [EXP_ID]

where [MODEL_NAME] is the model name, [MODE] is the experiment mode, and [EXP_ID] is an optional experiment id used to distinguish different runs of the same model. Options for [MODE] and [MODEL_NAME] are listed as follows:

`[MODE]`	Description
train	Training models.
test_best	Test the model with the best validation performance.
test_latest	Test the latest checkpoint.
test	Test model without meta-training. Only applicable to the `fine-tune-baseline` model.

`[MODEL_NAME]`	Description
fine-tune-baseline	Fine-tuning BERT for each task separately.
bert-protonet-euc	ProtoNet with BERT as encoder, using Euclidean distance as distance metric.
bert-protonet-euc-bn	ProtoNet with BERT+Bottleneck Adapters as encoder, using Euclidean distance as distance metric.
bert-protonet	ProtoNet with BERT as encoder, using cosine distance as distance metric.
bert-protonet-bn	ProtoNet with BERT+Bottleneck Adapters as encoder, using cosine distance as distance metric.
bert-leopard	Leopard with pretrained BERT [1].
bert-leopard-fixlr	Leopard but with fixed learning rates.
bert-cnap-bn-euc-context-cls-shift-scale-ar	Our proposed approach using gradients as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-X	Our proposed approach using average input encoding as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-XGrad	Our proposed approach using both gradients and input encoding as task representation.
bert-cnap-bn-euc-context-cls-shift-scale-ar-XY	Our proposed approach using input and textual label encoding as task representation.
bert-cnap-bn-euc-context-shift-scale-ar	Same as our proposed approach, except that it adapts all tokens instead of only the [CLS] token.
bert-cnap-bn-pretrained-taskemb	Our proposed approach with pretrained task embedding model.
bert-cnap-bn-hyper	A hypernetwork-based approach.
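
The constraints in the two tables above can be checked before a run is launched. The sketch below is an illustration only: is_valid_mode and check_args are hypothetical helpers, not part of this repo, and they encode just what the tables state (the four modes, and that test applies only to fine-tune-baseline).

```shell
# Hypothetical pre-flight check (not part of the repo): accept only the
# four [MODE] values listed in the table above.
is_valid_mode() {
    case "$1" in
        train|test_best|test_latest|test) return 0 ;;
        *) return 1 ;;
    esac
}

# The plain `test` mode is only applicable to fine-tune-baseline.
check_args() {
    model="$1"
    mode="$2"
    if ! is_valid_mode "$mode"; then
        echo "unknown mode: $mode" >&2
        return 1
    fi
    if [ "$mode" = "test" ] && [ "$model" != "fine-tune-baseline" ]; then
        echo "mode 'test' requires fine-tune-baseline" >&2
        return 1
    fi
    echo "ok: $model $mode"
}
```

For example, `check_args bert-protonet train` prints `ok: bert-protonet train`, while `check_args bert-protonet test` returns a non-zero status.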

To run a model with different hyperparameters, first choose an [EXP_ID] to name the run, then specify the new hyperparameters in run/meta_learn.sh. For example, to run bert-protonet-bn with a smaller learning rate, modify run/meta_learn.sh as follows:

...
elif [ "$1" == "bert-protonet-bn" ]; then # ProtoNet with cosine distance
    export LEARNING_RATE=2e-5
    export CHECKPOINT_FREQ=1000
    if [[ "${EXP_ID}" == *"lr1e-5" ]]; then # [[ ]] is required for suffix pattern matching
        export LEARNING_RATE=1e-5
        export CHECKPOINT_FREQ=2000
        # modify other hyperparameters here
    fi
...

and then run:

bash meta_learn.sh bert-protonet-bn train lr1e-5
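
The suffix match on [EXP_ID] can also be written with a case statement, which works in any POSIX shell. This is a minimal sketch that mirrors the learning-rate override in the example above; select_lr is a hypothetical helper and not part of this repo.

```shell
# Sketch only: select_lr is a hypothetical helper, not part of the repo.
# It mirrors the example above: the default is 2e-5, and 1e-5 is used
# when the experiment id ends in "lr1e-5".
select_lr() {
    exp_id="$1"
    case "$exp_id" in
        *lr1e-5) echo "1e-5" ;;
        *)       echo "2e-5" ;;
    esac
}

select_lr "lr1e-5"        # prints 1e-5
select_lr "run2-lr1e-5"   # prints 1e-5
select_lr ""              # prints 2e-5
```

Matching on a suffix means several runs can share one override as long as their ids end with the same tag.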

Reference

[1] T. Bansal, R. Jha, and A. McCallum. Learning to few-shot learn across diverse natural language classification tasks. In Proceedings of the 28th International Conference on Computational Linguistics, pages 5108–5123, 2020.

Owner

Jixuan Wang
