Supervised Contrastive Learning for Downstream Optimized Sequence Representations

Overview

SupCL-Seq 📖

Supervised Contrastive Learning for Downstream Optimized Sequence Representations (SupCL-Seq), accepted for publication at EMNLP 2021, extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By altering the dropout mask probability in standard Transformer architectures (e.g. BERT_base), we generate augmented, altered views of every representation (anchor). A supervised contrastive loss is then used to pull together similar samples (e.g. anchors and their altered views) and push apart samples belonging to other classes. Despite its simplicity, SupCL-Seq leads to large gains on many sequence classification tasks in the GLUE benchmark compared to a standard BERT_base, including absolute improvements of 6% on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STS-B.
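To make the loss described above concrete, here is a minimal, dependency-free sketch of a supervised contrastive loss in the form given in Supervised Contrastive Learning [1]. It is an illustration only, not the package's implementation: the function name `supcon_loss`, the L2 normalisation step and the toy embeddings are assumptions made for this example.

```python
import math

def supcon_loss(embeddings, labels, temperature=0.05):
    """Supervised contrastive loss, averaged over anchors that have a positive.

    Hypothetical helper for illustration; SupCL-Seq's own loss may differ.
    """
    # L2-normalise each vector so dot products are cosine similarities
    # (assumes no all-zero vectors).
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        normed.append([x / norm for x in vec])

    n = len(normed)
    total, n_anchors = 0.0, 0
    for i in range(n):
        # Temperature-scaled similarity of anchor i to every other sample.
        sims = {
            a: sum(x * y for x, y in zip(normed[i], normed[a])) / temperature
            for a in range(n) if a != i
        }
        # Positives: other samples (including altered views) with the same label.
        positives = [a for a in sims if labels[a] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0

# Toy embeddings: two tight clusters of two vectors each.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(vectors, [0, 0, 1, 1])     # labels match the clusters
misaligned = supcon_loss(vectors, [0, 1, 0, 1])  # labels cut across the clusters
print(aligned < misaligned)  # True
```

The loss is small when samples sharing a label already sit close together and large when same-label samples are far apart, which is the behaviour the contrastive training stage optimises for.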

This package runs on almost all of the Transformer models in Huggingface 🤗 that contain an encoder, including but not limited to:

  1. ALBERT
  2. BERT
  3. BigBird
  4. RoBERTa
  5. ERNIE
  6. And many more models!

Table of Contents

GLUE Benchmark BERT SupCL-SEQ

Installation

Usage

Run on GLUE

How to Cite

References

GLUE Benchmark BERT SupCL-SEQ

The table below reports the improvements over naive fine-tuning of a BERT model on the GLUE benchmark. We used the [CLS] token during training and expect that using the mean would further improve these results.

[Image: GLUE results table]
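The two ways of extracting a sequence embedding mentioned above, the [CLS] token versus the mean over tokens, can be sketched as follows. This is a simplified illustration on plain Python lists; the helper names `cls_pooling` and `mean_pooling` and the toy values are assumptions for this example and are not taken from the package.

```python
def cls_pooling(token_embeddings):
    """Use the first token's vector ([CLS] in BERT-style models) as the embedding."""
    return list(token_embeddings[0])

def mean_pooling(token_embeddings, attention_mask):
    """Average the token vectors, skipping padding positions (mask value 0)."""
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    dim = len(token_embeddings[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

# Hypothetical last-layer output for one sequence: [CLS], two word tokens, one [PAD].
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

print(cls_pooling(tokens))         # [1.0, 0.0]
print(mean_pooling(tokens, mask))  # [3.0, 2.0]
```

Mean pooling uses every non-padding token, whereas [CLS] pooling relies on a single position.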

Installation

  1. First, install TensorFlow 2.0, PyTorch, or both. Refer to the TensorFlow installation page, the PyTorch installation page and/or the Flax installation page for the specific install command for your platform.

  2. Then install the package:

$ pip install SupCL-Seq

Usage

The package builds on the Trainer from Huggingface 🤗, so it is used in the same way as the Trainer. The pipeline works as follows:

  1. First, use supervised contrastive learning to contrastively optimize sentence embeddings on your annotated data.

from SupCL_Seq import SupCsTrainer

# `model`, `CL_args`, `train_dataset`, `tokenizer` and `compute_metrics` are
# assumed to be defined beforehand, as for a regular Huggingface Trainer.
SupCL_trainer = SupCsTrainer.SupCsTrainer(
            w_drop_out=[0.0,0.05,0.2],      # Number of views and their associated mask drop-out probabilities [Optional]
            temperature=0.05,               # Temperature for the contrastive loss function [Optional]
            def_drop_out=0.1,               # Default drop-out of the transformer, this is usually 0.1 [Optional]
            pooling_strategy='mean',        # Strategy used to extract embeddings, either `mean` or `pooling` [Optional]
            model=model,                    # Model
            args=CL_args,                   # Arguments from `TrainingArguments` [Optional]
            train_dataset=train_dataset,    # Training dataset
            tokenizer=tokenizer,            # Tokenizer
            compute_metrics=compute_metrics # If you need a customized evaluation [Optional]
        )

  2. After contrastive training:

    2.1 Add a linear classification layer to your model

    2.2 Freeze the base layer

    2.3 Finetune the linear layer on your annotated data

For a detailed implementation, see glue.ipynb.
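As a rough illustration of steps 2.1 to 2.3, the sketch below wraps a frozen encoder with a trainable linear head in plain PyTorch. It is not the code from glue.ipynb: the `LinearProbe` class, the small stand-in encoder and the random data are assumptions for this example. With a real Huggingface model you would first pool its output into one vector per sequence.

```python
import torch
from torch import nn

class LinearProbe(nn.Module):
    """Frozen encoder followed by a trainable linear classification layer."""

    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder
        for param in self.encoder.parameters():
            param.requires_grad = False                        # step 2.2: freeze the base
        self.classifier = nn.Linear(hidden_size, num_labels)   # step 2.1: linear layer

    def forward(self, features):
        with torch.no_grad():                                  # no gradients through the base
            embeddings = self.encoder(features)
        return self.classifier(embeddings)

torch.manual_seed(0)
# Stand-in for the contrastively trained encoder (hypothetical, for illustration).
encoder = nn.Sequential(nn.Linear(8, 16), nn.Tanh())
model = LinearProbe(encoder, hidden_size=16, num_labels=2)
model.encoder.eval()  # keep any dropout in the frozen base disabled

# Random stand-in data: 32 examples, 8 features each, binary labels.
features = torch.randn(32, 8)
targets = torch.randint(0, 2, (32,))

# Step 2.3: optimise only the parameters of the linear layer.
optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(features), targets)
    loss.backward()
    optimizer.step()
```

Passing only `model.classifier.parameters()` to the optimizer and setting `requires_grad = False` on the encoder both ensure that the representations learned in the contrastive stage stay fixed while the classifier is trained.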

Run on GLUE

To evaluate the method on the GLUE benchmark, see glue.ipynb.

How to Cite

@misc{sedghamiz2021supclseq,
      title={SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations}, 
      author={Hooman Sedghamiz and Shivam Raval and Enrico Santus and Tuka Alhanai and Mohammad Ghassemi},
      year={2021},
      eprint={2109.07424},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

References

[1] Supervised Contrastive Learning

[2] SimCSE: Simple Contrastive Learning of Sentence Embeddings
