A framework to train language models to learn invariant representations.

Overview

Invariant Language Modeling

Implementation of the training for invariant language models.

Motivation

Modern pretrained language models are critical components of NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, we propose invariant language modeling, a framework to learn invariant representations that should generalize across training environments. In particular, we adapt IRM-games to language models, where the invariance emerges from a specific training schedule in which environments compete to optimize their environment-specific loss by updating subsets of the model in a round-robin fashion.

Model Description

The data is assumed to come as n distinct environments and we aim to learn a language model that focusing on correlations that generalize across environments.

The model is decomposed into two components:

  • ϕ the main body of the transformer language model,
  • w the language modeling head that predicts the missing token.

In our implementation, there are now as many heads as environments: n. For each data point, all heads make their predictions and they are averaged. However, during training we sample one batch from each environment in a round-robin fashion. When seeing a batch from environment e only the head w_e and the main body ϕ receive a batch update.

Usage

To get started with the code:

pip install -r requirements.txt

PyTorch with a CUDA installation is required to run this framework. Please find all useful installation information here

Then, to continue the training of a language model from a huggingface checkpoint:

python3 run_invariant_mlm.py \
    --model_name_or_path roberta-base \
    --validation_file data-folder/validation_file.txt \
    --do_train \
    --do_eval \
    --nb_steps 5000 \
    --learning_rate 1e-5 \
    --output_dir folder-to-save-model \
    --seed 123 \
    --train_file data-folder/training-environments \
    --overwrite_cache

Currently, the supported base models are:

Implementation

To train language models according to the IRM-games, one needs to modify:

  • the training schedule to perform batch updates according to each environment in a round-robin fashion. This logic is implemented by the InvariantTrainer in invariant_trainer.py', a class inherited from the Trainer` from huggingface.
  • the language modeling heads in the model. It needs one head per environment. This is done by creating variations of the base model classes. It is implemented in invariant_roberta.py for roberta and in invariant_distilbert.py for distilbert.

Contact

Maxime Peyrard, [email protected]

OpenMMLab Text Detection, Recognition and Understanding Toolbox

Introduction English | 简体中文 MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the correspondi

OpenMMLab 3k Jan 07, 2023
A set of tools for creating and testing machine learning features, with a scikit-learn compatible API

Feature Forge This library provides a set of tools that can be useful in many machine learning applications (classification, clustering, regression, e

Machinalis 380 Nov 05, 2022
This repository is the official implementation of the Hybrid Self-Attention NEAT algorithm.

This repository is the official implementation of the Hybrid Self-Attention NEAT algorithm. It contains the code to reproduce the results presented in the original paper: https://arxiv.org/abs/2112.0

Saman Khamesian 6 Dec 13, 2022
Nested Graph Neural Network (NGNN) is a general framework to improve a base GNN's expressive power and performance

Nested Graph Neural Networks About Nested Graph Neural Network (NGNN) is a general framework to improve a base GNN's expressive power and performance.

Muhan Zhang 38 Jan 05, 2023
DrNAS: Dirichlet Neural Architecture Search

This paper proposes a novel differentiable architecture search method by formulating it into a distribution learning problem. We treat the continuously relaxed architecture mixing weight as random va

Xiangning Chen 37 Jan 03, 2023
DAT4 - General Assembly's Data Science course in Washington, DC

DAT4 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (12/15/14 - 3/16/15). Instructors: Sinan Ozdemir

Kevin Markham 779 Dec 25, 2022
DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

DPT This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and model

CASIA-IVA-Lab 111 Dec 21, 2022
This repository contains implementations and illustrative code to accompany DeepMind publications

DeepMind Research This repository contains implementations and illustrative code to accompany DeepMind publications. Along with publishing papers to a

DeepMind 11.3k Dec 31, 2022
SCU OlympicsRunning Baseline

Competition 1v1 running Environment check details in Jidi Competition RLChina2021智能体竞赛 做出的修改: 奖励重塑:修改了环境,重新设置了奖励的分配,使得奖励组成不只有零和博弈,还有探索环境的奖励。 算法微调:修改了官

ZiSeoi Wong 2 Nov 23, 2021
MonoRCNN is a monocular 3D object detection method for automonous driving

MonoRCNN MonoRCNN is a monocular 3D object detection method for automonous driving, published at ICCV 2021. This project is an implementation of MonoR

87 Dec 27, 2022
Research on controller area network Intrusion Detection Systems

Group members information Member 1: Lixue Liang Member 2: Yuet Lee Chan Member 3: Xinruo Zhang Member 4: Yifei Han User Manual Generate Attack Packets

Roche 4 Aug 30, 2022
Faster Convex Lipschitz Regression

Faster Convex Lipschitz Regression This reepository provides a python implementation of our Faster Convex Lipschitz Regression algorithm with GPU and

Ali Siahkamari 0 Nov 19, 2021
A way to store images in YAML.

YAMLImg A way to store images in YAML. I made this after seeing Roadcrosser's JSON-G because it was too inspiring to ignore this opportunity. Installa

5 Mar 14, 2022
Recommendation algorithms for large graphs

Fast recommendation algorithms for large graphs based on link analysis. License: Apache Software License Author: Emmanouil (Manios) Krasanakis Depende

Multimedia Knowledge and Social Analytics Lab 27 Jan 07, 2023
Artificial Intelligence playing minesweeper 🤖

AI playing Minesweeper ✨ Minesweeper is a single-player puzzle video game. The objective of the game is to clear a rectangular board containing hidden

Vaibhaw 8 Oct 17, 2022
Western-3DSlicer-Modules - Point-Set Registrations for Ultrasound Probe Calibrations

Point-Set Registrations for Ultrasound Probe Calibrations -Undergraduate Thesis-

Matteo Tanzi 0 May 04, 2022
Code base of object detection

rmdet code base of object detection. 环境安装: 1. 安装conda python环境 - `conda create -n xxx python=3.7/3.8` - `conda activate xxx` 2. 运行脚本,自动安装pytorch1

3 Mar 08, 2022
Code for CVPR2021 paper "Robust Reflection Removal with Reflection-free Flash-only Cues"

Robust Reflection Removal with Reflection-free Flash-only Cues (RFC) Paper | To be released: Project Page | Video | Data Tensorflow implementation for

Chenyang LEI 162 Jan 05, 2023
Aligning Latent and Image Spaces to Connect the Unconnectable

About This repo contains the official implementation of the Aligning Latent and Image Spaces to Connect the Unconnectable paper. It is a GAN model whi

Ivan Skorokhodov 203 Jan 03, 2023
Repository for the paper "From global to local MDI variable importances for random forests and when they are Shapley values"

From global to local MDI variable importances for random forests and when they are Shapley values Antonio Sutera ( Antonio Sutera 3 Feb 23, 2022