Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Overview

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Abstract: Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Recently, several proposed debiasing methods are shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourage models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (e.g., 7pp gain on HANS dataset) while maintaining the original in-distribution accuracy.

The repository contains the code to reproduce our work in debiasing NLU models without in-distribution degradation. We provide 2 runs of experiment that are shown in our paper:

  1. Debias MNLI model from syntactic bias and evaluate on HANS as the out-of-distribution data.
  2. Debias MNLI model from hypothesis-only bias and evaluate on MNLI-hard sets as the out-of-distribution data.

Requirements

The code requires python >= 3.6 and pytorch >= 1.1.0.

Additional required dependencies can be found in requirements.txt. Install all requirements by running:

pip install -r requirements.txt

Data

Our experiments use MNLI dataset version provided by GLUE benchmark. Download the file from here, and unzip under the directory ./dataset Additionally download the following files here for evaluating on hard/easy splits of both MNLI dev and test sets. The dataset directory should be structured as the following:

└── dataset 
    └── MNLI
        ├── train.tsv
        ├── dev_matched.tsv
        ├── dev_mismatched.tsv
        ├── dev_mismatched.tsv
        ├── dev_matched_easy.tsv
        ├── dev_matched_hard.tsv
        ├── dev_mismatched_easy.tsv
        ├── dev_mismatched_hard.tsv
        ├── multinli_hard
        │   ├── multinli_0.9_test_matched_unlabeled_hard.jsonl
        │   └── multinli_0.9_test_mismatched_unlabeled_hard.jsonl
        ├── multinli_test
        │   ├── multinli_0.9_test_matched_unlabeled.jsonl
        │   └── multinli_0.9_test_mismatched_unlabeled.jsonl
        └── original

Running the experiments

For each evaluation setting, use the --mode and --which_bias arguments to set the appropriate loss function and the type of bias to mitigate (e.g, hans, hypo).

To reproduce our result on MNLI ⮕ HANS, run the following:

cd src/
CUDA_VISIBLE_DEVICES=6 python train_distill_bert.py \
    --output_dir ../checkpoints/hans/bert_confreg_lr5_epoch3_seed444 \
    --do_train --do_eval --mode smoothed_distill \
    --seed 444 --which_bias hans

For the MNLI ⮕ hard splits, run the following:

cd src/
CUDA_VISIBLE_DEVICES=10 python train_distill_bert.py \
    --output_dir ../checkpoints/hypo/bert_confreg_lr5_epoch3_seed111 \
    --do_train --do_eval --mode smoothed_distill \
    --seed 111 --which_bias hypo

Expected results

Results on the MNLI ⮕ HANS setting:

Mode Seed MNLI-m MNLI-mm HANS avg.
None 111 84.57 84.72 62.04
conf-reg 111 84.17 85.02 69.62

Results on the MNLI ⮕ Hard-splits setting:

Mode Seed MNLI-m MNLI-mm MNLI-m hard MNLI-mm hard
None 111 84.62 84.71 76.07 76.75
conf-reg 111 85.01 84.87 78.02 78.89

Contact

Contact person: Ajie Utama, [email protected]

https://www.ukp.tu-darmstadt.de/

Please reach out to us for further questions or if you encounter any issue. You can cite this work by the following:

@InProceedings{UtamaDebias2020,
  author    = {Utama, P. Ajie and Moosavi, Nafise Sadat and Gurevych, Iryna},
  title     = {Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance},
  booktitle = {Proceedings of the 58th Conference of the Association for Computational Linguistics},
  month     = jul,
  year      = {2020},
  publisher = {Association for Computational Linguistics}
}

Acknowledgement

The code in this repository is build on the implementation of debiasing method by Clark et al. The original version can be found here

Owner
Ubiquitous Knowledge Processing Lab
Ubiquitous Knowledge Processing Lab
Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras (ICCV 2021)

N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Gra

32 Dec 26, 2022
Using a Seq2Seq RNN architecture via TensorFlow to predict future Bitcoin prices

Recurrent Bitcoin Network A Data Science Thesis Project About This repository contains the source code for implementing Bitcoin price prediciton using

Frizu 6 Sep 08, 2022
TF Image Segmentation: Image Segmentation framework

TF Image Segmentation: Image Segmentation framework The aim of the TF Image Segmentation framework is to provide/provide a simplified way for: Convert

Daniil Pakhomov 546 Dec 17, 2022
Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

FAC-Net Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization Linjiang Huang (CUHK), Liang Wang (CASIA), Hongsheng

21 Nov 22, 2022
Diffusion Probabilistic Models for 3D Point Cloud Generation (CVPR 2021)

Diffusion Probabilistic Models for 3D Point Cloud Generation [Paper] [Code] The official code repository for our CVPR 2021 paper "Diffusion Probabilis

Shitong Luo 323 Jan 05, 2023
🐤 Nix-TTS: An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation

🐤 Nix-TTS An Incredibly Lightweight End-to-End Text-to-Speech Model via Non End-to-End Distillation Rendi Chevi, Radityo Eko Prasojo, Alham Fikri Aji

Rendi Chevi 156 Jan 09, 2023
This repository contains code for the paper "Disentangling Label Distribution for Long-tailed Visual Recognition", published at CVPR' 2021

Disentangling Label Distribution for Long-tailed Visual Recognition (CVPR 2021) Arxiv link Blog post This codebase is built on Causal Norm. Install co

Hyperconnect 85 Oct 18, 2022
An unreferenced image captioning metric (ACL-21)

UMIC This repository provides an unferenced image captioning metric from our ACL 2021 paper UMIC: An Unreferenced Metric for Image Captioning via Cont

hwanheelee 14 Nov 20, 2022
Prototypical Networks for Few shot Learning in PyTorch

Prototypical Networks for Few shot Learning in PyTorch Simple alternative Implementation of Prototypical Networks for Few Shot Learning (paper, code)

Orobix 835 Jan 08, 2023
BisQue is a web-based platform designed to provide researchers with organizational and quantitative analysis tools for 5D image data. Users can extend BisQue by implementing containerized ML workflows.

Overview BisQue is a web-based platform specifically designed to provide researchers with organizational and quantitative analysis tools for up to 5D

Vision Research Lab @ UCSB 26 Nov 29, 2022
Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Legged Robots that Keep on Learning Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World, whic

Laura Smith 70 Dec 07, 2022
Official MegEngine implementation of CREStereo(CVPR 2022 Oral).

[CVPR 2022] Practical Stereo Matching via Cascaded Recurrent Network with Adaptive Correlation This repository contains MegEngine implementation of ou

MEGVII Research 309 Dec 30, 2022
Object Detection with YOLOv3

Object Detection with YOLOv3 Bu projede YOLOv3-608 modeli kullanılmıştır. Requirements Python 3.8 OpenCV Numpy Documentation Yolo ile ilgili detaylı b

Ayşe Konuş 0 Mar 27, 2022
AI Toolkit for Healthcare Imaging

Medical Open Network for AI MONAI is a PyTorch-based, open-source framework for deep learning in healthcare imaging, part of PyTorch Ecosystem. Its am

Project MONAI 3.7k Jan 07, 2023
N-Omniglot is a large neuromorphic few-shot learning dataset

N-Omniglot [Paper] || [Dataset] N-Omniglot is a large neuromorphic few-shot learning dataset. It reconstructs strokes of Omniglot as videos and uses D

11 Dec 05, 2022
A curated list of references for MLOps

A curated list of references for MLOps

Larysa Visengeriyeva 9.3k Jan 07, 2023
MapReader: A computer vision pipeline for the semantic exploration of maps at scale

MapReader A computer vision pipeline for the semantic exploration of maps at scale MapReader is an end-to-end computer vision (CV) pipeline designed b

Living with Machines 25 Dec 26, 2022
Pytorch Lightning Distributed Accelerators using Ray

Distributed PyTorch Lightning Training on Ray This library adds new PyTorch Lightning plugins for distributed training using the Ray distributed compu

167 Jan 02, 2023
No Code AI/ML platform

NoCodeAIML No Code AI/ML platform - Community Edition Video credits: Uday Kiran Typical No Code AI/ML Platform will have features like drag and drop,

Bhagvan Kommadi 5 Jan 28, 2022
OpenMMLab Computer Vision Foundation

English | 简体中文 Introduction MMCV is a foundational library for computer vision research and supports many research projects as below: MMCV: OpenMMLab

OpenMMLab 4.6k Jan 09, 2023