xTune

Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning.

Environment

Docker image: dancingsoul/pytorch:xTune

Install the fine-tuning code: pip install --user .
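
For reference, a minimal end-to-end setup might look like this (the local paths are illustrative):

# setup sketch: pull the provided Docker image, then install from the repository root
docker pull dancingsoul/pytorch:xTune
cd xTune                 # root of this repository
pip install --user .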

Data & Model Preparation

XTREME Datasets

  1. Create a download folder with mkdir -p download in the root of this project.
  2. Manually download panx_dataset (for NER) from [here][2] (note that it will download as AmazonPhotos.zip) into the download directory.
  3. Run bash scripts/download_data.sh to download the remaining datasets; the download code is adapted from the [xtreme official repo][1]. The steps are combined in the sketch below.
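
Combined, the preparation steps look like this (the location of the manually downloaded archive is illustrative):

mkdir -p download
# move the manually downloaded PAN-X archive into the download folder
mv ~/Downloads/AmazonPhotos.zip download/
# fetch the remaining XTREME datasets
bash scripts/download_data.sh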

Note that we keep the labels in the test sets for easier evaluation. To prevent accidental evaluation on the test sets while running experiments, the [xtreme official repo][1] removes the test-set labels during pre-processing and changes the order of the test sentences for cross-lingual sentence retrieval. If you use the official repo, replace csv.writer(fout, delimiter='\t') with csv.writer(fout, delimiter='\t', quoting=csv.QUOTE_NONE, quotechar='') in utils_process.py.

Translations

XTREME provides translations for SQuAD v1.1 (only train and dev), MLQA, PAWS-X, TyDiQA-GoldP, XNLI, and XQuAD, which can be downloaded from [here][3]. The xtreme_translations folder should be moved to the download directory.

The target language translations for panx and udpos are obtained with Google Translate, since they are not provided. Our processed version can be downloaded from [here][4]. It should be merged with the above xtreme_translations folder.
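
For example, assuming both translation archives have been downloaded and unpacked (source paths are illustrative):

# place the XTREME translations under download/
mv /path/to/xtreme_translations download/
# merge our processed panx/udpos translations into the same folder
cp -r /path/to/panx_udpos_translations/* download/xtreme_translations/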

Bi-lingual dictionaries

We obtain the bi-lingual dictionaries from the [MUSE][6] repo. For convenience, you can download them from [here][7] and move them to the download directory, i.e., ./download/dicts.
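
A quick placement check (the source path is illustrative):

mv /path/to/dicts ./download/dicts
ls ./download/dicts    # should list the bilingual dictionary files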

Models

XLM-RoBERTa is supported. We use the [huggingface][5] checkpoint format; the models can be downloaded with bash scripts/download_model.sh.
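
Alternatively, a sketch for caching the checkpoints directly with the transformers library (assumes network access to the Hugging Face hub):

# cache xlm-roberta-base weights and tokenizer into the local huggingface cache
python -c "from transformers import AutoModel, AutoTokenizer; \
AutoModel.from_pretrained('xlm-roberta-base'); \
AutoTokenizer.from_pretrained('xlm-roberta-base')"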

Fine-tuning Usage

Our default settings use Nvidia V100-32GB GPU cards. If you run into out-of-memory errors, reduce per_gpu_train_batch_size while increasing gradient_accumulation_steps, or switch to multi-GPU training.
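
The rule of thumb is to keep the effective batch size constant when doing so:

# effective batch size = per_gpu_train_batch_size * gradient_accumulation_steps * num_gpus
# e.g. 8 * 4 * 1 gives the same effective batch size as 32 * 1 * 1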

xTune consists of a two-stage training process.

  • Stage 1: fine-tuning with example consistency on the English training set.
  • Stage 2: fine-tuning with example consistency on the augmented training set, while regularizing model consistency against the Stage 1 model.

We recommend running both stages for token-level tasks such as sequence labeling and question answering. For text classification, Stage 1 alone suffices if the computation budget is limited.

bash ./scripts/train.sh [setting] [dataset] [model] [stage] [gpu] [data_dir] [output_dir]

where the options are described as follows:

  • [setting]: translate-train-all (using translated training data for languages other than English) or cross-lingual-transfer (using only English training data for zero-shot cross-lingual transfer)
  • [dataset]: dataset names in XTREME, i.e., xnli, panx, pawsx, udpos, mlqa, tydiqa, xquad
  • [model]: xlm-roberta-base, xlm-roberta-large
  • [stage]: 1 (first stage), 2 (second stage)
  • [gpu]: used to set environment variable CUDA_VISIBLE_DEVICES
  • [data_dir]: folder of training data
  • [output_dir]: folder of fine-tuning output
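
For instance, a fully specified invocation might look like this (the GPU id and directories are illustrative; the examples below omit the trailing arguments, presumably falling back to the script's defaults):

# stage 1 on PAN-X with translated training data, on GPU 0
bash ./scripts/train.sh translate-train-all panx xlm-roberta-large 1 0 ./download/ ./outputs/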

Examples: XTREME Tasks

XNLI fine-tuning on the English and translated training sets (translate-train-all)

# run stage 1 of xTune
bash ./scripts/train.sh translate-train-all xnli xlm-roberta-base 1
# run stage 2 of xTune (optional)
bash ./scripts/train.sh translate-train-all xnli xlm-roberta-base 2

XNLI fine-tuning on the English training set only (cross-lingual-transfer)

# run stage 1 of xTune
bash ./scripts/train.sh cross-lingual-transfer xnli xlm-roberta-base 1
# run stage 2 of xTune (optional)
bash ./scripts/train.sh cross-lingual-transfer xnli xlm-roberta-base 2

Paper

Please cite our paper \cite{bo2021xtune} if you find the resources in this repository useful.

@inproceedings{bo2021xtune,
  author = {Bo Zheng and Li Dong and Shaohan Huang and Wenhui Wang and Zewen Chi and Saksham Singhal and Wanxiang Che and Ting Liu and Xia Song and Furu Wei},
  booktitle = {Proceedings of ACL 2021},
  title = {{Consistency Regularization for Cross-Lingual Fine-Tuning}},
  year = {2021}
}

References

  1. https://github.com/google-research/xtreme
  2. https://www.amazon.com/clouddrive/share/d3KGCRCIYwhKJF0H3eWA26hjg2ZCRhjpEQtDL70FSBN?_encoding=UTF8&%2AVersion%2A=1&%2Aentries%2A=0&mgh=1
  3. https://console.cloud.google.com/storage/browser/xtreme_translations
  4. https://drive.google.com/drive/folders/1Rdbc0Us_4I5MpRCwLASxBwqSW8_dlF87?usp=sharing
  5. https://github.com/huggingface/transformers/
  6. https://github.com/facebookresearch/MUSE
  7. https://drive.google.com/drive/folders/1k9rQinwUXicglA5oyzo9xtgqiuUVDkjT?usp=sharing