Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Overview

Cross-Attention Transfer for Machine Translation

This repo hosts the code to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021.

Setup

We provide our scripts and modifications to Fairseq. This section describes how to run the code and, for example, reproduce Table 2 in the paper.

Data

To view the data as we prepared and used it, switch to the main branch. However, we recommend cloning the code from this branch to avoid downloading a large amount of data at once; you can always fetch any data you need from the main branch.

Installations

We worked in a conda environment with Python 3.8.

  • First install the requirements.
      pip install -r requirements.txt
  • Then install Fairseq. To have the option to modify the package, install it in editable mode.
      cd fairseq-modified
      pip install -e .
  • Finally, set the following environment variable.
      export FAIRSEQ=$PWD
      cd ..
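Before moving on, it can help to confirm that the environment variable was set as intended. The following is a minimal sketch, not part of the repo: check_fairseq_env is a hypothetical helper name, and it only checks that FAIRSEQ is non-empty and points at an existing directory.

```shell
# Hypothetical helper (not part of the repo): sanity-check the FAIRSEQ variable.
check_fairseq_env() {
    # Fail if the variable is unset or empty.
    if [ -z "${FAIRSEQ:-}" ]; then
        echo "FAIRSEQ is not set" >&2
        return 1
    fi
    # Fail if it does not point at an existing directory.
    if [ ! -d "$FAIRSEQ" ]; then
        echo "FAIRSEQ does not point to a directory: $FAIRSEQ" >&2
        return 1
    fi
    echo "FAIRSEQ=$FAIRSEQ"
}
```

Calling check_fairseq_env right after the export should print the path of the fairseq-modified directory; a non-zero exit status means the variable needs to be set again.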

Experiments

For the purpose of this walk-through, we assume we want to train a De–En model, using the following data:

De-En
├── iwslt13.test.de
├── iwslt13.test.en
├── iwslt13.test.tok.de
├── iwslt13.test.tok.en
├── iwslt15.tune.de
├── iwslt15.tune.en
├── iwslt15.tune.tok.de
├── iwslt15.tune.tok.en
├── iwslt16.train.de
├── iwslt16.train.en
├── iwslt16.train.tok.de
└── iwslt16.train.tok.en

by transferring from a Fr–En parent model, whose experiment files are stored under FrEn/checkpoints.
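Since the later steps assume this exact layout, a quick check of the data directory may save a failed preprocessing run. The sketch below is illustrative only: missing_data_files is a hypothetical helper, and the file prefixes are simply the ones shown in the tree above.

```shell
# Hypothetical helper (not part of the repo): print every expected file that
# is missing from a data directory laid out like the De-En tree above.
# Arguments: data directory, source language code, target language code.
missing_data_files() {
    dir=$1
    src=$2
    tgt=$3
    for prefix in iwslt16.train iwslt15.tune iwslt13.test; do
        for lang in "$src" "$tgt"; do
            # Each split is expected in raw and tokenized (.tok) form.
            for f in "$prefix.$lang" "$prefix.tok.$lang"; do
                [ -f "$dir/$f" ] || echo "$f"
            done
        done
    done
}
```

For the walk-through, missing_data_files De-En de en prints nothing when all twelve files are present and otherwise lists the missing file names, one per line.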

  • Start by making an experiment folder and preprocessing the data.
      mkdir test_exp
      ./xattn-transfer-for-mt/scripts/data_preprocessing/prepare_bi.sh \
          de en test_exp/ \
          De-En/iwslt16.train.tok De-En/iwslt15.tune.tok De-En/iwslt13.test.tok \
          8000
    Please note that prepare_bi.sh is written for the most general case, where you learn vocabularies for both the source and target sides. Modify it when necessary to reuse an existing vocabulary. In this case, since we are transferring from Fr–En to De–En, we reuse the target-side vocabulary from the parent. So 8000 refers to the source vocabulary size, and we copy the parent's target vocabulary instead of learning one in the script.
      cp ./FrEn/data/tgt.sentencepiece.bpe.model $DATA
      cp ./FrEn/data/tgt.sentencepiece.bpe.vocab $DATA
  • Now you can run an experiment. Here we want to update only the source embeddings and the cross-attention, so we run the corresponding script (the script names are self-explanatory). Set the correct path to the desired parent model checkpoint in the script, then run:
      bash ./xattn-transfer-for-mt/scripts/training/reinit-src-embeddings-and-finetune-parent-model-on-translation_src+xattn.sh \
          test_exp/ de en
  • Finally, after training, evaluate your model. In the script, set the correct path to the detokenizer that you use, then run:
      bash ./xattn-transfer-for-mt/scripts/evaluation/decode_and_score_valid_and_test.sh \
          test_exp/ de en \
          $PWD/De-En/iwslt15.tune.en $PWD/De-En/iwslt13.test.en
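The vocabulary-reuse step in the walk-through copies two SentencePiece files from the parent. A guarded version of that copy might look like the sketch below. It is an assumption-laden illustration: copy_parent_vocab is a hypothetical helper, and the destination directory is whatever $DATA refers to in your setup, which the walk-through does not define.

```shell
# Hypothetical helper (not part of the repo): copy the parent's target-side
# SentencePiece model and vocabulary, failing early if either file is absent.
# Arguments: parent data directory, destination directory.
copy_parent_vocab() {
    parent_dir=$1
    dest_dir=$2
    # Check both files first so that a partial copy never happens.
    for ext in model vocab; do
        f="$parent_dir/tgt.sentencepiece.bpe.$ext"
        if [ ! -f "$f" ]; then
            echo "missing parent vocabulary file: $f" >&2
            return 1
        fi
    done
    mkdir -p "$dest_dir"
    cp "$parent_dir/tgt.sentencepiece.bpe.model" \
       "$parent_dir/tgt.sentencepiece.bpe.vocab" "$dest_dir/"
}
```

With the paths used above, the call would be copy_parent_vocab ./FrEn/data "$DATA".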

Issues

Please report any problems you encounter through the Issues tab of the repo. Thanks in advance for helping us improve the repo!

Credits

The main body of code is built upon Fairseq. We found it very easy to navigate and modify. Kudos to the developers!
The data preprocessing scripts are adapted from FLORES scripts.
To fit mBART into the memory of the GPUs that we worked with, we used the trimming solution provided here.

Citation

@inproceedings{gheini-cross-attention,
  title = "Cross-Attention is All You Need: {A}dapting Pretrained {T}ransformers for Machine Translation",
  author = "Gheini, Mozhdeh and Ren, Xiang and May, Jonathan",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2021"
}