CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY

Overview

M-BERT-Study

CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY

Motivation

Multilingual BERT (M-BERT) has shown surprising cross lingual abilities --- even when it is trained without cross lingual objectives. In this work, we analyze what causes this multilinguality from three factors: linguistic properties of the languages, the architecture of the model, and the learning objectives.

Results

Linguistic properties:

  • Code switching text (word-piece overlap) is not the main cause of multilinguality.
  • Word ordering is crucial, when words in sentences are randomly permuted, multilinguality is low, however, still significantly better than random.
  • (Unigram) word frequency is not enough, as we resampled all words with the same frequency, and found almost random performance. Combining the second and the third property infers that there is language similarity other than ordering of words between two languages, and which unigram frequency does not capture. We hypothesize that it may be similarity of n-gram occurrences.

Architecture:

  • Depth of the transformer is the most important.
  • Number of attention heads effects the absolute performance on individual languages, but the gap between in-language supervision and cross-language zero-shot learning didn't change much.
  • Total number of parameters, like depth, effects multilinguality.

Learning Objectives:

  • Next Sentence Prediction objective, when removed, leads to slight increase in performance.
  • Even marking sentences in languages with language-ids, allowing BERT to know exactly which language its learning on, did not hurt performance
  • Using word-pieces leads to strong improvements on both source and target language (likely to depend on tasks) and slight improvement cross-lingually comparing to word or character based models.

Please refer to our paper for more details.

Scripts

Creating pre-training data

If you would like to pre-train a BERT with Fake language/permuted sentences, see preprocessing-scripts for how to create the tfrecords for BERT training.

Pre-training BERT

Once you have uploaded the tfrecords to google cloud, you can set up an instance and start BERT training via bert-running-scripts.

Evaluating

With models we provide or just trained, we provide code for evaluating on two tasks, NER and entailment. See evaluating-scripts.

BERT Models

We release the following bert models (in a few days):

  • Word-piece Experiments
  • Word Order Experiments
  • Word Frequency Experiments
  • Model Structure Experiments

See data for detailed paths to download (in a few days).

Requirements

  • allennlp: 0.9.0
  • ccg_nlpy

Citation

Please cite the following paper if you find our paper useful. Thanks!

Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth. "Cross-Lingual Ability of Multilingual BERT: An Empirical Study" arXiv preprint arXiv:1912.07840 (2019).

@article{wang2019cross,
  title={Cross-Lingual Ability of Multilingual BERT: An Empirical Study},
  author={K, Karthikeyan and Wang, Zihan and Mayhew, Stephen and Roth, Dan},
  journal={arXiv preprint arXiv:1912.07840},
  year={2019}
}
Owner
CogComp
Cognitive Computation Group, led by Prof. Dan Roth
CogComp
Implementation of the Point Transformer layer, in Pytorch

Point Transformer - Pytorch Implementation of the Point Transformer self-attention layer, in Pytorch. The simple circuit above seemed to have allowed

Phil Wang 501 Jan 03, 2023
TensorFlow Implementation of Unsupervised Cross-Domain Image Generation

Domain Transfer Network (DTN) TensorFlow implementation of Unsupervised Cross-Domain Image Generation. Requirements Python 2.7 TensorFlow 0.12 Pickle

Yunjey Choi 864 Dec 30, 2022
RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH 🦌 🎄 ☃️ One Hyper-Modal Tr

Sber AI 230 Dec 31, 2022
Contrastive Multi-View Representation Learning on Graphs

Contrastive Multi-View Representation Learning on Graphs This work introduces a self-supervised approach based on contrastive multi-view learning to l

Kaveh 208 Dec 23, 2022
Rax is a Learning-to-Rank library written in JAX

🦖 Rax: Composable Learning to Rank using JAX Rax is a Learning-to-Rank library written in JAX. Rax provides off-the-shelf implementations of ranking

Google 247 Dec 27, 2022
Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers

Dimension Reduced Turbulent Flow Data From Deep Vector Quantizers This is an implementation of A Physics-Informed Vector Quantized Autoencoder for Dat

DreamSoul 3 Sep 12, 2022
"3D Human Texture Estimation from a Single Image with Transformers", ICCV 2021

Texformer: 3D Human Texture Estimation from a Single Image with Transformers This is the official implementation of "3D Human Texture Estimation from

XiangyuXu 193 Dec 05, 2022
Some bravo or inspiring research works on the topic of curriculum learning.

Towards Scalable Unpaired Virtual Try-On via Patch-Routed Spatially-Adaptive GAN Official code for NeurIPS 2021 paper "Towards Scalable Unpaired Virtu

131 Jan 07, 2023
A no-BS, dead-simple training visualizer for tf-keras

A no-BS, dead-simple training visualizer for tf-keras TrainingDashboard Plot inter-epoch and intra-epoch loss and metrics within a jupyter notebook wi

Vibhu Agrawal 3 May 28, 2021
Objax Apache-2Objax (🥉19 · ⭐ 580) - Objax is a machine learning framework that provides an Object.. Apache-2 jax

Objax Tutorials | Install | Documentation | Philosophy This is not an officially supported Google product. Objax is an open source machine learning fr

Google 729 Jan 02, 2023
Detection of drones using their thermal signatures from thermal camera through YOLO-V3 based CNN with modifications to encapsulate drone motion

Drone Detection using Thermal Signature This repository highlights the work for night-time drone detection using a using an Optris PI Lightweight ther

Chong Yu Quan 6 Dec 31, 2022
Toolbox to analyze temporal context invariance of deep neural networks

PyTCI A toolbox that estimates the integration window of a sensory response using the "Temporal Context Invariance" paradigm (TCI). The TCI method Int

4 Oct 23, 2022
Contenido del curso Bases de datos del DCC PUC versión 2021-2

IIC2413 - Bases de Datos Tabla de contenidos Equipo Profesores Ayudantes Contenidos Calendario Evaluaciones Resumen de notas Foro Política de integrid

54 Nov 23, 2022
Scalable Optical Flow-based Image Montaging and Alignment

SOFIMA SOFIMA (Scalable Optical Flow-based Image Montaging and Alignment) is a tool for stitching, aligning and warping large 2d, 3d and 4d microscopy

Google Research 16 Dec 21, 2022
This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021.

inverse_attention This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021. Le

Firas Laakom 5 Jul 08, 2022
RaceBERT -- A transformer based model to predict race and ethnicty from names

RaceBERT -- A transformer based model to predict race and ethnicty from names Installation pip install racebert Using a virtual environment is highly

Prasanna Parasurama 3 Nov 02, 2022
Data and Code for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning"

Introduction Code and data for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning". We cons

Pan Lu 81 Dec 27, 2022
[ICCV2021] Official code for "Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition"

CTR-GCN This repo is the official implementation for Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. The pap

Yuxin Chen 148 Dec 16, 2022
Bio-OFC gym implementation and Gym-Fly environment

Bio-OFC gym implementation and Gym-Fly environment This repository includes the gym compatible implementation of the Bio-OFC algorithm from the paper

Siavash Golkar 1 Nov 16, 2021
A self-supervised learning framework for audio-visual speech

AV-HuBERT (Audio-Visual Hidden Unit BERT) Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Robust Self-Supervised A

Meta Research 431 Jan 07, 2023