CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY

Overview

M-BERT-Study

CROSS-LINGUAL ABILITY OF MULTILINGUAL BERT: AN EMPIRICAL STUDY

Motivation

Multilingual BERT (M-BERT) has shown surprising cross lingual abilities --- even when it is trained without cross lingual objectives. In this work, we analyze what causes this multilinguality from three factors: linguistic properties of the languages, the architecture of the model, and the learning objectives.

Results

Linguistic properties:

  • Code switching text (word-piece overlap) is not the main cause of multilinguality.
  • Word ordering is crucial, when words in sentences are randomly permuted, multilinguality is low, however, still significantly better than random.
  • (Unigram) word frequency is not enough, as we resampled all words with the same frequency, and found almost random performance. Combining the second and the third property infers that there is language similarity other than ordering of words between two languages, and which unigram frequency does not capture. We hypothesize that it may be similarity of n-gram occurrences.

Architecture:

  • Depth of the transformer is the most important.
  • Number of attention heads effects the absolute performance on individual languages, but the gap between in-language supervision and cross-language zero-shot learning didn't change much.
  • Total number of parameters, like depth, effects multilinguality.

Learning Objectives:

  • Next Sentence Prediction objective, when removed, leads to slight increase in performance.
  • Even marking sentences in languages with language-ids, allowing BERT to know exactly which language its learning on, did not hurt performance
  • Using word-pieces leads to strong improvements on both source and target language (likely to depend on tasks) and slight improvement cross-lingually comparing to word or character based models.

Please refer to our paper for more details.

Scripts

Creating pre-training data

If you would like to pre-train a BERT with Fake language/permuted sentences, see preprocessing-scripts for how to create the tfrecords for BERT training.

Pre-training BERT

Once you have uploaded the tfrecords to google cloud, you can set up an instance and start BERT training via bert-running-scripts.

Evaluating

With models we provide or just trained, we provide code for evaluating on two tasks, NER and entailment. See evaluating-scripts.

BERT Models

We release the following bert models (in a few days):

  • Word-piece Experiments
  • Word Order Experiments
  • Word Frequency Experiments
  • Model Structure Experiments

See data for detailed paths to download (in a few days).

Requirements

  • allennlp: 0.9.0
  • ccg_nlpy

Citation

Please cite the following paper if you find our paper useful. Thanks!

Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth. "Cross-Lingual Ability of Multilingual BERT: An Empirical Study" arXiv preprint arXiv:1912.07840 (2019).

@article{wang2019cross,
  title={Cross-Lingual Ability of Multilingual BERT: An Empirical Study},
  author={K, Karthikeyan and Wang, Zihan and Mayhew, Stephen and Roth, Dan},
  journal={arXiv preprint arXiv:1912.07840},
  year={2019}
}
Owner
CogComp
Cognitive Computation Group, led by Prof. Dan Roth
CogComp
ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

ICCV2021: Code for 'Spatial Uncertainty-Aware Semi-Supervised Crowd Counting'

Yanda Meng 14 May 13, 2022
U-Net for GBM

My Final Year Project(FYP) In National University of Singapore(NUS) You need Pytorch(stable 1.9.1) Both cuda version and cpu version are OK File Str

PinkR1ver 1 Oct 27, 2021
[SIGGRAPH 2021 Asia] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

DeepVecFont This is the official Pytorch implementation of the paper: Yizhi Wang and Zhouhui Lian. DeepVecFont: Synthesizing High-quality Vector Fonts

Yizhi Wang 146 Dec 18, 2022
Official Implementation of DE-DETR and DELA-DETR in "Towards Data-Efficient Detection Transformers"

DE-DETRs By Wen Wang, Jing Zhang, Yang Cao, Yongliang Shen, and Dacheng Tao This repository is an official implementation of DE-DETR and DELA-DETR in

Wen Wang 61 Dec 12, 2022
A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or simply to separate onnx files to any size you want.

sne4onnx A very simple tool for situations where optimization with onnx-simplifier would exceed the Protocol Buffers upper file size limit of 2GB, or

Katsuya Hyodo 10 Aug 30, 2022
ICLR2021 (Under Review)

Self-Supervised Time Series Representation Learning by Inter-Intra Relational Reasoning This repository contains the official PyTorch implementation o

Haoyi Fan 58 Dec 30, 2022
Face and Pose detector that emits MQTT events when a face or human body is detected and not detected.

Face Detect MQTT Face or Pose detector that emits MQTT events when a face or human body is detected and not detected. I built this as an alternative t

Jacob Morris 38 Oct 21, 2022
Deep Learning Training Scripts With Python

Deep Learning Training Scripts DNN Frameworks Caffe PyTorch Tensorflow CNN Models VGG ResNet DenseNet Inception Language Modeling GatedCNN-LM Attentio

Multicore Computing Research Lab 16 Dec 15, 2022
なりすまし検出(anti-spoof-mn3)のWebカメラ向けデモ

FaceDetection-Anti-Spoof-Demo なりすまし検出(anti-spoof-mn3)のWebカメラ向けデモです。 モデルはPINTO_model_zoo/191_anti-spoof-mn3からONNX形式のモデルを使用しています。 Requirement mediapipe

KazuhitoTakahashi 8 Nov 18, 2022
Multiview 3D object detection on MultiviewC dataset through moft3d.

Voxelized 3D Feature Aggregation for Multiview Detection [arXiv] Multiview 3D object detection on MultiviewC dataset through VFA. Introduction We prop

Jiahao Ma 20 Dec 21, 2022
Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks.

Heterogeneous Graph Benchmark Revisiting, benchmarking, and refining Heterogeneous Graph Neural Networks. Roadmap We organize our repo by task, and on

THUDM 176 Dec 17, 2022
Explanatory Learning: Beyond Empiricism in Neural Networks

Explanatory Learning This is the official repository for "Explanatory Learning: Beyond Empiricism in Neural Networks". Datasets Download the datasets

GLADIA Research Group 10 Dec 06, 2022
Dense Gaussian Processes for Few-Shot Segmentation

DGPNet - Dense Gaussian Processes for Few-Shot Segmentation Welcome to the public repository for DGPNet. The paper is available at arxiv: https://arxi

37 Jan 07, 2023
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting

Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting This is the origin Pytorch implementation of Informer in the followin

Haoyi 3.1k Dec 29, 2022
A self-supervised 3D representation learning framework named viewpoint bottleneck.

Pointly-supervised 3D Scene Parsing with Viewpoint Bottleneck Paper Created by Liyi Luo, Beiwen Tian, Hao Zhao and Guyue Zhou from Institute for AI In

63 Aug 11, 2022
[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets

[Project] [PDF] This repository contains code for our SIGGRAPH'22 paper "StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets" by Axel Sauer, Katja

742 Jan 04, 2023
A python implementation of Yolov5 to detect fire or smoke in the wild in Jetson Xavier nx and Jetson nano

yolov5-fire-smoke-detect-python A python implementation of Yolov5 to detect fire or smoke in the wild in Jetson Xavier nx and Jetson nano You can see

20 Dec 15, 2022
Realtime YOLO Monster Detection With Non Maximum Supression

Realtime-YOLO-Monster-Detection-With-Non-Maximum-Supression Table of Contents In

5 Oct 07, 2022
Implementing DropPath/StochasticDepth in PyTorch

%load_ext memory_profiler Implementing Stochastic Depth/Drop Path In PyTorch DropPath is available on glasses my computer vision library! Introduction

Francesco Saverio Zuppichini 13 Jan 05, 2023
This repo is about implementing different approaches of pose estimation and also is a sub-task of the smart hospital bed project :smile:

Pose-Estimation This repo is a sub-task of the smart hospital bed project which is about implementing the task of pose estimation 😄 Many thanks to th

Max 11 Oct 17, 2022