Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Last update: Sep 07, 2022

Overview

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

This repository is derived from the NMTGMinor project at https://github.com/quanpn90/NMTGMinor
The SVCCA calculation is derived from https://github.com/nlp-dke/svcca

Powered by Mediaan.com

Speech Translation (ST) is the task of translating speech audio in a source language into text in a target language. This repository implements and experiments on different approaches for ST:

Cascaded ST, including 2 steps: Automatic Speech Recognition (ASR) and Machine Translation (MT)
Direct ST: models trained only on ST data
(Main contribution) End-to-end ST limiting the use of ST data: multi-modal models leveraging ASR and MT training data for ST task

The Transformer architecture is used as the baseline for the implementation.

High-level instruction to use the repo:

Run covost_data_preparation.py to download and preprocess the data.
Run the shell script of interst, change the variables in the script if needed.
- run_translation_pipeline.sh for single-task models (ASR, MT, ST)
- cascaded_ST_evaluation.sh evaluates cascaded ST using pretrained ASR and MT models
- run_translation_multi_modalities_pipeline.sh for multi-task, multi-modality models (including zero-shot)
- run_zeroshot_with_artificial_data.sh for zero-shot models using data augmentation
- run_bidirectional_zeroshot.sh for zero-shot models using additional opposite training data
- run_fine_tunning.sh, run_fine_tunning_fromASR.sh for fine-tuning models with ST data, resulting in few-shot models
- modality_similarity_svcca.sh, modality_similarity_classifier.sh measure text-audio similarity in representation

See notebooks/Repo_Instruction.ipynb for more details.

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Related tags

Overview

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Owner

Tu Anh Dinh

Airborne Optical Sectioning (AOS) is a wide synthetic-aperture imaging technique

Official Repository for our ECCV2020 paper: Imbalanced Continual Learning with Partitioning Reservoir Sampling

Empirical Study of Transformers for Source Code & A Simple Approach for Handling Out-of-Vocabulary Identifiers in Deep Learning for Source Code

Fast and robust certifiable relative pose estimation

Predict halo masses from simulations via graph neural networks

Colab notebook and additional materials for Python-driven analysis of redlining data in Philadelphia

Finite-temperature variational Monte Carlo calculation of uniform electron gas using neural canonical transformation.

This is a work in progress reimplementation of Instant Neural Graphics Primitives

Keras implementation of the GNM model in paper ’Graph-Based Semi-Supervised Learning with Nonignorable Nonresponses‘

Revealing and Protecting Labels in Distributed Training

This repository for project that can Automate Number Plate Recognition (ANPR) in Morocco Licensed Vehicles. 💻 + 🚙 + 🇲🇦 = 🤖 🕵🏻‍♂️

RIFE: Real-Time Intermediate Flow Estimation for Video Frame Interpolation

DetCo: Unsupervised Contrastive Learning for Object Detection

CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation

FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (CVPR2022)

Code for the ICCV 2021 paper "Pixel Difference Networks for Efficient Edge Detection" (Oral).

[CVPR 2021] Unsupervised Degradation Representation Learning for Blind Super-Resolution

Solving reinforcement learning tasks which require language and vision

A PyTorch Implementation of SphereFace.

CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces