Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.

Last update: Dec 14, 2022

Related tags

Overview

Seq2Seq Speech in JAX

A JAX/Flax repository for combining a pre-trained speech encoder model (e.g. Wav2Vec2, HuBERT, WavLM) with a pre-trained text decoder model (e.g. GPT2, Bart) to yield a Speech Sequence-to-Sequence (Seq2Seq) model for automatic speech recognition.

The script run_flax_speech_recognition_seq2seq.py can be used to fine-tune a Speech Seq2Seq model on one of the official speech recognition datasets or a custom dataset. It makes use of the pmap JAX operator to provide model parallelism accross GPU/TPU devices.

The modelling files are based very heavily on those from Hugging Face Transformers 🤗 . This is a standalone repository to enable rapid prototyping and involvement with the community. The final modelling files and training script will be merged into Transformers 🤗 to be used with the rest of the open-source library. The final system weights will be made publicly available at huggingface.co 🚀

Figure 1: Speech-encoder text-decoder style Seq2Seq model.

Example Usage

To instantiate a Wav2Vec2-2-Bart model with the FlaxSpeechEncoderDecoderModel framework, run the following Python script inside the cloned repo:

from transformers import AutoFeatureExtractor, AutoTokenizer
from models.modeling_flax_speech_encoder_decoder import FlaxSpeechEncoderDecoderModel
import numpy as np

# checkpoints to leverage
encoder_id = "facebook/wav2vec2-large-lv60"
decoder_id = "facebook/bart-large"

model = FlaxSpeechEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_id, decoder_id, encoder_add_adapter=True, decoder_from_pt=True)

model.config.decoder_start_token_id = model.config.decoder.bos_token_id
model.config.pad_token_id = model.config.decoder.pad_token_id
model.config.eos_token_id = model.config.decoder.eos_token_id
model.config.use_cache = False
model.config.processor_class = "Wav2Vec2Processor"

# check if generation works
out = model.generate(np.ones((1, 2000)))

model.save_pretrained("./")

feature_extractor = AutoFeatureExtractor.from_pretrained(encoder_id)
feature_extractor.save_pretrained("./")
tokenizer = AutoTokenizer.from_pretrained(decoder_id)
tokenizer.save_pretrained("./")

To train the model on Librispeech ASR in default precision, run the bash script provided below:

#!/usr/bin/env bash
python run_flax_speech_recognition_seq2seq.py \
        --dataset_name="librispeech_asr" \
        --model_name_or_path="./" \
        --dataset_config_name="clean" \
        --train_split_name="train.100" \
        --eval_split_name="validation" \
        --output_dir="./" \
        --preprocessing_num_workers="16" \
        --length_column_name="input_length" \
        --overwrite_output_dir \
        --num_train_epochs="5" \
        --per_device_train_batch_size="2" \
        --per_device_eval_batch_size="2" \
        --gradient_accumulation_steps="1" \
        --logging_steps="25" \
        --max_duration_in_seconds="15" \
        --max_target_length="128" \
        --generation_max_length="40" \
        --generation_num_beams="1" \
        --learning_rate="1e-4" \
        --warmup_steps="500" \
        --text_column_name="text" \
        --save_total_limit="1" \
        --freeze_feature_encoder \
        --predict_with_generate \
        --do_lower_case \
        --do_eval \
        --do_train

Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax.

Related tags

Overview

Seq2Seq Speech in JAX

Example Usage

Owner

Sanchit Gandhi

Sentello is python script that simulates the anti-evasion and anti-analysis techniques used by malware.

A BERT-based reverse-dictionary of Korean proverbs

Diaformer: Automatic Diagnosis via Symptoms Sequence Generation

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

DomainWordsDict, Chinese words dict that contains more than 68 domains, which can be used as text classification、knowledge enhance task

justCTF [*] 2020 challenges sources

Multispeaker & Emotional TTS based on Tacotron 2 and Waveglow

基于“Seq2Seq+前缀树”的知识图谱问答

:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.

Python package for performing Entity and Text Matching using Deep Learning.

A PyTorch-based model pruning toolkit for pre-trained language models

A method to generate speech across multiple speakers

Image2pcl - Enter the metaverse with 2D image to 3D projections

Simple Speech to Text, Text to Speech

text to speech toolkit. 好用的中文语音合成工具箱，包含语音编码器、语音合成器、声码器和可视化模块。

Code for the ACL 2021 paper "Structural Guidance for Transformer Language Models"

Winner system (DAMO-NLP) of SemEval 2022 MultiCoNER shared task over 10 out of 13 tracks.

Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

BookNLP, a natural language processing pipeline for books

Pretrained Japanese BERT models