LeBenchmark: a reproducible framework for assessing SSL from speech

Self-Supervised Learning (SSL) from large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent work has also investigated SSL from speech, and has been notably successful at improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest that it is possible to reduce dependence on labeled data when building efficient speech systems, they were evaluated mostly on ASR, under multiple heterogeneous experimental settings, and mostly for English. This makes it difficult to compare SSL approaches objectively and to evaluate their impact on building speech systems.

In this repository, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It covers not only ASR (high- and low-resource) tasks but also spoken language understanding, speech translation, and emotion recognition. It also targets speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets.

The scripts for data preparation are available here.

Our pre-trained SSL models for French are available through this HuggingFace link: https://huggingface.co/LeBenchmark

Our benchmark tasks are available in the following directories:

ASR: Automatic Speech Recognition

SLU: Spoken Language Understanding

AER: Automatic Emotion Recognition

AST: Automatic Speech Translation

Detailed descriptions of the experiments and results are given in our paper: TBC
