UniSpeech - Large Scale Self-Supervised Learning for Speech

Overview

UniSpeech

The family of UniSpeech:

WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing

UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training

Update

Pre-trained models

We strongly suggest using our UniSpeech-SAT model for speaker related tasks, since it shows very powerful performance on various speaker related benchmarks.

Model Pretraining Dataset Finetuning Dataset Model
UniSpeech Large EN Labeled: 1350 hrs en - download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it - download
Unispeech Large+ Labeled: 1350 hrs en, Unlabeled: 353 hrs fr - download
UniSpeech Large+ Labeld: 1350 hrs en, Unlabeled: 168 hrs es - download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeld: 90 hrs it - download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it, Unlabeled: 17 hrs ky - download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeled: 353 hrs fr 1 hr fr download
UniSpeech Large+ Labeld: 1350 hrs en, Unlabeled: 168 hrs es 1 hr es download
UniSpeech Large+ Labeled: 1350 hrs en, Unlabeld: 90 hrs it 1 hr it download
UniSpeech Large Multilingual Labeled: 1350 hrs en + 353 hrs fr + 168 hrs es + 90 hrs it, Unlabeled: 17 hrs ky 1 hr ky download
UniSpeech-SAT Base 960 hrs LibriSpeech - download
UniSpeech-SAT Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - download
UniSpeech-SAT Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - download
WavLM Base 960 hrs LibriSpeech - Azure Storage
Google Drive
WavLM Base+ 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - Azure Storage
Google Drive
WavLM Large 60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli - Azure Storage
Google Drive

Universal Representation Evaluation on SUPERB

alt text

Downstream Task Performance

We also evaluate our models on typical speaker related benchmarks.

Speaker Verification

Model Fix pre-train Vox1-O Vox1-E Vox1-H
ECAPA-TDNN - 0.87 1.12 2.12
HuBERT large Yes 0.888 0.912 1.853
Wav2Vec2.0 (XLSR) Yes 0.915 0.945 1.895
UniSpeech-SAT large Yes 0.771 0.781 1.669
WavLM large Yes 0.638 0.687 1.457
HuBERT large No 0.585 0.654 1.342
Wav2Vec2.0 (XLSR) No 0.564 0.605 1.23
UniSpeech-SAT large No 0.564 0.561 1.23
WavLM large No 0.431 0.538 1.154

Our paper for verification

Speech Separation

Evaluation on LibriCSS

Model 0S 0L OV10 OV20 OV30 OV40
Conformer (SOTA) 4.5 4.4 6.2 8.5 11 12.6
UniSpeech-SAT base 4.4 4.4 5.4 7.2 9.2 10.5
UniSpeech-SAT large 4.3 4.2 5.0 6.3 8.2 8.8
WavLM base+ 4.5 4.4 5.6 7.5 9.4 10.9
WavLM large 4.2 4.1 4.8 5.8 7.4 8.5

Speaker Diarization

Evaluation on CALLHOME

Model spk_2 spk_3 spk_4 spk_5 spk_6 spk_all
EEND-vector clustering 7.96 11.93 16.38 21.21 23.1 12.49
EEND-EDA clustering (SOTA) 7.11 11.88 14.37 25.95 21.95 11.84
UniSpeech-SAT large 5.93 10.66 12.9 16.48 23.25 10.92
WavLM Base 6.99 11.12 15.20 16.48 21.61 11.75
WavLm large 6.46 10.69 11.84 12.89 20.70 10.35

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Microsoft Open Source Code of Conduct

Reference

If you find our work is useful in your research, please cite the following paper:

@inproceedings{Wang2021UniSpeech,
  author    = {Chengyi Wang and Yu Wu and Yao Qian and Kenichi Kumatani and Shujie Liu and Furu Wei and Michael Zeng and Xuedong Huang},
  editor    = {Marina Meila and Tong Zhang},
  title     = {UniSpeech: Unified Speech Representation Learning with Labeled and
               Unlabeled Data},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning,
               {ICML} 2021, 18-24 July 2021, Virtual Event},
  series    = {Proceedings of Machine Learning Research},
  volume    = {139},
  pages     = {10937--10947},
  publisher = {{PMLR}},
  year      = {2021},
  url       = {http://proceedings.mlr.press/v139/wang21y.html},
  timestamp = {Thu, 21 Oct 2021 16:06:12 +0200},
  biburl    = {https://dblp.org/rec/conf/icml/0002WQK0WZ021.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
@article{Chen2021WavLM,
  title   = {WavLM: Large-Scale Self-Supervised  Pre-training   for Full Stack Speech Processing},
  author  = {Sanyuan Chen and Chengyi Wang and Zhengyang Chen and Yu Wu and Shujie Liu and Zhuo Chen and Jinyu Li and Naoyuki Kanda and Takuya Yoshioka and Xiong Xiao and Jian Wu and Long Zhou and Shuo Ren and Yanmin Qian and Yao Qian and Jian Wu and Michael Zeng and Furu Wei},
  eprint={2110.13900},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  year={2021}
}
@article{Chen2021UniSpeechSAT,
  title   = {UniSpeech-SAT: Universal Speech Representation Learning with  Speaker Aware Pre-Training},
  author  = {Sanyuan Chen and Yu Wu and Chengyi Wang and Zhengyang Chen and Zhuo Chen and Shujie Liu and   Jian Wu and Yao Qian and Furu Wei and Jinyu Li and  Xiangzhan Yu},
  eprint={2110.05752},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  year={2021}
}

Contact Information

For help or issues using UniSpeech models, please submit a GitHub issue.

For other communications related to UniSpeech, please contact Yu Wu ([email protected]).

Owner
Microsoft
Open source projects and samples from Microsoft
Microsoft
A curated list of FOSS tools to improve the Hacker News experience

Awesome-Hackernews Hacker News is a social news website focusing on computer technologies, hacking and startups. It promotes any content likely to "gr

Bryton Lacquement 141 Dec 27, 2022
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Jan 03, 2023
Sequence model architectures from scratch in PyTorch

This repository implements a variety of sequence model architectures from scratch in PyTorch. Effort has been put to make the code well structured so that it can serve as learning material. The train

Brando Koch 11 Mar 28, 2022
用Resnet101+GPT搭建一个玩王者荣耀的AI

基于pytorch框架用resnet101加GPT搭建AI玩王者荣耀 本源码模型主要用了SamLynnEvans Transformer 的源码的解码部分。以及pytorch自带的预训练模型"resnet101-5d3b4d8f.pth"

冯泉荔 2.2k Jan 03, 2023
Automated question generation and question answering from Turkish texts using text-to-text transformers

Turkish Question Generation Offical source code for "Automated question generation & question answering from Turkish texts using text-to-text transfor

Open Business Software Solutions 29 Dec 14, 2022
Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings

Cải thiện Elasticsearch trong bài toán semantic search sử dụng phương pháp Sentence Embeddings Trong bài viết này mình sẽ sử dụng pretrain model SimCS

Vo Van Phuc 18 Nov 25, 2022
Saptak Bhoumik 14 May 24, 2022
Download videos from YouTube/Twitch/Twitter right in the Windows Explorer, without installing any shady shareware apps

youtube-dl and ffmpeg Windows Explorer Integration Download videos from YouTube/Twitch/Twitter and more (any platform that is supported by youtube-dl)

Wolfgang 226 Dec 30, 2022
Code repository for "It's About Time: Analog clock Reading in the Wild"

it's about time Code repository for "It's About Time: Analog clock Reading in the Wild" Packages required: pytorch (used 1.9, any reasonable version s

52 Nov 10, 2022
CLIPfa: Connecting Farsi Text and Images

CLIPfa: Connecting Farsi Text and Images OpenAI released the paper Learning Transferable Visual Models From Natural Language Supervision in which they

Sajjad Ayoubi 66 Dec 14, 2022
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.

Word2Wave is a simple method for text-controlled GAN audio generation. You can either follow the setup instructions below and use the source code and CLI provided in this repo or you can have a play

Ilaria Manco 91 Dec 23, 2022
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing

Introduction Funnel-Transformer is a new self-attention model that gradually compresses the sequence of hidden states to a shorter one and hence reduc

GUOKUN LAI 197 Dec 11, 2022
Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

smart-school-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
Simple program that translates the name of files into English

Simple program that translates the name of files into English. Useful for when editing/inspecting programs that were developed in a foreign language.

0 Dec 22, 2021
Tools to download and cleanup Common Crawl data

cc_net Tools to download and clean Common Crawl as introduced in our paper CCNet. If you found these resources useful, please consider citing: @inproc

Meta Research 483 Jan 02, 2023
FB ID CLONER WUTHOT CHECKPOINT, FACEBOOK ID CLONE FROM FILE

* MY SOCIAL MEDIA : Programming And Memes Want to contact Mr. Error ? CONTACT : [ema

Mr. Error 9 Jun 17, 2021
Labelling platform for text using distant supervision

With DataQA, you can label unstructured text documents using rule-based distant supervision.

245 Aug 05, 2022
Blazing fast language detection using fastText model

Luga A blazing fast language detection using fastText's language models Luga is a Swahili word for language. fastText provides a blazing fast language

Prayson Wilfred Daniel 18 Dec 20, 2022
🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

Hugging Face 77.3k Jan 03, 2023
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!

Auto-Research A no-code utility to generate a detailed well-cited survey with topic clustered sections (draft paper format) and other interesting arti

Sidharth Pal 20 Dec 14, 2022