Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

Last update: Jan 09, 2023

Overview

UniSpeech

The UniSpeech family:

UniSpeech (ICML 2021): Unified Pre-training for Self-Supervised Learning and Supervised Learning for ASR

UniSpeech-SAT (ICASSP 2022 Submission): Universal Speech Representation Learning with Speaker Aware Pre-Training

Pre-trained models

We strongly recommend our UniSpeech-SAT model for speaker-related tasks, since it performs strongly on a range of speaker-related benchmarks.

Model	Dataset	Download
UniSpeech Base	1500 hrs CommonVoice	download
UniSpeech Large	1500 hrs CommonVoice	download
UniSpeech-SAT Base	960 hrs LibriSpeech	download
UniSpeech-SAT Base+	60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli	download
UniSpeech-SAT Large	60k hrs Libri-Light + 10k hrs GigaSpeech + 24k hrs VoxPopuli	download

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Microsoft Open Source Code of Conduct

Contact Information

For help or issues using UniSpeech models, please submit a GitHub issue.

For other communications related to UniSpeech, please contact Yu Wu ([email protected]).

Owner

Microsoft
