SpeechNAS Better Trade off between Latency and Accuracy for Large Scale Speaker Verification

Last update: May 20, 2022

Overview

speechnas

SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification

ASRU 2021: IEEE Automatic Speech Recognition and Understanding Workshop

If this repository is useful to you, please cite our work properly. Thank you!

SpeechNAS-Better-Trade-off-between-Latency-and-Accuracy-for-Large-Scale-Speaker-Verification, ASRU 2021.

Environment

Set up the environment for the repository with:

PyTorch 1.7+

Check configuration

Check configuration in ./config/

Inference

bash metric/metric_eer/auto_run.sh
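The script path suggests that inference reports the equal error rate (EER). The script itself is not reproduced here, so the following is only a minimal sketch of how EER is commonly computed from verification trial scores, not the repository's implementation. The function name `compute_eer`, its arguments, and the label convention (1 for same-speaker trials, 0 for different-speaker trials) are assumptions made for illustration.

```python
def compute_eer(scores, labels):
    """Return (eer, threshold) for similarity scores and 0/1 trial labels.

    Hypothetical helper: label 1 marks a same-speaker (target) trial,
    label 0 a different-speaker (non-target) trial. A trial is accepted
    when its score is greater than or equal to the threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_target = sum(1 for label in labels if label == 1)
    n_nontarget = len(labels) - n_target
    if n_target == 0 or n_nontarget == 0:
        raise ValueError("need at least one target and one non-target trial")

    # Sweep the threshold from the highest score down to the lowest.
    trials = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    accepted_targets = 0
    accepted_nontargets = 0
    best_gap = float("inf")
    eer = 1.0
    threshold = trials[0][0]

    i = 0
    while i < len(trials):
        current = trials[i][0]
        # Accept every trial that shares the current score (handles ties).
        while i < len(trials) and trials[i][0] == current:
            if trials[i][1] == 1:
                accepted_targets += 1
            else:
                accepted_nontargets += 1
            i += 1
        false_accept_rate = accepted_nontargets / n_nontarget
        false_reject_rate = 1.0 - accepted_targets / n_target
        gap = abs(false_accept_rate - false_reject_rate)
        # EER is taken where the two error rates are closest to equal.
        if gap < best_gap:
            best_gap = gap
            eer = (false_accept_rate + false_reject_rate) / 2.0
            threshold = current
    return eer, threshold


if __name__ == "__main__":
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
    demo_labels = [1, 1, 0, 1, 0, 0]
    eer, threshold = compute_eer(demo_scores, demo_labels)
    print(f"EER = {eer:.4f} at threshold {threshold}")
```

This sketch takes the operating point where the false-accept and false-reject rates are closest and averages them. Evaluation toolkits often interpolate along the ROC curve instead, so values may differ slightly on small trial lists.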

The x-vector has recently become a successful and popular approach to speaker verification. It employs a time delay neural network (TDNN) and statistics pooling to extract speaker-characterizing embeddings from variable-length utterances. Improving on the x-vector is an active research area, and many neural networks have been elaborately designed on top of it, e.g., extended TDNN (E-TDNN), factorized TDNN (F-TDNN), and densely connected TDNN (D-TDNN). In this work, we use neural architecture search (NAS) to identify optimal architectures in a TDNN-based search space; we call the approach SpeechNAS. Leveraging recent advances in speaker recognition, such as high-order statistics pooling, the multi-branch mechanism, D-TDNN, and angular additive margin softmax (AAM) loss with minimum hyper-spherical energy (MHE), SpeechNAS automatically discovers five network architectures, SpeechNAS-1 to SpeechNAS-5, with varying numbers of parameters and GFLOPs on the large-scale text-independent speaker recognition dataset VoxCeleb1. Our best derived network achieves an equal error rate (EER) of 1.02% on the standard VoxCeleb1 test set, surpassing previous TDNN-based state-of-the-art approaches by a large margin.
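To illustrate how statistics pooling turns a variable-length utterance into a fixed-size vector, here is a minimal pure-Python sketch. It is not the SpeechNAS code: the function name `statistics_pooling` is hypothetical, it uses only first- and second-order statistics (mean and population standard deviation) rather than the high-order variant mentioned above, and a real model would apply this to TDNN frame outputs as tensors.

```python
import math


def statistics_pooling(frames):
    """Pool a sequence of frame-level feature vectors into one vector.

    `frames` is a non-empty list of equal-length lists of floats, one per
    time step. The result concatenates the per-dimension mean and the
    per-dimension population standard deviation, so its length is twice
    the feature dimension regardless of how many frames are given.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must have the same dimension")

    n = len(frames)
    means = [sum(frame[j] for frame in frames) / n for j in range(dim)]
    stds = []
    for j in range(dim):
        variance = sum((frame[j] - means[j]) ** 2 for frame in frames) / n
        stds.append(math.sqrt(variance))
    return means + stds


if __name__ == "__main__":
    short_utterance = [[1.0, 2.0], [3.0, 2.0]]
    long_utterance = [[0.0, 1.0], [2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
    # Both outputs have length 4 even though the inputs differ in length.
    print(statistics_pooling(short_utterance))
    print(statistics_pooling(long_utterance))
```

The fixed output size is what lets the layers after pooling operate on utterances of any duration.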

Owner

Wentao Zhu
