Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis

Overview

TDY-CNN for Text-Independent Speaker Verification

Official implementation of

  • Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis
    by Seong-Hu Kim, Hyeonuk Nam, Yong-Hwa Park @ Human Lab, Mechanical Engineering Department, KAIST
    arXiv

Accepted paper in ICASSP 2022.

This code was written mainly with reference to VoxCeleb_trainer of paper 'In defence of metric learning for speaker recognition'.

Temporal Dynamic Convolutional Neural Network (TDY-CNN)

TDY-CNN efficiently applies adaptive convolution depending on time bins by changing the computation order as follows:

where x and y are input and output of TDY-CNN module which depends on frequency feature f and time feature t in time-frequency domain data. k-th basis kernel is convoluted with input and k-th bias is added. The results are aggregated using the attention weights which depends on time bins. K is the number of basis kernels, and σ is an activation function ReLU. The attention weight has a value between 0 and 1, and the sum of all basis kernels on a single time bin is 1 as the weights are processed by softmax.

Requirements and versions used

Python version of 3.7.10 is used with following libraries

  • pytorch == 1.8.1
  • pytorchaudio == 0.8.1
  • numpy == 1.19.2
  • scipy == 1.5.3
  • scikit-learn == 0.23.2

Dataset

We used VoxCeleb1 & 2 dataset in this paper. You can download the dataset by reffering to VoxCeleb1 and VoxCeleb1.

Training

You can train and save model in exps folder by running:

python trainSpeakerNet.py --model TDy_ResNet34_half --log_input True --encoder_type AVG --trainfunc softmaxproto --save_path exps/TDY_CNN_ResNet34 --nPerSpeaker 2 --batch_size 400

This implementation also provides accelerating training with distributed training and mixed precision training.

  • Use --distributed flag to enable distributed training and --mixedprec flag to enable mixed precision training.
    • GPU indices should be set before training : os.environ['CUDA_VISIBLE_DEVICES'] ='0,1,2,3' in trainSpeakernet.py.

Results:

Network #Parm EER (%) C_det (%)
TDY-VGG-M 71.2M 3.04 0.237
TDY-ResNet-34(×0.25) 13.3M 1.58 0.116
TDY-ResNet-34(×0.5) 51.9M 1.48 0.118

  • This result is low-dimensional t-SNE projection of frame-level speaker embed-dings of MHRM0 and FDAS1 using (a) baseline model ResNet-34(×0.25) and (b) TDY-ResNet-34(×0.25). Left column represents embeddings for different speakers, and right column represents em-beddings for different phoneme classes.

  • Embeddings by TDY-ResNet-34(×0.25) are closely gathered regardless of phoneme groups. It shows that the temporal dynamic model extracts consistent speaker information regardless of phonemes.

Pretrained models

There are pretrained models in folder pretrained_model.

For example, you can check 1.4786 of EER by running following script using TDY-ResNet-34(×0.5).

python trainSpeakerNet.py --eval --model TDy_ResNet34_half --log_input True --encoder_type AVG --trainfunc softmaxproto --save_path exps/test --eval_frames 400 --initial_model pretrained_model/pretrained_TDy_ResNet34_half.model

Citation

@article{kim2021tdycnn,
  title={Temporal Dynamic Convolutional Neural Network for Text-Independent Speaker Verification and Phonemetic Analysis},
  author={Kim, Seong-Hu and Nam, Hyeonuk and Park, Yong-Hwa},
  journal={arXiv preprint arXiv:2110.03213},
  year={2021}
}

Please contact Seong-Hu Kim at [email protected] for any query.

Owner
Seong-Hu Kim
Seong-Hu Kim
EEGEyeNet is benchmark to evaluate ET prediction based on EEG measurements with an increasing level of difficulty

Introduction EEGEyeNet EEGEyeNet is a benchmark to evaluate ET prediction based on EEG measurements with an increasing level of difficulty. Overview T

Ard Kastrati 23 Dec 22, 2022
Deep GPs built on top of TensorFlow/Keras and GPflow

GPflux Documentation | Tutorials | API reference | Slack What does GPflux do? GPflux is a toolbox dedicated to Deep Gaussian processes (DGP), the hier

Secondmind Labs 107 Nov 02, 2022
Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.

aft-pytorch Unofficial PyTorch implementation of Attention Free Transformer's layers by Zhai, et al. [abs, pdf] from Apple Inc. Installation You can i

Rishabh Anand 184 Dec 12, 2022
GAN-STEM-Conv2MultiSlice - Exploring Generative Adversarial Networks for Image-to-Image Translation in STEM Simulation

GAN-STEM-Conv2MultiSlice GAN method to help covert lower resolution STEM images generated by convolution methods to higher resolution STEM images gene

UW-Madison Computational Materials Group 2 Feb 10, 2021
Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization

This project is now archived. It's been fun working on it, but it's time for me to move on. Thank you for all the support and feedback over the last c

Max Pumperla 2.1k Jan 03, 2023
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
Recurrent Neural Network Tutorial, Part 2 - Implementing a RNN in Python and Theano

Please read the blog post that goes with this code! Jupyter Notebook Setup System Requirements: Python, pip (Optional) virtualenv To start the Jupyter

Denny Britz 863 Dec 15, 2022
Outlier Exposure with Confidence Control for Out-of-Distribution Detection

OOD-detection-using-OECC This repository contains the essential code for the paper Outlier Exposure with Confidence Control for Out-of-Distribution De

Nazim Shaikh 64 Nov 02, 2022
AlgoVision - A Framework for Differentiable Algorithms and Algorithmic Supervision

NeurIPS 2021 Paper "Learning with Algorithmic Supervision via Continuous Relaxations"

Felix Petersen 76 Jan 01, 2023
Tensorflow implementation of "Learning Deep Features for Discriminative Localization"

Weakly_detector Tensorflow implementation of "Learning Deep Features for Discriminative Localization" B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and

Taeksoo Kim 363 Jun 29, 2022
R-package accompanying the paper "Dynamic Factor Model for Functional Time Series: Identification, Estimation, and Prediction"

dffm The goal of dffm is to provide functionality to apply the methods developed in the paper “Dynamic Factor Model for Functional Time Series: Identi

Sven Otto 3 Dec 09, 2022
Various operations like path tracking, counting, etc by using yolov5

Object-tracing-with-YOLOv5 Various operations like path tracking, counting, etc by using yolov5

Pawan Valluri 5 Nov 28, 2022
Faster RCNN pytorch windows

Faster-RCNN-pytorch-windows Faster RCNN implementation with pytorch for windows Open cmd, compile this comands: cd lib python setup.py build develop T

Hwa-Rang Kim 1 Nov 11, 2022
Recurrent Scale Approximation (RSA) for Object Detection

Recurrent Scale Approximation (RSA) for Object Detection Codebase for Recurrent Scale Approximation for Object Detection in CNN published at ICCV 2017

Yu Liu (Louis) 239 Dec 28, 2022
PyTorch Implementation for AAAI'21 "Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection"

UMS for Multi-turn Response Selection Implements the model described in the following paper Do Response Selection Models Really Know What's Next? Utte

Taesun Whang 47 Nov 22, 2022
traiNNer is an open source image and video restoration (super-resolution, denoising, deblurring and others) and image to image translation toolbox based on PyTorch.

traiNNer traiNNer is an open source image and video restoration (super-resolution, denoising, deblurring and others) and image to image translation to

202 Jan 04, 2023
Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study.

APR The repo for the paper Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study. Environment setu

ielab 8 Nov 26, 2022
H&M Fashion Image similarity search with Weaviate and DocArray

H&M Fashion Image similarity search with Weaviate and DocArray This example shows how to do image similarity search using DocArray and Weaviate as Doc

Laura Ham 18 Aug 11, 2022
[PAMI 2020] Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation

Show, Match and Segment: Joint Weakly Supervised Learning of Semantic Matching and Object Co-segmentation This repository contains the source code for

Yun-Chun Chen 60 Nov 25, 2022
Unofficial PyTorch implementation of MobileViT based on paper "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".

MobileViT RegNet Unofficial PyTorch implementation of MobileViT based on paper MOBILEVIT: LIGHT-WEIGHT, GENERAL-PURPOSE, AND MOBILE-FRIENDLY VISION TR

Hong-Jia Chen 91 Dec 02, 2022