PyTorch implementation of Densely Connected Time Delay Neural Network

Overview

Densely Connected Time Delay Neural Network

PyTorch implementation of Densely Connected Time Delay Neural Network (D-TDNN) in our paper "Densely Connected Time Delay Neural Network for Speaker Verification" (INTERSPEECH 2020).

What's New ⚠️

  • [2021-02-14] We add an impl option in TimeDelay, now you can choose:

    • 'conv': implement TDNN by F.conv1d.
    • 'linear': implement TDNN by F.unfold and F.linear.

    Check this commit for more information. Note the pre-trained models of 'conv' have not been uploaded yet.

  • [2021-02-04] TDNN (default implementation) in this repo is slower than nn.Conv1d, but we adopted it because:

    • TDNN in this repo was also used to create F-TDNN models that are not perfectly supported by nn.Conv1d (asymmetric paddings).
    • nn.Conv1d(dilation>1, bias=True) is slow in training.

    However, we do not use F-TDNN here, and we always set bias=False in D-TDNN. So, we are considering uploading a new version of TDNN soon (2021-02-14 updated).

  • [2021-02-01] Our new paper is accepted by ICASSP 2021.

    Y.-Q. Yu, S. Zheng, H. Suo, Y. Lei, and W.-J. Li, "CAM: Context-Aware Masking for Robust Speaker Verification"

    CAM outperforms statistics-and-selection (SS) in terms of speed and accuracy.

Pretrained Models

We provide the pretrained models which can be used in many tasks such as:

  • Speaker Verification
  • Speaker-Dependent Speech Separation
  • Multi-Speaker Text-to-Speech
  • Voice Conversion

D-TDNN & D-TDNN-SS

Usage

Data preparation

You can either use Kaldi toolkit:

  • Download VoxCeleb1 test set and unzip it.
  • Place prepare_voxceleb1_test.sh under $kaldi_root/egs/voxceleb/v2 and change the $datadir and $voxceleb1_root in it.
  • Run chmod +x prepare_voxceleb1_test.sh && ./prepare_voxceleb1_test.sh to generate 30-dim MFCCs.
  • Place the trials under $datadir/test_no_sil.

Or checkout the kaldifeat branch if you do not want to install Kaldi.

Test

  • Download the pretrained D-TDNN model and run:
python evaluate.py --root $datadir/test_no_sil --model D-TDNN --checkpoint dtdnn.pth --device cuda

Evaluation

VoxCeleb1-O

Model Emb. Params (M) Loss Backend EER (%) DCF_0.01 DCF_0.001
TDNN 512 4.2 Softmax PLDA 2.34 0.28 0.38
E-TDNN 512 6.1 Softmax PLDA 2.08 0.26 0.41
F-TDNN 512 12.4 Softmax PLDA 1.89 0.21 0.29
D-TDNN 512 2.8 Softmax Cosine 1.81 0.20 0.28
D-TDNN-SS (0) 512 3.0 Softmax Cosine 1.55 0.20 0.30
D-TDNN-SS 512 3.5 Softmax Cosine 1.41 0.19 0.24
D-TDNN-SS 128 3.1 AAM-Softmax Cosine 1.22 0.13 0.20

Citation

If you find D-TDNN helps your research, please cite

@inproceedings{DBLP:conf/interspeech/YuL20,
  author    = {Ya-Qi Yu and
               Wu-Jun Li},
  title     = {Densely Connected Time Delay Neural Network for Speaker Verification},
  booktitle = {Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  pages     = {921--925},
  year      = {2020}
}

Revision of the Paper ⚠️

References:

[16] X. Li, W. Wang, X. Hu, and J. Yang, "Selective Kernel Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.

Comments
  • size mismatch while loading pre-trained weights

    size mismatch while loading pre-trained weights

    RuntimeError: Error(s) in loading state_dict for DTDNN: Missing key(s) in state_dict: "xvector.tdnn.linear.bias", "xvector.dense.linear.bias". size mismatch for xvector.tdnn.linear.weight: copying a param with shape torch.Size([128, 30, 5]) from checkpoint, the shape in current model is torch.Size([128, 150]). size mismatch for xvector.block1.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]). size mismatch for xvector.block1.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 192, 1]) from checkpoint, the shape in current model is torch.Size([128, 192]). size mismatch for xvector.block1.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block1.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block1.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). size mismatch for xvector.block1.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block1.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.transit1.linear.weight: copying a param with shape torch.Size([256, 512, 1]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for xvector.block2.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block2.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block2.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). size mismatch for xvector.block2.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block2.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 512, 1]) from checkpoint, the shape in current model is torch.Size([128, 512]). size mismatch for xvector.block2.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 576, 1]) from checkpoint, the shape in current model is torch.Size([128, 576]). size mismatch for xvector.block2.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd7.linear1.weight: copying a param with shape torch.Size([128, 640, 1]) from checkpoint, the shape in current model is torch.Size([128, 640]). size mismatch for xvector.block2.tdnnd7.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd8.linear1.weight: copying a param with shape torch.Size([128, 704, 1]) from checkpoint, the shape in current model is torch.Size([128, 704]). size mismatch for xvector.block2.tdnnd8.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd9.linear1.weight: copying a param with shape torch.Size([128, 768, 1]) from checkpoint, the shape in current model is torch.Size([128, 768]). size mismatch for xvector.block2.tdnnd9.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd10.linear1.weight: copying a param with shape torch.Size([128, 832, 1]) from checkpoint, the shape in current model is torch.Size([128, 832]). size mismatch for xvector.block2.tdnnd10.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd11.linear1.weight: copying a param with shape torch.Size([128, 896, 1]) from checkpoint, the shape in current model is torch.Size([128, 896]). size mismatch for xvector.block2.tdnnd11.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd12.linear1.weight: copying a param with shape torch.Size([128, 960, 1]) from checkpoint, the shape in current model is torch.Size([128, 960]). size mismatch for xvector.block2.tdnnd12.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.transit2.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for xvector.dense.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]).

    opened by zabir-nabil 3
  • 实验细节的疑问

    实验细节的疑问

    您好: 我想教下您的论文中,实验的实现细节: 1.实验数据:我看很多其他论文都是使用voxceleb2 dev 5994说话人作为训练集(或者voxceleb dev+voxceleb2 dev,1211+5994说话人),您有只在这部分说话人上的实验结果吗?方便透露下嘛?

    2.PLDA和Cosine Similarity:您这里实验比较这两个的EER在TDNN中是提取的是倒数第二层(分类器前一层)还是第三层(xvector)的输出啊?因为我在论文中又看到,这两个不同层embedding对不同方法性能有差异,倒数第二层的cosine方法可能会更好一些。

    Thanks!🙏

    opened by Wenhao-Yang 1
  • questions about model training

    questions about model training

    hello, yuyq96, Thank you so much for the great work you've shared. I learned that D-TDNNSS mini-batch setting 128 from D-TDNN paper. But this model is too large to train on single gpu. Could you tell me how you train it? Using nn.Parallel or DDP? Looking forward to you reply

    opened by forwiat 2
  • the difference between kaldifeat-kaldi and kaldifeat-python?

    the difference between kaldifeat-kaldi and kaldifeat-python?

    May I ask you the numerical difference between kaldifeat by kaldi implementation and kaldifeat by your python implementation? I have compared the two computed features, and I find it has some difference. I wonder that the experiment results showed in D-TDNN master and D-TDNN-kaldifeat branch is absolutely the same.

    Thanks~

    opened by mezhou 4
  • 针对论文的一些疑问

    针对论文的一些疑问

    您好,我觉得您的工作-DTDNN,在参数比较少的情况下获得了较ETDNN,FTDNN更好的结果,我认为这非常有意义。但是我对论文的实验存在两处疑惑: 1、论文中Table5中,基于softmax训练的D-TDNN模型Cosine的结果好于PLDA,在上面的TDNN,ETDNN,FTDNN的结果不一致(均是PLDA好于Cosine),请问这是什么原因导致的? 2、对于null branch,能稍微解释一下吗?

    opened by xuanjihe 10
Releases(trials)
Owner
Ya-Qi Yu
Machine Learning
Ya-Qi Yu
E-Ink Magic Calendar that automatically syncs to Google Calendar and runs off a battery powered Raspberry Pi Zero

MagInkCal This repo contains the code needed to drive an E-Ink Magic Calendar that uses a battery powered (PiSugar2) Raspberry Pi Zero WH to retrieve

2.8k Dec 28, 2022
Match SafeGraph POIs with Data collected through a cultural resource survey in Washington DC.

Match SafeGraph POI data with Cultural Resource Places in Washington DC Match SafeGraph POIs with Data collected through a cultural resource survey in

Changjie Chen 1 Jan 05, 2022
Scalable training for dense retrieval models.

Scalable implementation of dense retrieval. Training on cluster By default it trains locally: PYTHONPATH=.:$PYTHONPATH python dpr_scale/main.py traine

Facebook Research 90 Dec 28, 2022
Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences

Model-free Vehicle Tracking and State Estimation in Point Cloud Sequences 1. Introduction This project is for paper Model-free Vehicle Tracking and St

TuSimple 92 Jan 03, 2023
Fast and customizable reconnaissance workflow tool based on simple YAML based DSL.

Fast and customizable reconnaissance workflow tool based on simple YAML based DSL, with support of notifications and distributed workload of that work

Américo Júnior 3 Mar 11, 2022
Preprossing-loan-data-with-NumPy - In this project, I have cleaned and pre-processed the loan data that belongs to an affiliate bank based in the United States.

Preprossing-loan-data-with-NumPy In this project, I have cleaned and pre-processed the loan data that belongs to an affiliate bank based in the United

Dhawal Chitnavis 2 Jan 03, 2022
Code for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in Deep Latent Space"

SRHEN This is a better and simpler implementation for "SRHEN: Stepwise-Refining Homography Estimation Network via Parsing Geometric Correspondences in

1 Oct 28, 2022
A demo of how to use JAX to create a simple gravity simulation

JAX Gravity This repo contains a demo of how to use JAX to create a simple gravity simulation. It uses JAX's experimental ode package to solve the dif

Cristian Garcia 16 Sep 22, 2022
This repo contains the code for paper Inverse Weighted Survival Games

Inverse-Weighted-Survival-Games This repo contains the code for paper Inverse Weighted Survival Games instructions general loss function (--lfn) can b

3 Jan 12, 2022
Official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks".

GN-Transformer AST This is the official repository for the paper "GN-Transformer: Fusing AST and Source Code information in Graph Networks". Data Prep

Cheng Jun-Yan 10 Nov 26, 2022
Multi-objective gym environments for reinforcement learning.

MO-Gym: Multi-Objective Reinforcement Learning Environments Gym environments for multi-objective reinforcement learning (MORL). The environments follo

Lucas Alegre 74 Jan 03, 2023
HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021)

Code for HDR Video Reconstruction HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021) Guanying Chen, Cha

Guanying Chen 64 Nov 19, 2022
Deep Learning for Time Series Classification

Deep Learning for Time Series Classification This is the companion repository for our paper titled "Deep learning for time series classification: a re

Hassan ISMAIL FAWAZ 1.2k Jan 02, 2023
StarGAN - Official PyTorch Implementation (CVPR 2018)

StarGAN - Official PyTorch Implementation ***** New: StarGAN v2 is available at https://github.com/clovaai/stargan-v2 ***** This repository provides t

Yunjey Choi 5.1k Jan 04, 2023
Decision Transformer: A brand new Offline RL Pattern

DecisionTransformer_StepbyStep Intro Decision Transformer: A brand new Offline RL Pattern. 这是关于NeurIPS 2021 热门论文Decision Transformer的复现。 👍 原文地址: Deci

Irving 14 Nov 22, 2022
HandTailor: Towards High-Precision Monocular 3D Hand Recovery

HandTailor This repository is the implementation code and model of the paper "HandTailor: Towards High-Precision Monocular 3D Hand Recovery" (arXiv) G

Lv Jun 113 Jan 06, 2023
Evaluation suite for large-scale language models.

This repo contains code for running the evaluations and reproducing the results from the Jurassic-1 Technical Paper (see blog post), with current support for running the tasks through both the AI21 S

71 Dec 17, 2022
The codes I made while I practiced various TensorFlow examples

TensorFlow_Exercises The codes I made while I practiced various TensorFlow examples About the codes I didn't create these codes by myself, but re-crea

Terry Taewoong Um 614 Dec 08, 2022
Awesome Human Pose Estimation

Human Pose Estimation Related Publication

Zhe Wang 1.2k Dec 26, 2022
Original Implementation of Prompt Tuning from Lester, et al, 2021

Prompt Tuning This is the code to reproduce the experiments from the EMNLP 2021 paper "The Power of Scale for Parameter-Efficient Prompt Tuning" (Lest

Google Research 282 Dec 28, 2022