PyTorch implementation of Densely Connected Time Delay Neural Network

Overview

Densely Connected Time Delay Neural Network

PyTorch implementation of Densely Connected Time Delay Neural Network (D-TDNN) in our paper "Densely Connected Time Delay Neural Network for Speaker Verification" (INTERSPEECH 2020).

What's New ⚠️

  • [2021-02-14] We added an impl option to TimeDelay, so you can now choose:

    • 'conv': implement TDNN with F.conv1d.
    • 'linear': implement TDNN with F.unfold and F.linear.

    Check this commit for more information. Note that the pretrained models for 'conv' have not been uploaded yet.
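
    The two options compute the same thing; below is a minimal sketch of the equivalence (a simplified stand-in for the actual TimeDelay module, not the repo's code):

    ```python
    import torch
    import torch.nn.functional as F

    # Toy layer: 30-dim input features, 128 output channels, context size 5.
    x = torch.randn(2, 30, 100)               # (batch, channels, frames)
    weight = torch.randn(128, 30, 5)          # (out_channels, in_channels, context)
    bias = torch.zeros(128)

    # impl='conv': a single call to F.conv1d.
    y_conv = F.conv1d(x, weight, bias)

    # impl='linear': gather context windows with F.unfold, then apply F.linear.
    # F.unfold expects a 4-D input, so a dummy height dimension is added.
    patches = F.unfold(x.unsqueeze(2), kernel_size=(1, 5))           # (2, 30*5, 96)
    y_linear = F.linear(patches.transpose(1, 2),
                        weight.view(128, -1), bias).transpose(1, 2)  # (2, 128, 96)

    print(torch.allclose(y_conv, y_linear, atol=1e-4))  # True, up to float error
    ```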

  • [2021-02-04] TDNN (the default implementation) in this repo is slower than nn.Conv1d, but we adopted it because:

    • the TDNN in this repo was also used to build F-TDNN models, which nn.Conv1d does not fully support (asymmetric paddings);
    • nn.Conv1d(dilation>1, bias=True) is slow in training.

    However, we do not use F-TDNN here, and we always set bias=False in D-TDNN, so we are considering uploading a new version of TDNN soon (updated 2021-02-14; see the impl option above).
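
    For reference, an asymmetric context can still be emulated around F.conv1d by padding the time axis manually; this is only an illustrative sketch, not code from the repo:

    ```python
    import torch
    import torch.nn.functional as F

    x = torch.randn(2, 64, 100)          # (batch, channels, frames)
    weight = torch.randn(64, 64, 2)      # kernel covering the context offsets {-3, 0}

    # The context {-3, 0} is asymmetric: 3 frames of left padding, none on the right.
    # nn.Conv1d only accepts a single symmetric padding value, so pad explicitly.
    x_padded = F.pad(x, (3, 0))                  # (left, right) padding on the time axis
    y = F.conv1d(x_padded, weight, dilation=3)   # keeps the original 100 frames
    print(y.shape)                               # torch.Size([2, 64, 100])
    ```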

  • [2021-02-01] Our new paper has been accepted by ICASSP 2021.

    Y.-Q. Yu, S. Zheng, H. Suo, Y. Lei, and W.-J. Li, "CAM: Context-Aware Masking for Robust Speaker Verification"

    CAM outperforms statistics-and-selection (SS) in terms of speed and accuracy.

Pretrained Models

We provide pretrained models that can be used in many tasks, such as:

  • Speaker Verification
  • Speaker-Dependent Speech Separation
  • Multi-Speaker Text-to-Speech
  • Voice Conversion

D-TDNN & D-TDNN-SS

Usage

Data preparation

You can either use the Kaldi toolkit:

  • Download VoxCeleb1 test set and unzip it.
  • Place prepare_voxceleb1_test.sh under $kaldi_root/egs/voxceleb/v2 and set $datadir and $voxceleb1_root in it.
  • Run chmod +x prepare_voxceleb1_test.sh && ./prepare_voxceleb1_test.sh to generate 30-dim MFCCs.
  • Place the trials under $datadir/test_no_sil.

Or check out the kaldifeat branch if you do not want to install Kaldi.
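
Either way, the prepared directory contains Kaldi-format feature archives. A minimal hedged sketch for inspecting them with the third-party kaldi_io package (evaluate.py may read them differently):

```python
import kaldi_io  # pip install kaldi_io

datadir = '/path/to/datadir'  # placeholder: the $datadir used above

# Iterate over the 30-dim MFCC matrices referenced by feats.scp.
# Each entry is a (num_frames, 30) numpy array keyed by utterance id.
for utt_id, mfcc in kaldi_io.read_mat_scp(f'{datadir}/test_no_sil/feats.scp'):
    print(utt_id, mfcc.shape)
    break
```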

Test

  • Download the pretrained D-TDNN model and run:
python evaluate.py --root $datadir/test_no_sil --model D-TDNN --checkpoint dtdnn.pth --device cuda
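
Evaluation scores each trial by the cosine similarity between two speaker embeddings. A minimal sketch of the idea, assuming the repo's DTDNN model class; the import path, checkpoint format, and forward output below are assumptions, and evaluate.py is the authoritative reference:

```python
import numpy as np
import torch
import torch.nn.functional as F

from model import DTDNN  # hypothetical import path; check the repo for the actual module

model = DTDNN()
model.load_state_dict(torch.load('dtdnn.pth', map_location='cpu'))  # assumes a plain state_dict
model.eval()

def embed(feats):
    """feats: (num_frames, 30) MFCC matrix -> speaker embedding (assumed 512-dim)."""
    x = torch.from_numpy(feats).t().unsqueeze(0)   # (1, 30, num_frames)
    with torch.no_grad():
        return model(x).squeeze(0)

# Placeholder features standing in for the two utterances of a trial pair.
enroll = np.random.randn(300, 30).astype(np.float32)
test = np.random.randn(250, 30).astype(np.float32)
score = F.cosine_similarity(embed(enroll), embed(test), dim=0)
print(float(score))
```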

Evaluation

VoxCeleb1-O

| Model         | Emb. dim | Params (M) | Loss        | Backend | EER (%) | DCF_0.01 | DCF_0.001 |
|---------------|----------|------------|-------------|---------|---------|----------|-----------|
| TDNN          | 512      | 4.2        | Softmax     | PLDA    | 2.34    | 0.28     | 0.38      |
| E-TDNN        | 512      | 6.1        | Softmax     | PLDA    | 2.08    | 0.26     | 0.41      |
| F-TDNN        | 512      | 12.4       | Softmax     | PLDA    | 1.89    | 0.21     | 0.29      |
| D-TDNN        | 512      | 2.8        | Softmax     | Cosine  | 1.81    | 0.20     | 0.28      |
| D-TDNN-SS (0) | 512      | 3.0        | Softmax     | Cosine  | 1.55    | 0.20     | 0.30      |
| D-TDNN-SS     | 512      | 3.5        | Softmax     | Cosine  | 1.41    | 0.19     | 0.24      |
| D-TDNN-SS     | 128      | 3.1        | AAM-Softmax | Cosine  | 1.22    | 0.13     | 0.20      |
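
EER is the operating point where the false-acceptance and false-rejection rates are equal. A minimal hedged sketch of computing it from trial scores with scikit-learn (not the repo's own scoring script):

```python
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(scores, labels):
    """EER: the point where the false-acceptance and false-rejection rates cross."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2

# Toy example: 1 = same speaker, 0 = different speaker.
scores = np.array([0.9, 0.8, 0.3, 0.1])
labels = np.array([1, 1, 0, 0])
print(f'EER = {100 * compute_eer(scores, labels):.2f}%')
```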

Citation

If you find D-TDNN helpful for your research, please cite:

@inproceedings{DBLP:conf/interspeech/YuL20,
  author    = {Ya-Qi Yu and
               Wu-Jun Li},
  title     = {Densely Connected Time Delay Neural Network for Speaker Verification},
  booktitle = {Annual Conference of the International Speech Communication Association (INTERSPEECH)},
  pages     = {921--925},
  year      = {2020}
}

Revision of the Paper ⚠️

References:

[16] X. Li, W. Wang, X. Hu, and J. Yang, "Selective Kernel Networks," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 510-519.

Comments
  • size mismatch while loading pre-trained weights

    RuntimeError: Error(s) in loading state_dict for DTDNN: Missing key(s) in state_dict: "xvector.tdnn.linear.bias", "xvector.dense.linear.bias". size mismatch for xvector.tdnn.linear.weight: copying a param with shape torch.Size([128, 30, 5]) from checkpoint, the shape in current model is torch.Size([128, 150]). size mismatch for xvector.block1.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 128, 1]) from checkpoint, the shape in current model is torch.Size([128, 128]). size mismatch for xvector.block1.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 192, 1]) from checkpoint, the shape in current model is torch.Size([128, 192]). size mismatch for xvector.block1.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block1.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block1.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). size mismatch for xvector.block1.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block1.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block1.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.transit1.linear.weight: copying a param with shape torch.Size([256, 512, 1]) from checkpoint, the shape in current model is torch.Size([256, 512]). size mismatch for xvector.block2.tdnnd1.linear1.weight: copying a param with shape torch.Size([128, 256, 1]) from checkpoint, the shape in current model is torch.Size([128, 256]). size mismatch for xvector.block2.tdnnd1.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd2.linear1.weight: copying a param with shape torch.Size([128, 320, 1]) from checkpoint, the shape in current model is torch.Size([128, 320]). size mismatch for xvector.block2.tdnnd2.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd3.linear1.weight: copying a param with shape torch.Size([128, 384, 1]) from checkpoint, the shape in current model is torch.Size([128, 384]). 
size mismatch for xvector.block2.tdnnd3.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd4.linear1.weight: copying a param with shape torch.Size([128, 448, 1]) from checkpoint, the shape in current model is torch.Size([128, 448]). size mismatch for xvector.block2.tdnnd4.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd5.linear1.weight: copying a param with shape torch.Size([128, 512, 1]) from checkpoint, the shape in current model is torch.Size([128, 512]). size mismatch for xvector.block2.tdnnd5.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd6.linear1.weight: copying a param with shape torch.Size([128, 576, 1]) from checkpoint, the shape in current model is torch.Size([128, 576]). size mismatch for xvector.block2.tdnnd6.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd7.linear1.weight: copying a param with shape torch.Size([128, 640, 1]) from checkpoint, the shape in current model is torch.Size([128, 640]). size mismatch for xvector.block2.tdnnd7.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd8.linear1.weight: copying a param with shape torch.Size([128, 704, 1]) from checkpoint, the shape in current model is torch.Size([128, 704]). size mismatch for xvector.block2.tdnnd8.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd9.linear1.weight: copying a param with shape torch.Size([128, 768, 1]) from checkpoint, the shape in current model is torch.Size([128, 768]). size mismatch for xvector.block2.tdnnd9.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd10.linear1.weight: copying a param with shape torch.Size([128, 832, 1]) from checkpoint, the shape in current model is torch.Size([128, 832]). size mismatch for xvector.block2.tdnnd10.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd11.linear1.weight: copying a param with shape torch.Size([128, 896, 1]) from checkpoint, the shape in current model is torch.Size([128, 896]). size mismatch for xvector.block2.tdnnd11.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). size mismatch for xvector.block2.tdnnd12.linear1.weight: copying a param with shape torch.Size([128, 960, 1]) from checkpoint, the shape in current model is torch.Size([128, 960]). size mismatch for xvector.block2.tdnnd12.linear2.weight: copying a param with shape torch.Size([64, 128, 3]) from checkpoint, the shape in current model is torch.Size([64, 384]). 
size mismatch for xvector.transit2.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]). size mismatch for xvector.dense.linear.weight: copying a param with shape torch.Size([512, 1024, 1]) from checkpoint, the shape in current model is torch.Size([512, 1024]).
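
    The 3-D shapes in the checkpoint (e.g. [128, 30, 5]) versus the 2-D shapes in the model (e.g. [128, 150]) suggest a checkpoint saved with the 'conv' weight layout being loaded into a model built with impl='linear', or a mismatch between code and checkpoint versions. If the layout is the only difference, the weights can be flattened before loading; a hedged sketch (the DTDNN import path is hypothetical, and the unfold ordering is assumed to match):

    ```python
    import torch

    from model import DTDNN  # hypothetical import path

    state = torch.load('dtdnn.pth', map_location='cpu')

    # Flatten each 3-D conv-style weight (out, in, context) into the 2-D
    # linear-style layout (out, in * context) expected by impl='linear'.
    converted = {k: v.reshape(v.size(0), -1) if v.dim() == 3 else v
                 for k, v in state.items()}

    model = DTDNN()
    # strict=False because the 'linear' model may also expect bias keys absent here.
    model.load_state_dict(converted, strict=False)
    ```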

    opened by zabir-nabil 3
  • Questions about experimental details

    Hello, I would like to ask about some implementation details of the experiments in your paper:

    1. Training data: many other papers train on the 5,994 speakers of the VoxCeleb2 dev set (or VoxCeleb1 dev + VoxCeleb2 dev, 1,211 + 5,994 speakers). Do you have results trained only on those speakers, and could you share them?

    2. PLDA vs. cosine similarity: for the TDNN comparison, is the embedding used to compute EER taken from the penultimate layer (the one before the classifier) or from the third-from-last layer (the x-vector)? Papers report that embeddings from different layers suit different backends, and the penultimate-layer embedding may work better with cosine scoring.

    Thanks!🙏

    opened by Wenhao-Yang 1
  • Questions about model training

    Hello yuyq96, thank you so much for the great work you've shared. I learned from the D-TDNN paper that D-TDNN-SS uses a mini-batch size of 128, but the model is too large to train on a single GPU. Could you tell me how you trained it, with nn.DataParallel or DDP? Looking forward to your reply.
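
    For reference, the usual ways to spread a batch of 128 across several GPUs in PyTorch are nn.DataParallel (single process) or DistributedDataParallel (one process per GPU); a generic hedged sketch, not the authors' training setup:

    ```python
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(30, 512))   # stand-in for the actual D-TDNN model

    if torch.cuda.device_count() > 1:
        # nn.DataParallel replicates the model and splits each mini-batch across
        # the visible GPUs; torch.nn.parallel.DistributedDataParallel scales better
        # but requires a process group and one process per GPU.
        model = nn.DataParallel(model)
    if torch.cuda.is_available():
        model = model.cuda()
    ```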

    opened by forwiat 2
  • The difference between kaldifeat-kaldi and kaldifeat-python?

    May I ask about the numerical difference between the features computed by the Kaldi implementation and by your Python kaldifeat implementation? I have compared the two sets of features and found some differences. I wonder whether the experimental results shown on the D-TDNN master branch and the D-TDNN kaldifeat branch are exactly the same.

    Thanks~
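
    A quick way to quantify such a difference is to compare the two matrices computed for the same utterance; a minimal sketch with placeholder data:

    ```python
    import numpy as np

    # Placeholders standing in for the (num_frames, 30) MFCC matrices computed by
    # Kaldi and by the kaldifeat branch for the same utterance.
    feats_kaldi = np.random.randn(300, 30).astype(np.float32)
    feats_python = feats_kaldi + 1e-4 * np.random.randn(300, 30).astype(np.float32)

    diff = np.abs(feats_kaldi - feats_python)
    print('max abs diff :', diff.max())
    print('mean abs diff:', diff.mean())
    print('allclose(atol=1e-3):', np.allclose(feats_kaldi, feats_python, atol=1e-3))
    ```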

    opened by mezhou 4
  • Some questions about the paper

    Hello, I think your work on D-TDNN is very meaningful: it achieves better results than E-TDNN and F-TDNN with far fewer parameters. However, I have two questions about the experiments in the paper: 1. In Table 5, the softmax-trained D-TDNN model scores better with the cosine backend than with PLDA, which is inconsistent with the TDNN, E-TDNN, and F-TDNN results above it (where PLDA beats cosine). What causes this? 2. Could you briefly explain the null branch?

    opened by xuanjihe 10