This repository contains a set of codes to run (i.e., train, perform inference with, evaluate) a diarization method called EEND-vector-clustering.

Overview

EEND-vector clustering

The EEND-vector clustering (End-to-End-Neural-Diarization-vector clustering) is a speaker diarization framework that integrates two complementary major diarization approaches, i.e., traditional clustering-based and emerging end-to-end neural network-based approaches, to make the best of both worlds. In [1] it is shown that the EEND-vector clustering outperforms EEND when the recording is long (e.g., more than 5 min), while in [2] it is shown based on CALLHOME data that it outperforms x-vector clustering and EEND-EDA especially when the number of speakers in recordings is large.

This repository contains an example implementation of the EEND-vector clustering based on Pytorch to reproduce the results in [2], i.e., the CALLHOME experiments. For the trainer, we use Padertorch. This repository is implemented based on EEND and relies on some useful functions provided therein.

References

[1] Keisuke Kinoshita, Marc Delcroix, and Naohiro Tawara, "Integrating end-to-end neural and clustering-based diarization: Getting the best of both worlds," Proc. ICASSP, pp. 7198–7202, 2021

[2] Keisuke Kinoshita, Marc Delcroix, and Naohiro Tawara, "Advances in integration of end-to-end neural and clustering-based diarization for real conversational speech," Proc. Interspeech, 2021 (to appear)

Citation

@inproceedings{eend-vector-clustering,
 author = {Keisuke Kinoshita and Marc Delcroix and Naohiro Tawara},
 title = {Integrating End-to-End Neural and Clustering-Based Diarization: Getting the Best of Both Worlds},
 booktitle = {{ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}},
 pages={7198-7202}
 year = {2021}
}

Install tools

Requirements

  • NVIDIA CUDA GPU
  • CUDA Toolkit (version == 9.2, 10.1 or 10.2)

Install kaldi and python environment

cd tools
make
  • This command builds kaldi at tools/kaldi
    • if you want to use pre-build kaldi
      cd tools
      make KALDI=<existing_kaldi_root>
      This option make a symlink at tools/kaldi
  • This command extracts miniconda3 at tools/miniconda3, and creates conda envirionment named 'eend'
  • Then, installs Pytorch and Padertorch into 'eend' environment
  • Then, clones EEND to reference symbolic links stored under eend/, egs/ and utils/

Test recipe (mini_librispeech)

Configuration

  • Modify egs/mini_librispeech/v1/cmd.sh according to your job schedular. If you use your local machine, use "run.pl" (default). If you use Grid Engine, use "queue.pl" If you use SLURM, use "slurm.pl". For more information about cmd.sh see http://kaldi-asr.org/doc/queue.html.

Run data preparation, training, inference, and scoring

cd egs/mini_librispeech/v1
CUDA_VISIBLE_DEVICES=0 ./run.sh
  • See RESULT.md and compare with your result.

CALLHOME experiment

Configuraition

  • Modify egs/callhome/v1/cmd.sh according to your job schedular. If you use your local machine, use "run.pl" (default). If you use Grid Engine, use "queue.pl" If you use SLURM, use "slurm.pl". For more information about cmd.sh see http://kaldi-asr.org/doc/queue.html.

Run data preparation, training, inference, and scoring

cd egs/callhome/v1
CUDA_VISIBLE_DEVICES=0 ./run.sh --db_path <db_path>
# <db_path> means absolute path of the directory where the necessary LDC corpora are stored.
  • See RESULT.md and compare with your result.
  • If you want to run multi-GPU training, simply set CUDA_VISIBLE_DEVICES appropriately. This environment variable may be automatically set by your job schedular such as SLURM.
Code of paper: "DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks"

DropAttack: A Masked Weight Adversarial Training Method to Improve Generalization of Neural Networks Abstract: Adversarial training has been proven to

倪仕文 (Shiwen Ni) 58 Nov 10, 2022
SE3 Pose Interp - Interpolate camera pose or trajectory in SE3, pose interpolation, trajectory interpolation

SE3 Pose Interpolation Pose estimated from SLAM system are always discrete, and

Ran Cheng 4 Dec 15, 2022
Repository for the "Gotta Go Fast When Generating Data with Score-Based Models" paper

Gotta Go Fast When Generating Data with Score-Based Models This repo contains the official implementation for the paper Gotta Go Fast When Generating

Alexia Jolicoeur-Martineau 89 Nov 09, 2022
Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, and DVC.

NN Template Generic template to bootstrap your PyTorch project. Click on Use this Template and avoid writing boilerplate code for: PyTorch Lightning,

Luca Moschella 520 Dec 30, 2022
Code for Understanding Pooling in Graph Neural Networks

Select, Reduce, Connect This repository contains the code used for the experiments of: "Understanding Pooling in Graph Neural Networks" Setup Install

Daniele Grattarola 37 Dec 13, 2022
Open-Ended Commonsense Reasoning (NAACL 2021)

Open-Ended Commonsense Reasoning Quick links: [Paper] | [Video] | [Slides] | [Documentation] This is the repository of the paper, Differentiable Open-

(Bill) Yuchen Lin 31 Oct 19, 2022
Implements an infinite sum of poisson-weighted convolutions

An infinite sum of Poisson-weighted convolutions Kyle Cranmer, Aug 2018 If viewing on GitHub, this looks better with nbviewer: click here Consider a v

Kyle Cranmer 26 Dec 07, 2022
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers Results results on COCO val Backbone Method Lr Schd PQ Config Download

155 Dec 20, 2022
Official implementation of our CVPR2021 paper "OTA: Optimal Transport Assignment for Object Detection" in Pytorch.

OTA: Optimal Transport Assignment for Object Detection This project provides an implementation for our CVPR2021 paper "OTA: Optimal Transport Assignme

217 Jan 03, 2023
Unbalanced Feature Transport for Exemplar-based Image Translation (CVPR 2021)

UNITE and UNITE+ Unbalanced Feature Transport for Exemplar-based Image Translation (CVPR 2021) Unbalanced Intrinsic Feature Transport for Exemplar-bas

Fangneng Zhan 183 Nov 09, 2022
Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs This is an implemetation of the paper Few-shot Relation Extraction via Baye

MilaGraph 36 Nov 22, 2022
Synthetic Scene Text from 3D Engines

Introduction UnrealText is a project that synthesizes scene text images using 3D graphics engine. This repository accompanies our paper: UnrealText: S

Shangbang Long 215 Dec 29, 2022
CTF challenges from redpwnCTF 2021

redpwnCTF 2021 Challenges This repository contains challenges from redpwnCTF 2021 in the rCDS format; challenge information is in the challenge.yaml f

redpwn 27 Dec 07, 2022
Facestar dataset. High quality audio-visual recordings of human conversational speech.

Facestar Dataset Description Existing audio-visual datasets for human speech are either captured in a clean, controlled environment but contain only a

Meta Research 87 Dec 21, 2022
The Official PyTorch Implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 spotlight paper)

Official PyTorch implementation of "VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models" (ICLR 2021 Spotlight Paper) Zhisheng

NVIDIA Research Projects 45 Dec 26, 2022
This is the official implementation of our proposed SwinMR

SwinMR This is the official implementation of our proposed SwinMR: Swin Transformer for Fast MRI Please cite: @article{huang2022swin, title={Swi

A Yang Lab (led by Dr Guang Yang) 27 Nov 17, 2022
Collective Multi-type Entity Alignment Between Knowledge Graphs (WWW'20)

CG-MuAlign A reference implementation for "Collective Multi-type Entity Alignment Between Knowledge Graphs", published in WWW 2020. If you find our pa

Bran Zhu 28 Dec 11, 2022
Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX.

ONNX Object Localization Network Python scripts performing class agnostic object localization using the Object Localization Network model in ONNX. Ori

Ibai Gorordo 15 Oct 14, 2022
Weakly Supervised Scene Text Detection using Deep Reinforcement Learning

Weakly Supervised Scene Text Detection using Deep Reinforcement Learning This repository contains the setup for all experiments performed in our Paper

Emanuel Metzenthin 3 Dec 16, 2022
YOLOPのPythonでのONNX推論サンプル

YOLOP-ONNX-Video-Inference-Sample YOLOPのPythonでのONNX推論サンプルです。 ONNXモデルは、hustvl/YOLOP/weights を使用しています。 Requirement OpenCV 3.4.2 or later onnxruntime 1.

KazuhitoTakahashi 8 Sep 05, 2022