Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Last update: Apr 30, 2022

Overview

Speaker-Embeddings-Correlation-Pooling

This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations" by T. Stafylakis, J. Rohdin, and L. Burget (Interspeech 2021). The paper is the result of a collaboration between Omilia - Conversational Intelligence and Brno University of Technology (BUT).

The code is written for TensorFlow 1 (TF1), but it should work with TF2 too. I only provide the code for creating the network, together with the hyperparameters it requires. The training hyperparameters we used can be found in the paper.

The code is well commented, at least the parts and (hyper-)parameters required for the correlation pooling.

Beyond the experiments reported in the paper, the code allows the user to: (a) combine standard statistics pooling with correlation pooling, by concatenating the two pooling layers into a single one, and (b) extract correlation pooling from the outputs of all 4 internal ResNet blocks (aka stages) and concatenate them in the pooling layer.
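To illustrate option (a), here is a minimal NumPy sketch of concatenating statistics pooling with correlation pooling. It is not the repository's TensorFlow code: the function names, the (channels, frequency, time) layout, the `eps` constant, and the choice to average over all frequency bins before computing correlations are assumptions made for the example.

```python
import numpy as np

def stats_pooling(x):
    """x: (channels, freq, time) -> per-(channel, freq) mean and std over time."""
    c, f, t = x.shape
    flat = x.reshape(c * f, t)
    return np.concatenate([flat.mean(axis=1), flat.std(axis=1)])

def correlation_pooling(x, eps=1e-5):
    """x: (channels, freq, time) -> upper triangle of the channel correlation matrix.

    Assumption: frequency bins are averaged into a single range first.
    """
    m = x.mean(axis=1)                            # (channels, time)
    m = m - m.mean(axis=1, keepdims=True)         # zero mean over time
    m = m / (m.std(axis=1, keepdims=True) + eps)  # unit variance over time
    corr = m @ m.T / m.shape[1]                   # (channels, channels)
    # The diagonal is (almost) constant, so keep only entries above it.
    return corr[np.triu_indices(corr.shape[0], k=1)]

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3, 50))  # hypothetical ResNet output: 4 channels, 3 bins, 50 frames

# Option (a): a single pooling vector holding both kinds of statistics.
combined = np.concatenate([stats_pooling(x), correlation_pooling(x)])
print(combined.shape)  # (30,) = 2*4*3 stats + 4*3/2 correlations
```

The correlation part grows quadratically with the number of channels, so the two parts of the combined vector can differ a lot in size.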

The code could be written more efficiently using tensor-only operators. However, to facilitate research, we have implemented it using lists of tensors, e.g. after merging frequency bins into frequency ranges. Despite this inefficiency, we observe no difference in training speed between correlation pooling and standard stats pooling.
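The difference between the two styles can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the repository's code: the function names are hypothetical, the input is assumed to be laid out as (channels, frequency, time), and frequency bins are assumed to be merged into ranges by averaging.

```python
import numpy as np

def corr_upper(m, eps=1e-5):
    """m: (channels, time) -> upper-triangular channel correlations."""
    m = m - m.mean(axis=1, keepdims=True)
    m = m / (m.std(axis=1, keepdims=True) + eps)
    corr = m @ m.T / m.shape[1]
    return corr[np.triu_indices(corr.shape[0], k=1)]

def pool_with_lists(x, n_ranges):
    """List-of-tensors style: one (channels, time) array per frequency range."""
    groups = np.array_split(x, n_ranges, axis=1)  # split the frequency axis
    ranges = [g.mean(axis=1) for g in groups]     # merge bins within each range
    return np.concatenate([corr_upper(r) for r in ranges])

def pool_tensor_only(x, n_ranges, eps=1e-5):
    """Tensor-only style: all ranges handled in one batched computation."""
    c, f, t = x.shape
    assert f % n_ranges == 0, "sketch assumes equally sized ranges"
    r = x.reshape(c, n_ranges, f // n_ranges, t).mean(axis=2)  # (C, R, T)
    r = r.transpose(1, 0, 2)                                   # (R, C, T)
    r = r - r.mean(axis=2, keepdims=True)
    r = r / (r.std(axis=2, keepdims=True) + eps)
    corr = np.einsum("rct,rdt->rcd", r, r) / t                 # (R, C, C)
    iu = np.triu_indices(c, k=1)
    return corr[:, iu[0], iu[1]].reshape(-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 50))  # hypothetical: 4 channels, 6 bins, 50 frames
a = pool_with_lists(x, n_ranges=3)
b = pool_tensor_only(x, n_ranges=3)
print(a.shape, np.allclose(a, b))
```

The list version makes it easy to inspect or modify a single frequency range, which is the kind of flexibility that helps during research; the batched version avoids the Python-level loop.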

Start with the file train_resnet.py, which creates the ResNet (with the pooling mechanism) and sets its parameters. All parameters are set so that you can reproduce our best-performing experiment (P7 in the paper).

So, try it and let us know what you get! Themos
