Unofficial implementation of the paper SpeakerGAN: Speaker identification with conditional generative adversarial network

Last update: Jan 03, 2023

Introduction

This repository is an unofficial implementation of the SpeakerGAN paper by Mingming Huang ([email protected]) and Tiezheng Wang ([email protected]), with thanks to TongFeng for advice.

SpeakerGAN paper

SpeakerGAN: Speaker identification with conditional generative adversarial network, by Liyang Chen, Yifeng Liu, Wendong Xiao, Yingxue Wang, Haiyong Xie.

Usage

To train, test, or generate:

python speakergan.py

You may need to change the path to the VAD-preprocessed wav files.

Our results

acc: 94.27% on a randomly sampled test set.

acc: 93.21% on a test set sampled from a fixed start position.

Model file used: model/49_D.pkl

acc: 98.44% training classification accuracy on real samples.

Our test-set accuracy is about 4% lower than the result reported in the paper. We have not been able to find the reason and would welcome your help!

Details of paper

The following are details about this paper.

================ input ==================

feature: fbank, 8000 Hz, 25 ms frame, 10 ms overlap. shape: (160, 64)
dataset: LibriSpeech train-clean-100, POI: 251
data preprocess: VAD, mean and variance normalization, shuffled.
split: 60% train, 40% test.
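
The mean and variance normalization step above can be sketched in plain Python. This is only an illustration, not the repository's code: the function name `mean_variance_normalize`, the `eps` term, and the choice to normalize each of the 64 fbank coefficients over the time axis of one utterance are all assumptions.

```python
import math


def mean_variance_normalize(frames, eps=1e-8):
    """Scale each feature dimension to zero mean and unit variance over time.

    frames: list of frames, each a list of floats with the same length
            (for example 160 frames of 64 fbank coefficients).
    eps:    small constant (assumed) that avoids division by zero.
    """
    num_frames = len(frames)
    num_bins = len(frames[0])
    normalized = [[0.0] * num_bins for _ in range(num_frames)]
    for b in range(num_bins):
        column = [frame[b] for frame in frames]
        mean = sum(column) / num_frames
        var = sum((x - mean) ** 2 for x in column) / num_frames
        std = math.sqrt(var) + eps
        for t in range(num_frames):
            normalized[t][b] = (frames[t][b] - mean) / std
    return normalized


if __name__ == "__main__":
    toy = [[1.0, 10.0], [3.0, 10.0]]
    print(mean_variance_normalize(toy))
```

A constant coefficient (the second column in the toy input) maps to zero rather than raising an error, which is the only purpose of `eps` here.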

================ model architecture ==================

dataflow: data -> feature extraction -> G & D
model architecture:

G: gated CNN, encoder-decoder, Huber loss + adversarial loss

D: ResNet blocks, temporal average pooling, FC, softmax, cross-entropy loss + adversarial loss
G: shuffler layer, GLU
D: ReLU
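
For readers unfamiliar with the GLU gating listed for G, here is a minimal sketch in plain Python. It assumes the common formulation in which the input is split into two halves and the first half is multiplied by the sigmoid of the second; the function names are illustrative, and the repository's actual gated CNN layers may be organized differently.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def glu(channels):
    """Gated linear unit over a flat list of channel values.

    The first half of `channels` is the linear part and the second half
    is the gate: output[i] = linear[i] * sigmoid(gate[i]).
    """
    if len(channels) % 2 != 0:
        raise ValueError("GLU needs an even number of channels")
    half = len(channels) // 2
    linear, gate = channels[:half], channels[half:]
    return [a * sigmoid(b) for a, b in zip(linear, gate)]


if __name__ == "__main__":
    # A gate value of 0 passes exactly half of the linear value through.
    print(glu([2.0, -4.0, 0.0, 0.0]))
```

In PyTorch the same split-and-gate operation is available as `torch.nn.functional.glu`, which halves the input along a chosen dimension.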

================ training ==================

lr: epochs 0-9: 0.0005 | epochs 9-49: 0.0002
L(d): λ1 = λ2 = 1
batch_size: 64
D_train steps / G_train steps = 4
Ladv loss: label smoothing, 1 -> 0.7 ~ 1.0, 0 -> 0 ~ 0.3
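
The training settings above can be read as the following sketch. It is an interpretation under stated assumptions, not the repository's training loop: the function names are hypothetical, the learning rate is assumed to switch at epoch 9, smoothed labels are assumed to be drawn uniformly from the listed ranges, and the 4:1 ratio is assumed to mean one G update after every fourth D update.

```python
import random


def learning_rate(epoch):
    """Learning rate per epoch; the switch at epoch 9 is an assumption."""
    return 0.0005 if epoch < 9 else 0.0002


def smooth_label(is_real, rng=random):
    """Label smoothing for the adversarial loss.

    Real targets (1) are replaced by a value in [0.7, 1.0] and
    fake targets (0) by a value in [0.0, 0.3]; uniform sampling is assumed.
    """
    if is_real:
        return rng.uniform(0.7, 1.0)
    return rng.uniform(0.0, 0.3)


def update_plan(num_batches, d_steps_per_g_step=4):
    """List which network is updated for a run of batches.

    D is updated on every batch; G is updated after every
    `d_steps_per_g_step`-th D update.
    """
    plan = []
    for i in range(num_batches):
        plan.append("D")
        if (i + 1) % d_steps_per_g_step == 0:
            plan.append("G")
    return plan


if __name__ == "__main__":
    print(learning_rate(0), learning_rate(9))
    print(smooth_label(True), smooth_label(False))
    print(update_plan(8))
```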

======== uncertainties or differences from the paper ========

weight and bias initialization: we use xavier_uniform and zeros.
Huber loss: PyTorch's huber_loss would need + 0.5 to match the paper; this is not implemented here.
shorter wavs: the paper pads with zeros; we pad by repeating the feature.
gated CNN architecture.
VAD preprocessing: we use webrtcvad mode 3.
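
The two padding strategies for short utterances mentioned above can be compared with a small sketch. The function names and the target length of 160 frames (taken from the feature shape) are illustrative, and truncating inputs that are already long enough is an assumption.

```python
def pad_by_repeating(frames, target_len=160):
    """Repeat the frame sequence until it has target_len frames, then cut."""
    if not frames:
        raise ValueError("cannot pad an empty feature sequence")
    padded = list(frames)
    while len(padded) < target_len:
        padded.extend(frames)
    return padded[:target_len]


def pad_with_zeros(frames, target_len=160):
    """Append all-zero frames until the sequence has target_len frames."""
    if not frames:
        raise ValueError("cannot pad an empty feature sequence")
    num_bins = len(frames[0])
    missing = max(0, target_len - len(frames))
    zeros = [[0.0] * num_bins for _ in range(missing)]
    return (list(frames) + zeros)[:target_len]


if __name__ == "__main__":
    short = [[1.0], [2.0], [3.0]]
    print(pad_by_repeating(short, target_len=7))
    print(pad_with_zeros(short, target_len=5))
```

Repeating keeps every padded frame speech-like, whereas zero padding adds frames the model never sees in real audio; whether this difference contributes to the accuracy gap is not known.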
