Auxiliary Raw Net (ARawNet) is an ASVspoof detection model that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity.

Last update: Jul 08, 2022

Overview

This repository is an implementation of the Auxiliary Raw Net (ARawNet), an ASVspoof detection system that takes both raw waveforms and handcrafted features as inputs to balance the trade-off between performance and model complexity. The paper can be checked here.

Model performance is evaluated on the ASVspoof 2019 dataset.

Setup

Environment

speechbrain==0.5.7
pandas
torch==1.9.1
torchaudio==0.9.1
nnAudio==0.2.6
ptflops==0.6.6

Create a conda environment with conda env create -f environment.yml.
Activate it with conda activate followed by the environment name defined in environment.yml.
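To confirm that the activated environment matches the pinned versions listed above, a small check along these lines may help. This is a minimal sketch and not part of the repository: the function name check_pins is hypothetical, pandas is left out because the list does not pin it, and local build suffixes such as "+cu111" are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the environment list above (pandas is unpinned).
PINNED = {
    "speechbrain": "0.5.7",
    "torch": "1.9.1",
    "torchaudio": "0.9.1",
    "nnAudio": "0.2.6",
    "ptflops": "0.6.6",
}


def check_pins(pins, get_version=version):
    """Return {package: problem} for every package that deviates from its pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Ignore local build tags such as "1.9.1+cu111".
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems
```

Calling check_pins(PINNED) inside the environment returns an empty dictionary when every pinned package is installed at the expected version.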

Data preprocessing

.
├── data
│   ├── PA
│   │   └── ...
│   └── LA
│       ├── ASVspoof2019_LA_asv_protocols
│       ├── ASVspoof2019_LA_asv_scores
│       ├── ASVspoof2019_LA_cm_protocols
│       ├── ASVspoof2019_LA_train
│       └── ASVspoof2019_LA_dev
└── ARawNet

Download the dataset. Our model is trained on the Logical Access (LA) scenario of the ASVspoof 2019 dataset. The dataset can be downloaded here.
Unzip the data into a folder named data in the same directory as ARawNet, as shown in the tree above.
Run python preprocess.py. Alternatively, use our processed data directly under "/processed_data".
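Before running the preprocessing script, it can be useful to verify that the unzipped data matches the directory tree above. The following is a minimal sketch under that assumption; the helper missing_la_dirs is hypothetical and not part of the repository, and it only checks the LA subfolders named in the tree.

```python
from pathlib import Path
from typing import List

# Subfolders of data/LA taken from the directory tree above.
EXPECTED_LA_DIRS = [
    "ASVspoof2019_LA_asv_protocols",
    "ASVspoof2019_LA_asv_scores",
    "ASVspoof2019_LA_cm_protocols",
    "ASVspoof2019_LA_train",
    "ASVspoof2019_LA_dev",
]


def missing_la_dirs(data_root) -> List[str]:
    """Return the expected LA subfolders that are absent under data_root/LA."""
    la_root = Path(data_root) / "LA"
    return [name for name in EXPECTED_LA_DIRS if not (la_root / name).is_dir()]
```

For example, missing_la_dirs("../data") called from inside the ARawNet folder returns an empty list when the layout is complete.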

Train

python train_raw_net.py yaml/RawSNet.yaml --data_parallel_backend --data_parallel_count=2

Evaluate

python eval.py

Check Model Size and Multiply-Accumulate Operations (MACs)

python check_model_size.py yaml/RawSNet.yaml
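To make the two reported quantities concrete, the sketch below counts parameters and MACs for a single 1-D convolution layer by hand. It is an illustration only, not the logic of check_model_size.py: the function conv1d_cost is hypothetical, it assumes a standard ungrouped convolution, and it counts one MAC per weight per output sample without counting bias additions, so a profiling tool may report slightly different totals.

```python
def conv1d_cost(in_channels, out_channels, kernel_size, input_length,
                stride=1, padding=0, dilation=1, bias=True):
    """Return (parameters, MACs, output_length) for one ungrouped 1-D convolution."""
    # Standard output-length formula for a strided, dilated convolution.
    out_length = (input_length + 2 * padding
                  - dilation * (kernel_size - 1) - 1) // stride + 1
    # One weight per (input channel, output channel, kernel tap), plus optional biases.
    params = in_channels * out_channels * kernel_size
    if bias:
        params += out_channels
    # Each output sample of each output channel needs in_channels * kernel_size MACs.
    macs = in_channels * kernel_size * out_channels * out_length
    return params, macs, out_length
```

Summing these per-layer counts over a whole network gives totals of the kind listed in the model size table, which is why models that run wide convolutions directly on long raw waveforms tend to have high MAC counts.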

Model Performance

Accuracy metric

$$\min \text{t-DCF} = \min_{s}\left\{\beta\, P_{\text{miss}}^{\text{cm}}(s) + P_{\text{fa}}^{\text{cm}}(s)\right\}$$

Explanations can be found here: t-DCF
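As a rough illustration of how the two reported metrics relate to countermeasure (CM) scores, the sketch below sweeps a threshold s over the scores, computes the CM miss rate and false-alarm rate at each s, and takes the minimum of the weighted sum. It is a simplified sketch, not the official scoring tool and not the repository's eval.py: beta is passed in as a given constant (in the ASVspoof 2019 definition it depends on the cost parameters and the ASV system's error rates), higher scores are assumed to mean "more likely bona fide", and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def cm_error_rates(bonafide: Sequence[float],
                   spoof: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, P_miss, P_fa) for each candidate threshold.

    A trial is accepted as bona fide when its score is >= threshold.
    """
    # Every observed score is a candidate; +inf covers "reject everything".
    thresholds = sorted(set(bonafide) | set(spoof)) + [float("inf")]
    rates = []
    for s in thresholds:
        p_miss = sum(x < s for x in bonafide) / len(bonafide)  # bona fide rejected
        p_fa = sum(x >= s for x in spoof) / len(spoof)         # spoof accepted
        rates.append((s, p_miss, p_fa))
    return rates


def min_tdcf(bonafide: Sequence[float], spoof: Sequence[float], beta: float) -> float:
    """Minimum over thresholds of beta * P_miss(s) + P_fa(s)."""
    return min(beta * p_miss + p_fa
               for _, p_miss, p_fa in cm_error_rates(bonafide, spoof))


def eer(bonafide: Sequence[float], spoof: Sequence[float]) -> float:
    """Approximate equal error rate: the point where P_miss and P_fa are closest."""
    _, p_miss, p_fa = min(cm_error_rates(bonafide, spoof),
                          key=lambda r: abs(r[1] - r[2]))
    return (p_miss + p_fa) / 2
```

Both functions return fractions in the range 0 to 1 for eer and 0 to roughly beta or 1 for min_tdcf; perfectly separated score sets give 0 for both.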

Experiment Results

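| Model | Front-end | Main Encoder | Auxiliary Encoder (E_A) | EER (%) | min-tDCF |
|---|---|---|---|---|---|
| Res2Net | Spec | Res2Net | - | 8.783 | 0.2237 |
| Res2Net | LFCC | Res2Net | - | 2.869 | 0.0786 |
| Res2Net | CQT | Res2Net | - | 2.502 | 0.0743 |
| Rawnet2 | Raw waveforms | Rawnet2 | - | 5.13 | 0.1175 |
| ARawNet | Mel-Spectrogram | XVector | ✅ | 1.32 | 0.03894 |
| ARawNet | Mel-Spectrogram | XVector | - | 2.39320 | 0.06875 |
| ARawNet | Mel-Spectrogram | ECAPA-TDNN | ✅ | 1.39 | 0.04316 |
| ARawNet | Mel-Spectrogram | ECAPA-TDNN | - | 2.11 | 0.06425 |
| ARawNet | CQT | XVector | ✅ | 1.74 | 0.05194 |
| ARawNet | CQT | XVector | - | 3.39875 | 0.09510 |
| ARawNet | CQT | ECAPA-TDNN | ✅ | 1.11 | 0.03645 |
| ARawNet | CQT | ECAPA-TDNN | - | 1.72667 | 0.05077 |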

| Main Encoder | Auxiliary Encoder | Parameters | MACs |
|---|---|---|---|
| Rawnet2 | - | 25.43 M | 7.61 GMac |
| Res2Net | - | 0.92 M | 1.11 GMac |
| XVector | ✅ | 5.81 M | 2.71 GMac |
| XVector | - | 4.66 M | 1.88 GMac |
| ECAPA-TDNN | ✅ | 7.18 M | 3.19 GMac |
| ECAPA-TDNN | - | 6.03 M | 2.36 GMac |
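The cost of the auxiliary encoder can be read off the table above by subtracting the rows without it from the rows with it. The short sketch below does that arithmetic; the values are copied from the table, and the function name auxiliary_overhead is hypothetical.

```python
# (parameters in millions, MACs in GMac), copied from the table above.
COSTS = {
    ("XVector", True): (5.81, 2.71),
    ("XVector", False): (4.66, 1.88),
    ("ECAPA-TDNN", True): (7.18, 3.19),
    ("ECAPA-TDNN", False): (6.03, 2.36),
}


def auxiliary_overhead(encoder):
    """Return (extra parameters in M, extra MACs in GMac) added by the auxiliary encoder."""
    with_params, with_macs = COSTS[(encoder, True)]
    base_params, base_macs = COSTS[(encoder, False)]
    # Round to the two decimals reported in the table.
    return round(with_params - base_params, 2), round(with_macs - base_macs, 2)
```

For both main encoders the difference comes to 1.15 M parameters and 0.83 GMac, which keeps the combined models well below the 25.43 M parameters and 7.61 GMac listed for Rawnet2.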

Cite Our Paper

If you use this repository, please consider citing:

@inproceedings{Teng2021ComplementingHF,
  title={Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model},
  author={Zhongwei Teng and Quchen Fu and Jules White and M. Powell and Douglas C. Schmidt},
  year={2021}
}

@inproceedings{Fu2021FastAudioAL,
  title={FastAudio: A Learnable Audio Front-End for Spoof Speech Detection},
  author={Quchen Fu and Zhongwei Teng and Jules White and M. Powell and Douglas C. Schmidt},
  year={2021}
}
