A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

Overview

bbc-speech-segmenter: Voice Activity Detection & Speaker Diarization

A complete speech segmentation system using Kaldi and x-vectors for voice activity detection (VAD) and speaker diarisation.

The x-vector-vad system is described in the paper; Ogura, M. & Haynes, M. (2021) X-vector-vad for Multi-genre Broadcast Speech-to-text. The paper has been submitted to 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) and is currently under review as of June 2021.

Quickstart

$ docker pull bbcrd/bbc-speech-segmenter

# Test

$ docker run -w /wrk -v `pwd`:/wrk bbcrd/bbc-speech-segmenter ./test.sh

# Segmentation help

$ docker run bbcrd/bbc-speech-segmenter ./run-segmentation.sh --help
usage: run-segmentation.sh [options] input.wav input.stm output-dir

options:
  --nj NUM                 Maximum number of CPU cores to use
  --stage STAGE            Start from this stage
  --cluster-threshold THR  Cluster stopping criteria. Default: -0.3
  --vad-threshold THR      Xvector classifier threshold. Lower the number the
                           more speech segments shall be returned at the
                           expense of accuracy. Default: 0.2
  --vad-method             Filter segments on an individual or segment basis.
                           Default: individual
  --no-vad                 Skip xvector vad stages. Default: false
  --help                   Print this message

# Run segmentation (VAD + diarisation), results are in output-dir/diarize.stm

$ docker run -v `pwd`:/data bbcrd/bbc-speech-segmenter \
  ./run-segmentation.sh /data/audio.wav /data/audio.stm /data/output-dir

$ cat output-dir/diarize.stm
audio 0 audio_S00004 3.750 10.125 <speech>
audio 0 audio_S00003 10.125 13.687 <speech>
audio 0 audio_S00004 13.688 16.313 <speech>
...

# Train x-vector classifier

$ docker run -w /wrk/recipe -v `pwd`:/wrk bbcrd/bbc-speech-segmenter \
  local/xvector_utils.py data/bbc-vad-train/reference.stm            \
  data/bbc-vad-train/xvectors.ark new_model.pkl

# Evaluate x-vector classifier

$ docker run -w /wrk/recipe -v `pwd`:/wrk bbcrd/bbc-speech-segmenter \
  local/xvector_utils.py evaluate data/bbc-vad-eval/reference.stm    \
  data/bbc-vad-eval/xvectors.ark model/xvector-classifier.pkl

Audio & STM file format

In order to run the segmentation script you need your audio in 16Khz Mono WAV format. You also need an STM file describing the segments you want to apply voice activity detection and speaker diarization to.

For more information on the STM file format see XVECTOR_UTILS.md.

# Convert audio file to 16Khz mono wav

$ ffmpeg audio.mp3 -vn -ac 1 -ar 16000 audio.wav

# Create STM file for input

$ DURATION=$(ffprobe -i audio.wav -show_entries format=duration -v quiet -of csv="p=0")
$ DURATION=$(printf "%0.2f\n" $DURATION)

$ FILENAME=$(basename audio.wav)

$ echo "${FILENAME%.*} 0 ${FILENAME%.*} 0.00 $DURATION <label> _" > audio.stm

$ cat audio.stm
audio 0 audio 0.00 60.00 <label> _

Use Docker image to run code in local checkout

# Bulid Docker image

$ docker build -t bbc-speech-segmenter .

# Spin up a Docker container in an interactive mode

$ docker run -it -v `pwd`:/wrk bbc-speech-segmenter /bin/bash

# Inside a Docker container

$ cd /wrk/

# Run test

$ ./test.sh
All checks passed

Training and evaluation

X-vector utility

xvector_utils.py can be used to train and evaluate x-vector classifier, as well as o extract and visualize x-vectors. For more detailed information, see XVECTOR_UTILS.md.

The documentation also gives details on file formats such as ARK, SCP or STM, which are required to use this tool.

Run x-vector VAD training

Two files are required for x-vector-vad training:

  • Reference STM file
  • X-vectors ARK file

For example, from inside the Docker container:

$ cd /wrk/recipe

$ python3 local/xvector_utils.py train \
  data/bbc-vad-train/reference.stm     \
  data/bbc-vad-train/xvectors.ark      \
  new_model.pkl

The model will be saved as new_model.pkl.

Run x-vector VAD evaluation

Three files are needed in order to run VAD evaluation:

  • Reference STM file
  • X-vectors ARK file
  • x-vector-vad classifier model

For example, from inside the Docker container:

$ cd /wrk/recipe

$ python3 local/xvector_utils.py evaluate \
  data/bbc-vad-eval/reference.stm        \
  data/bbc-vad-eval/xvectors.ark         \
  model/xvector-classifier.pkl

WebRTC baseline

The code for the baseline WebRTC system referenced in the paper is available in the directory recipe/baselines/denoising_DIHARD18_webrtc.

Request access to bbc-vad-train

Due to size restriction, only bbc-vad-eval is included in the repository. If you'd like access to bbc-vad-train, please contact Matt Haynes.

Authors

Owner
BBC
Open source code used on public facing services, internal services and educational resources.
BBC
Namish Khanna 40 Oct 11, 2022
A quick recipe to learn all about Transformers

Transformers have accelerated the development of new techniques and models for natural language processing (NLP) tasks.

DAIR.AI 772 Dec 31, 2022
Semantic Segmentation Suite in TensorFlow

Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!

George Seif 2.5k Jan 06, 2023
Official PyTorch Implementation of SSMix (Findings of ACL 2021)

SSMix: Saliency-based Span Mixup for Text Classification (Findings of ACL 2021) Official PyTorch Implementation of SSMix | Paper Abstract Data augment

Clova AI Research 52 Dec 27, 2022
Yoloxkeypointsegment - An anchor-free version of YOLO, with a simpler design but better performance

Introduction 关键点版本:已完成 全景分割版本:已完成 实例分割版本:已完成 YOLOX is an anchor-free version of

23 Oct 20, 2022
PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning.

neural-combinatorial-rl-pytorch PyTorch implementation of Neural Combinatorial Optimization with Reinforcement Learning. I have implemented the basic

Patrick E. 454 Jan 06, 2023
Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch

Lie Transformer - Pytorch (wip) Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch. Only the SE3 version will be present in thi

Phil Wang 78 Oct 26, 2022
Progressive Growing of GANs for Improved Quality, Stability, and Variation

Progressive Growing of GANs for Improved Quality, Stability, and Variation — Official TensorFlow implementation of the ICLR 2018 paper Tero Karras (NV

Tero Karras 5.9k Jan 05, 2023
People Interaction Graph

Gihan Jayatilaka*, Jameel Hassan*, Suren Sritharan*, Janith Senananayaka, Harshana Weligampola, et. al., 2021. Holistic Interpretation of Public Scenes Using Computer Vision and Temporal Graphs to Id

University of Peradeniya : COVID Research Group 1 Aug 24, 2022
Learning Optical Flow from a Few Matches (CVPR 2021)

Learning Optical Flow from a Few Matches This repository contains the source code for our paper: Learning Optical Flow from a Few Matches CVPR 2021 Sh

Shihao Jiang (Zac) 159 Dec 16, 2022
A NSFW content filter.

Project_Nfilter A NSFW content filter. With a motive of minimizing the spreads and leakage of NSFW contents on internet and access to others devices ,

1 Jan 20, 2022
Facilitates implementing deep neural-network backbones, data augmentations

Introduction Nowadays, the training of Deep Learning models is fragmented and unified. When AI engineers face up with one specific task, the common wa

40 Dec 29, 2022
Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes

Neural Scene Flow Fields PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021 [Projec

Zhengqi Li 583 Dec 30, 2022
The official pytorch implemention of the CVPR paper "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution".

This is the official PyTorch implementation of TMNet in the CVPR 2021 paper "Temporal Modulation Network for Controllable Space-Time VideoSuper-Resolu

Gang Xu 95 Oct 24, 2022
Code for all the Advent of Code'21 challenges mostly written in python

Advent of Code 21 Code for all the Advent of Code'21 challenges mostly written in python. They are not necessarily the best or fastest solutions but j

4 May 26, 2022
A Deep Reinforcement Learning Framework for Stock Market Trading

DQN-Trading This is a framework based on deep reinforcement learning for stock market trading. This project is the implementation code for the two pap

61 Jan 01, 2023
Deep Latent Force Models

Deep Latent Force Models This repository contains a PyTorch implementation of the deep latent force model (DLFM), presented in the paper, Compositiona

Tom McDonald 5 Oct 26, 2022
Multi-Scale Geometric Consistency Guided Multi-View Stereo

ACMM [News] The code for ACMH is released!!! [News] The code for ACMP is released!!! About ACMM is a multi-scale geometric consistency guided multi-vi

Qingshan Xu 118 Jan 04, 2023
Official repository of "DeepMIH: Deep Invertible Network for Multiple Image Hiding", TPAMI 2022.

DeepMIH: Deep Invertible Network for Multiple Image Hiding (TPAMI 2022) This repo is the official code for DeepMIH: Deep Invertible Network for Multip

Junpeng Jing 67 Nov 22, 2022
Isaac Gym Reinforcement Learning Environments

Isaac Gym Reinforcement Learning Environments

NVIDIA Omniverse 714 Jan 08, 2023