Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Last update: Nov 28, 2022

Continuous Speech Separation with Conformer

Introduction

We examine the use of the Conformer architecture for continuous speech separation. Conformer allows the separation model to efficiently capture both local and global context information, which is helpful for speech separation. Experimental results on the LibriCSS dataset show that the Conformer separation model achieves state-of-the-art results in both single-channel and multi-channel settings.

For a detailed description and experimental results, please refer to our paper: Continuous Speech Separation with Conformer (Accepted by ICASSP 2021).

Environment

Python 3.6.9, torch 1.7.1
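
A quick way to compare a local setup against the versions listed above is sketched below. This is not part of the repository: the helper names `version_tuple` and `report_environment` are hypothetical, and the sketch only reports what is installed rather than enforcing anything.

```python
import sys


def version_tuple(text):
    """Turn a version string such as '1.7.1+cu110' into (1, 7, 1).

    The local build suffix after '+' is dropped, and any component that is
    not purely numeric is skipped.
    """
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def report_environment():
    """Return human-readable lines describing the Python and torch versions."""
    lines = ["python " + ".".join(str(n) for n in sys.version_info[:3])]
    try:
        import torch  # imported lazily so the report works without torch
        lines.append("torch " + torch.__version__)
    except ImportError:
        lines.append("torch not installed")
    return lines


if __name__ == "__main__":
    for line in report_environment():
        print(line)
```

Other versions may work as well; the README only states the versions the code was run with.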

Get Started

1. Download the overlapped speech of the LibriCSS dataset.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1PdloA-V8HGxkRu9MnT35_civpc3YXJsT" -O overlapped_speech.zip && rm -rf /tmp/cookies.txt && unzip overlapped_speech.zip && rm overlapped_speech.zip
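
The download command above works in two passes: the inner `wget` saves the session cookies and prints the download page, `sed` pulls the value that follows `confirm=` out of that page, and the outer `wget` repeats the request with that value and the saved cookies. The token-extraction step can be sketched in Python as follows. The function name `extract_confirm_tokens` and the sample HTML are hypothetical and only illustrate the pattern; the real page content may differ.

```python
import re

# Same character class as the sed expression in the wget command: a run of
# digits, letters, or underscores directly after "confirm=". The leading
# greedy ".*" selects the last "confirm=" on a line, as sed does.
CONFIRM_PATTERN = re.compile(r".*confirm=([0-9A-Za-z_]+)")


def extract_confirm_tokens(html):
    """Return one token for each line of `html` that contains 'confirm=...'."""
    tokens = []
    for line in html.splitlines():
        match = CONFIRM_PATTERN.match(line)
        if match:
            tokens.append(match.group(1))
    return tokens


# Hypothetical page fragment, used only to exercise the pattern.
sample_page = (
    "<html>\n"
    '<a href="/uc?export=download&amp;confirm=t_9X&amp;id=FILE_ID">Download</a>\n'
    "</html>"
)
print(extract_confirm_tokens(sample_page))
```

If the page contains no `confirm=` value, the list is empty, which corresponds to the shell command substituting an empty string for the token.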

2. Download the Conformer separation models.

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1OlTbEvxYUoqWIHfeAXCftL9srbWUo4I1" -O checkpoints.zip && rm -rf /tmp/cookies.txt && unzip checkpoints.zip && rm checkpoints.zip

3. Run the separation.

3.1 Single-channel separation

export MODEL_NAME=1ch_conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_1ch.scp \
    --dump-dir separated_speech/monaural/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2

The separated speech can be found in the directory 'separated_speech/monaural/utterances_with_$MODEL_NAME'.

3.2 Seven-channel separation

export MODEL_NAME=conformer_base
python3 separate.py \
    --checkpoint checkpoints/$MODEL_NAME \
    --mix-scp utils/overlapped_speech_7ch.scp \
    --dump-dir separated_speech/7ch/utterances_with_$MODEL_NAME \
    --device-id 0 \
    --num_spks 2 \
    --mvdr True

The separated speech can be found in the directory 'separated_speech/7ch/utterances_with_$MODEL_NAME'.
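
The single-channel and seven-channel invocations differ only in the model name, the input list, the output directory, and the `--mvdr True` flag. A minimal sketch of a helper that assembles either command is shown below. The function `build_separation_command` is hypothetical and not part of the repository; it only reuses the flags and paths that appear in the commands above.

```python
import shlex


def build_separation_command(model_name, mix_scp, dump_dir,
                             device_id=0, num_spks=2, mvdr=False):
    """Assemble the argument list for separate.py as used in this README."""
    command = [
        "python3", "separate.py",
        "--checkpoint", "checkpoints/" + model_name,
        "--mix-scp", mix_scp,
        "--dump-dir", dump_dir,
        "--device-id", str(device_id),
        "--num_spks", str(num_spks),
    ]
    if mvdr:
        # The seven-channel example passes the flag with an explicit value.
        command += ["--mvdr", "True"]
    return command


single_channel = build_separation_command(
    "1ch_conformer_base",
    "utils/overlapped_speech_1ch.scp",
    "separated_speech/monaural/utterances_with_1ch_conformer_base",
)
seven_channel = build_separation_command(
    "conformer_base",
    "utils/overlapped_speech_7ch.scp",
    "separated_speech/7ch/utterances_with_conformer_base",
    mvdr=True,
)

# Print shell-ready versions of both commands.
for command in (single_channel, seven_channel):
    print(" ".join(shlex.quote(arg) for arg in command))
```

To run a command instead of printing it, the list can be passed to `subprocess.run(command, check=True)` from the repository root.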

Citation

If you find our work useful, please cite our paper:

@inproceedings{CSS_with_Conformer,
  title={Continuous speech separation with conformer},
  author={Chen, Sanyuan and Wu, Yu and Chen, Zhuo and Wu, Jian and Li, Jinyu and Yoshioka, Takuya and Wang, Chengyi and Liu, Shujie and Zhou, Ming},
  booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5749--5753},
  year={2021},
  organization={IEEE}
}

Owner

Sanyuan Chen (陈三元)
