PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Last update: Aug 19, 2022

Overview

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1)

This repository contains the PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing", the Sheffield entry for the first Clarity enhancement challenge (CEC1). The system consists of a Conv-TasNet based denoising module, and a finite-inpulse-response (FIR) filter based amplification module. A differentiable approximation to the Cambridge MSBG model released in the CEC1 is used in the loss function.

Requirements

To run the training recipe of the amplification module, the MSBG package and PyTorch STOI are needed.

Training

To build the overall system, the Conv-TasNet based denoising module needs to be trained in the first stage, and the scripts are in the recipe_den_convtasnet. The FIR based amplification module is trained in the second stage, and the scripts are in the recipe_amp_fir. The MBSTOI folder contains the MBSTOI implementation from the CEC1 project, with also the DBSTOI implementation.

References

[1] Luo Y, Mesgarani N. Conv-tasnet: Surpassing ideal time–frequency magnitude masking for speech separation[J]. IEEE/ACM transactions on audio, speech, and language processing, 2019, 27(8): 1256-1266.
[2] Andersen A H, de Haan J M, Tan Z H, et al. Refinement and validation of the binaural short time objective intelligibility measure for spatially diverse conditions[J]. Speech Communication, 2018, 102: 1-13.
[3] C.H.Taal, R.C.Hendriks, R.Heusdens, J.Jensen 'A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech', ICASSP 2010, Texas, Dallas.

Citation

If you use this work, please cite:

@article{tutwo,
  title={A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing},
  author={Tu, Zehai and Zhang, Jisi and Ma, Ning and Barker, Jon},
  year={2021},
  booktitle={The Clarity Workshop on Machine Learning Challenges for Hearing Aids (Clarity-2021)},
}

PyTorch implementation of "A Two-Stage End-to-End System for Speech-in-Noise Hearing Aid Processing"

Related tags

Overview

Implementation of the Sheffield entry for the first Clarity enhancement challenge (CEC1)

Requirements

Training

References

Citation

Owner

Code for unmixing audio signals in four different stems "drums, bass, vocals, others". The code is adapted from "Jukebox: A Generative Model for Music"

R-Drop: Regularized Dropout for Neural Networks

Deploy optimized transformer based models on Nvidia Triton server

Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

Discriminative Condition-Aware PLDA

PyTorch implementation for Partially View-aligned Representation Learning with Noise-robust Contrastive Loss (CVPR 2021)

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth, in ICCV 2021 (oral)

Code for the TPAMI paper: "Syntax Customized Video Captioning by Imitating Exemplar Sentences"

Implementation EfficientDet: Scalable and Efficient Object Detection in PyTorch

Automatic Calibration for Non-repetitive Scanning Solid-State LiDAR and Camera Systems

This repo provides the source code & data of our paper "GreaseLM: Graph REASoning Enhanced Language Models"

[ICLR 2021] Is Attention Better Than Matrix Decomposition?

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

TigerLily: Finding drug interactions in silico with the Graph.

Implementation of ViViT: A Video Vision Transformer

Official Implement of CVPR 2021 paper “Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting”

UltraPose: Synthesizing Dense Pose with 1 Billion Points by Human-body Decoupling 3D Model

PyTorch implementation of Barlow Twins.

Efficient Sharpness-aware Minimization for Improved Training of Neural Networks

Solution to the first stage Quiz of Hamoye internship: Introduction to Python for Machine Learning