SASE : Self-Adaptive noise distribution network for Speech Enhancement with heterogeneous data of Cross-Silo Federated learning

Last update: Nov 20, 2021

Overview

We propose SASE, a speech enhancement model with an adaptive noise distribution, which achieves state-of-the-art results on the VoiceBank+DEMAND dataset.
We simulate a realistic federated learning setting and verify the robustness of the proposed SASE noise reduction model in that setting through experiments and visualization.
SASE operates in the complex domain: the TF-GA block extracts richer information about the speech and noise distributions, while SA-GOEA and SA-GUEA adaptively learn the distribution mask of the noise.
We also propose a weighting strategy for model aggregation that is better suited to FL-based speech enhancement tasks.
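
As a rough illustration of what masking in the complex domain means, the sketch below multiplies each time-frequency bin of a noisy spectrum by a complex mask value, so the mask can change both magnitude and phase. This is a minimal sketch under assumptions, not the SASE implementation: the function name `apply_complex_mask` and the toy values are hypothetical, and whether SASE uses its learned mask to recover speech directly or to estimate the noise is not stated here.

```python
def apply_complex_mask(noisy_bins, mask_bins):
    """Multiply each noisy time-frequency bin by its complex mask value.

    Hypothetical helper for illustration; both arguments are flat lists
    of Python complex numbers of equal length.
    """
    if len(noisy_bins) != len(mask_bins):
        raise ValueError("noisy_bins and mask_bins must have the same length")
    return [m * y for m, y in zip(mask_bins, noisy_bins)]


# Toy example: three time-frequency bins of a noisy spectrum.
noisy = [complex(1.0, 1.0), complex(0.5, -0.5), complex(2.0, 0.0)]

# A complex mask can scale the magnitude (first bin), rotate the phase
# (second bin), or leave a bin unchanged (third bin).
mask = [complex(0.5, 0.0), complex(0.0, 1.0), complex(1.0, 0.0)]

enhanced = apply_complex_mask(noisy, mask)
```

A real-valued mask would only rescale magnitudes; working with complex values is what allows the phase to be corrected as well.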

Dependencies

python >=3.6 (3.8.5 was used in the experiments)
PyTorch == 1.10.0+cu113
flwr == 2.0.1
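
To compare a local environment against the versions listed above, a small check like the following may help. It is a sketch that assumes Python 3.8 or later (for `importlib.metadata`) and that the packages are installed under the distribution names `torch` and `flwr`; the helper names are hypothetical and not part of this repository.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(dist_names):
    """Build one line for the interpreter and one per requested package."""
    lines = ["python " + ".".join(str(part) for part in sys.version_info[:3])]
    for dist_name in dist_names:
        version = installed_version(dist_name)
        lines.append(f"{dist_name} {version if version else 'NOT INSTALLED'}")
    return lines


if __name__ == "__main__":
    for line in report(["torch", "flwr"]):
        print(line)
```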

How to run the code

1. Prepare data

VoiceBank+DEMAND is available from the dataset page "Noisy speech database for training speech enhancement algorithms and TTS models" (that page is marked as superseded: the dataset has been replaced).
CommonVoice (Chinese) is available from its dataset page; Noise92 is available from the NOISEX page (cmu.edu).

2. Train on the VoiceBank+DEMAND dataset

python main.py

3. Train on the CommonVoice(Chinese)+Noise92 dataset with Federated learning

./run-server.sh
./run-client.sh
- You can change the number of clients by changing NUM_CLIENTS
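
For readers new to federated training, the sketch below shows the kind of aggregation a server performs after each round: it combines the clients' model weights into one global model. It uses plain sample-count weighting (the standard FedAvg baseline) as an assumption; it is not the optimized weighting strategy proposed for SASE, and the function name and toy values are hypothetical.

```python
def weighted_average(client_updates):
    """Aggregate client weights, weighting each client by its example count.

    client_updates is a list of (weights, num_examples) pairs, where
    weights is a flat list of floats of the same length for every client.
    """
    total_examples = sum(num for _, num in client_updates)
    if total_examples <= 0:
        raise ValueError("at least one client must contribute examples")

    length = len(client_updates[0][0])
    aggregated = [0.0] * length
    for weights, num_examples in client_updates:
        if len(weights) != length:
            raise ValueError("all clients must send the same number of weights")
        share = num_examples / total_examples
        for i, value in enumerate(weights):
            aggregated[i] += value * share
    return aggregated


# Two hypothetical clients: the second holds three times as much data,
# so its weights count three times as much in the global model.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = weighted_average(updates)
```

With heterogeneous client data, sample counts alone may not reflect how useful each client's update is, which is the motivation for a task-specific weighting strategy.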

4. Generate wav files and evaluate

python main.py -g --resume "model_file" -df "wavs_root"

Result

1. Evaluate on VoiceBank+DEMAND dataset

2. Evaluate on CommonVoice+Noise92 dataset

Owner

Tower
