PyTorch implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion"

Last update: Nov 18, 2022

MOSNet

PyTorch implementation of "MOSNet: Deep Learning based Objective Assessment for Voice Conversion" (https://arxiv.org/abs/1904.08352).

Dependency

Linux Ubuntu 20.04

GPU: GeForce RTX 2080 Ti
CUDA version: 10.0

Python 3.7

pytorch==1.4.0
numpy==1.19.5
tqdm
scipy==1.6.2
pandas==1.2.4
matplotlib
librosa==0.6.0

Usage

Reproducing results in the paper

cd ./data and run bash download.sh to download the VCC2018 evaluation results and submitted speech. (Downsampling the submitted speech may take some time.)
Run python mos_results_preprocess.py to prepare the evaluation results. (Optionally, run python bootsrap_estimation.py to do the bootstrap experiment for intrinsic MOS calculation.)
Run python utils.py to convert the .wav files to .h5.
Run python train.py -c config.json to train the CNN-BLSTM version of MOSNet.
Run python test.py -c config.json --epoch BEST_EPOCH --is_fp16 to test the CNN-BLSTM version of MOSNet, where BEST_EPOCH is the epoch of the checkpoint to evaluate.
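
The steps above can also be chained from a small driver script. The following is only a minimal sketch, not part of this repository: the helper names build_steps and run_steps are hypothetical, and the working directories are assumptions (download.sh and mos_results_preprocess.py are assumed to be run from data/, the remaining scripts from the repository root), since the steps above only state the directory for the first command. The optional bootstrap step is left out. Because BEST_EPOCH is only known after training, the value passed in the example is a placeholder.

```python
import subprocess
import sys
from pathlib import Path


def build_steps(best_epoch, root=".", config="config.json", python=sys.executable):
    """Return the usage steps as (working_directory, argv) pairs, in order."""
    root = Path(root)
    data = root / "data"
    return [
        # Download the VCC2018 evaluation results and submitted speech.
        (data, ["bash", "download.sh"]),
        # ASSUMPTION: the preprocessing script is run from data/ as well.
        (data, [python, "mos_results_preprocess.py"]),
        # ASSUMPTION: the remaining scripts are run from the repository root.
        (root, [python, "utils.py"]),
        (root, [python, "train.py", "-c", config]),
        (root, [python, "test.py", "-c", config,
                "--epoch", str(best_epoch), "--is_fp16"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in steps:
        print(f"[{cwd}] {' '.join(argv)}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    # Dry run with a placeholder epoch: nothing is downloaded or trained.
    run_steps(build_steps(best_epoch=1))
```

In practice it is probably easier to run the training step on its own, pick the epoch to evaluate, and only then run the test step with that value.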

Note

Thanks to the authors of the MOSNet paper; this code is based on their TensorFlow implementation, https://github.com/lochenchou/MOSNet. However, on my workstation that implementation ran out of memory (OOM) even with BATCH_SIZE=4 under TensorFlow 2.0 on an RTX 2080 Ti, so I reimplemented it in PyTorch. This version currently uses only 7700 MiB of memory with BATCH_SIZE=64. If you find any problem with my code, please open an issue.

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{mosnet,
  author={Lo, Chen-Chou and Fu, Szu-Wei and Huang, Wen-Chin and Wang, Xin and Yamagishi, Junichi and Tsao, Yu and Wang, Hsin-Min},
  title={MOSNet: Deep Learning based Objective Assessment for Voice Conversion},
  year=2019,
  booktitle={Proc. Interspeech 2019},
}

License

This work is released under the MIT License (see the LICENSE file for details).

VCC2018 Database & Results

The model is trained on the large-scale listening evaluation results released by the Voice Conversion Challenge 2018.
The listening test results can be downloaded from here
The databases and results (submitted speech) can be downloaded from here
