Refactored version of FastSpeech2

Overview

FastSpeech2

This repository is a refactored version from ming024's own. I focused on refactoring structure for fitting my cases and making parallel pre-processing codes. And I wrote installation guide with the latest version of MFA(Montreal Force Aligner).

Installation

  • Tested on python 3.8, Ubuntu 20.04

    • Notice ! For installing MFA, you should install the miniconda.
    • If you run MFA under 16.04 or ealier version of Ubuntu, you will face a compile error.
  • In your system

    • To install pyworld, run "sudo apt-get install python3.x-dev". (x is your python version).
    • To install sndfile, run "sudo apt-get install libsndfile-dev"
    • To use MFA, run "sudo apt-get install libopenblas-base"
  • Install requirements

# install pytorch_sound
pip install git+https://github.com/appleholic/pytorch_sound
pip install -e .
  • Download datasets
  1. VCTK
  2. LibriTTS
    • To be updated
  • Install MFA

    • Visit and follow a guide that described in MFA installation website.
    • Additional installation
      • mfa thirdparty download
      • mfa download acoustic english
  • Pre-trained checkpoint

Preprocess (VCTK case)

  1. Prepare MFA
python fastspeech2/scripts/prepare_align.py configs/vctk_prepare_align.json
  1. Run MFA for making alignments
# Define your the number of threads to run MFA at the last of a command. "-j [The number of threads]"
mfa align data/fastspeech2/vctk lexicons/librispeech-lexicon.txt english data/fastspeech2/vctk-pre -j 24
  1. Feature preprocessing
python fastspeech2/scripts/preprocess.py configs/vctk_preprocess.json

Train

  1. Multi-speaker fastspeech2
python fastspeech2/scripts/train.py configs/fastspeech2_vctk_tts.json
  • If you want to change the parameters of training FastSpeech2, check out the code and put the option to configuration file.
    • train code : fastspeech2/scripts/train.py
    • config : configs/fastspeech2_vctk_tts.json
  1. Fastspeech2 with reference encoder (To be updated)

Synthesize

Multi-spaker model

  • In a code
from fastspeech2.inference import Inferencer
from speech_interface.interfaces.hifi_gan import InterfaceHifiGAN

# arguments
# chk_path: str, lexicon_path: str, device: str = 'cuda'
inferencer = Inferencer(chk_path=chk_path, lexicon_path=lexicon_path, device=device)

# initialize hifigan
interface = InterfaceHifiGAN(model_name='hifi_gan_v1_universal', device='cuda')

# arguments
# text: str, speaker: int = 0, pitch_control: float = 1., energy_control: float = 1., duration_control: float = 1.
txt = 'Hello, I am a programmer.'
mel_spectrogram = inferencer.tts(txt, speaker=0)

# Reconstructs speech by using Hifi-GAN
pred_wav = interface.decode(mel_spectrogram.transpose(1, 2)).squeeze()

# If you test on a jupyter notebook
from IPython.display import Audio
Audio(pred_wav.cpu().numpy(), rate=22050)
  • In command line
python fastspeech2/scripts/synthesize.py [TEXT] [OUTPUT PATH] [CHECKPOINT PATH] [LEXICON PATH] [[DEVICE]] [[SPEAKER]]

Reference encoder (not updated)

Reference

Owner
ILJI CHOI
AI Research Engineer
ILJI CHOI
A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.

GuwenModels: 古文自然语言处理模型合集, 收录互联网上的古文相关模型及资源. A collection of Classical Chinese natural language processing models, including Classical Chinese related models and resources on the Internet.

Ethan 66 Dec 26, 2022
This is a modification of the OpenAI-CLIP repository of moein-shariatnia

This is a modification of the OpenAI-CLIP repository of moein-shariatnia

Sangwon Beak 2 Mar 04, 2022
Perform sentiment analysis on textual data that people generally post on websites like social networks and movie review sites.

Sentiment Analyzer The goal of this project is to perform sentiment analysis on textual data that people generally post on websites like social networ

Madhusudan.C.S 53 Mar 01, 2022
The aim of this task is to predict someone's English proficiency based on a text input.

English_proficiency_prediction_NLP The aim of this task is to predict someone's English proficiency based on a text input. Using the The NICT JLE Corp

1 Dec 13, 2021
An implementation of model parallel GPT-2 and GPT-3-style models using the mesh-tensorflow library.

GPT Neo 🎉 1T or bust my dudes 🎉 An implementation of model & data parallel GPT3-like models using the mesh-tensorflow library. If you're just here t

EleutherAI 6.7k Dec 28, 2022
STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs

STonKGs STonKGs is a Sophisticated Transformer that can be jointly trained on biomedical text and knowledge graphs. This multimodal Transformer combin

STonKGs 27 Aug 11, 2022
Neural network models for joint POS tagging and dependency parsing (CoNLL 2017-2018)

Neural Network Models for Joint POS Tagging and Dependency Parsing Implementations of joint models for POS tagging and dependency parsing, as describe

Dat Quoc Nguyen 152 Sep 02, 2022
Harvis is designed to automate your C2 Infrastructure.

Harvis Harvis is designed to automate your C2 Infrastructure, currently using Mythic C2. 📌 What is it? Harvis is a python tool to help you create mul

Thiago Mayllart 99 Oct 06, 2022
Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

Boolean Prompting for Neural Text Generators Neural text generators like the GPT models promise a general-purpose means of manipulating texts. These m

Jeffrey M. Binder 20 Jan 09, 2023
Global Rhythm Style Transfer Without Text Transcriptions

Global Prosody Style Transfer Without Text Transcriptions This repository provides a PyTorch implementation of AutoPST, which enables unsupervised glo

Kaizhi Qian 193 Dec 30, 2022
Application to help find best train itinerary, uses speech to text, has a spam filter to segregate invalid inputs, NLP and Pathfinding algos.

T-IAI-901-MSC2022 - GROUP 18 Gestion de projet Notre travail a été organisé et réparti dans un Trello. https://trello.com/b/X3s2fpPJ/ia-projet Install

1 Feb 05, 2022
Python library to make development of portfolio analysis faster and easier

Trafalgar Python library to make development of portfolio analysis faster and easier Installation 🔥 For the moment, Trafalgar is still in beta develo

Santosh Passoubady 641 Jan 01, 2023
Repositório do trabalho de introdução a NLP

Trabalho da disciplina de BI NLP Repositório do trabalho da disciplina Introdução a Processamento de Linguagem Natural da pós BI-Master da PUC-RIO. Eq

Leonardo Lins 1 Jan 18, 2022
ReCoin - Restoring our environment and businesses in parallel

Shashank Ojha, Sabrina Button, Abdellah Ghassel, Joshua Gonzales "Reduce Reuse R

sabrina button 1 Mar 14, 2022
Azure Text-to-speech service for Home Assistant

Azure Text-to-speech service for Home Assistant The Azure text-to-speech platform uses online Azure Text-to-Speech cognitive service to read a text wi

Yassine Selmi 2 Aug 06, 2022
The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models

Graformer The repository for the paper: Multilingual Translation via Grafting Pre-trained Language Models Graformer (also named BridgeTransformer in t

22 Dec 14, 2022
Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

Build a Discord AI Chatbot that Speaks like Your Favorite Character! This is a Discord AI Chatbot that uses the Microsoft DialoGPT conversational mode

Lynn Zheng 231 Dec 30, 2022
Sequence model architectures from scratch in PyTorch

This repository implements a variety of sequence model architectures from scratch in PyTorch. Effort has been put to make the code well structured so that it can serve as learning material. The train

Brando Koch 11 Mar 28, 2022
An extension for asreview implements a version of the tf-idf feature extractor that saves the matrix and the vocabulary.

Extension - matrix and vocabulary extractor for TF-IDF and Doc2Vec An extension for ASReview that adds a tf-idf extractor that saves the matrix and th

ASReview 4 Jun 17, 2022
🦆 Contextually-keyed word vectors

sense2vec: Contextually-keyed word vectors sense2vec (Trask et. al, 2015) is a nice twist on word2vec that lets you learn more interesting and detaile

Explosion 1.5k Dec 25, 2022