ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual, multispeaker, expressive speech synthesis framework. It can transfer expressivity to the voice of a speaker for whom no expressive speech corpus is available. The term ERISHA means "speech" in Sanskrit. For building the prosody encoder, the framework includes several deep learning architectures: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.

The library is currently in an early stage of development and will be updated frequently.

Stay tuned for more updates. We are open to collaboration!

Installation and Training

Refer to INSTALL for initial setup.

Available recipes

Available Features

Resampling of speech waveforms to a target sampling rate in recipes
Support for training a TTS system for other languages
Support for training a multilingual TTS system for other languages
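
To illustrate what resampling to a target sampling rate involves, here is a minimal sketch in plain Python. It is not the code used in the ERISHA recipes: the function name `resample_linear` is hypothetical, and it uses simple linear interpolation with no anti-aliasing filter, so a real recipe would more likely rely on a dedicated audio library.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono signal by linear interpolation.

    Illustrative only: there is no low-pass filter, so downsampling
    can alias. `samples` is a list of floats.
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    # How far we advance in the input for each output sample.
    step = orig_sr / target_sr
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring input samples.
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out


# Usage: upsample a short ramp from 4 Hz to 8 Hz.
upsampled = resample_linear([0.0, 1.0, 2.0, 3.0], orig_sr=4, target_sr=8)
```

Doubling the rate doubles the number of samples, with each new sample placed halfway between its neighbours; past the last input sample the final value is held.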

Upcoming updates

User documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving latent representation of speaker and expressivity (proposed work)
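
For readers unfamiliar with the multiclass N-pair loss listed above, the following is a minimal sketch of its commonly stated form for a single anchor: log(1 + sum over negatives of exp(anchor·negative - anchor·positive)). It says nothing about how ERISHA will implement it. The names `dot` and `n_pair_loss` are hypothetical, and embeddings are plain Python lists for simplicity.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def n_pair_loss(anchor, positive, negatives):
    """Multiclass N-pair loss for one anchor embedding.

    Computes log(1 + sum_i exp(anchor.neg_i - anchor.positive)),
    evaluated as a log-sum-exp over [0, d_1, ..., d_k] for stability.
    """
    pos_score = dot(anchor, positive)
    diffs = [0.0] + [dot(anchor, n) - pos_score for n in negatives]
    m = max(diffs)
    return m + math.log(sum(math.exp(d - m) for d in diffs))


# Usage: one positive and two negatives from other classes.
loss = n_pair_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
```

The loss shrinks toward zero as the anchor scores higher with its positive than with every negative, and grows when any negative scores close to or above the positive.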

Acknowledgements

This implementation uses code from the following repos: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, Jhosimar George Arias Figueroa.

Owner

Ajinkya Kulkarni
