Ukrainian TTS (text-to-speech) using Coqui TTS

Overview
title emoji colorFrom colorTo sdk app_file pinned
Ukrainian TTS
🐸
green
green
gradio
app.py
false

Ukrainian TTS 📢 🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on M-AILABS Ukrainian dataset using sumska voice.

Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use :

  1. pip install -r requirements.txt.
  2. Download model from "Releases" tab.
  3. Launch as one-time command:
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav

or alternatively launch web server using:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json

How to train:

  1. Refer to "Nervous beginner guide" in Coqui TTS docs.
  2. Instead of provided config.json use one from this repo.

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    doc: fix examples in README

    Problem

    The one-time snippet does not work as is and complains that the speaker is not defined

     > initialization of speaker-embedding layers.
     > Text: Перевірка мікрофона
     > Text splitted to sentences.
    ['Перевірка мікрофона']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also it speakers.pth should be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One vowel words in the end of the sentence aren't stressed

    One vowel words in the end of the sentence aren't stressed

    Input:

    
    Бобер на березі з бобренятами бублики пік.
    
    Боронила борона по боронованому полю.
    
    Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.
    
    Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.
    

    Result:

    
    Боб+ер н+а березі з бобрен+ятами б+ублики пік.
    
    Борон+ила борон+а п+о борон+ованому п+олю.
    
    Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.
    
    Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.```
    opened by robinhad 0
  • Error import StressOption

    Error import StressOption

    Traceback (most recent call last): File "/home/user/Soft/Python/mamba1/test.py", line 1, in from ukrainian_tts.tts import TTS, Voices, StressOption ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    opened by akirsoft 0
  • Vits improvements

    Vits improvements

    vitsArgs = VitsArgs(
        # hifi V3
        resblock_type_decoder = '2',
        upsample_rates_decoder = [8,8,4],
        upsample_kernel_sizes_decoder = [16,16,8],
        upsample_initial_channel_decoder = 256,
        resblock_kernel_sizes_decoder = [3,5,7],
        resblock_dilation_sizes_decoder = [[1,2], [2,6], [3,12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from existing checkpoint (e.g. VITS Ljspeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
Releases(v4.0.0)
  • v4.0.0(Dec 10, 2022)

  • v3.0.0(Sep 14, 2022)

    This is a release of Ukrainian TTS model and checkpoint. License for this model is GNU GPL v3 License. This release has a stress support using + sign before vowels. Model was trained for 280 000 steps by @robinhad . Kudos to @egorsmkv for providing dataset for this model. Kudos to @proger for providing alignment scripts. Kudos to @dchaplinsky for Dmytro voice.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(12.07 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.97 MB)
    speakers.pth(495 bytes)
  • v2.0.0(Jul 10, 2022)

    This is a release of Ukrainian TTS model and checkpoint using voice (7 hours) from Mykyta dataset. License for this model is GNU GPL v3 License. This release has a stress support using + sign before vowels. Model was trained for 140 000 steps by @robinhad . Kudos to @egorsmkv for providing Mykyta and Olena dataset.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(9.97 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.72 MB)
    optimized.pth(329.95 MB)
    speakers.pth(431 bytes)
  • v2.0.0-beta(May 8, 2022)

    This is a beta release of Ukrainian TTS model and checkpoint using voice (7 hours) from Mykyta dataset. License for this model is GNU GPL v3 License. This release has a stress support using + sign before vowels. Model was trained for 150 000 steps by @robinhad . Kudos to @egorsmkv for providing Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    :

    Source code(tar.gz)
    Source code(zip)
    config.json(8.85 KB)
    LICENSE(34.32 KB)
    model-inference.pth(317.15 MB)
    model.pth(951.32 MB)
    tts_output.wav(1.11 MB)
  • v1.0.0(Jan 14, 2022)

  • v0.0.1(Oct 14, 2021)

novel deep learning research works with PaddlePaddle

Research 发布基于飞桨的前沿研究工作,包括CV、NLP、KG、STDM等领域的顶会论文和比赛冠军模型。 目录 计算机视觉(Computer Vision) 自然语言处理(Natrual Language Processing) 知识图谱(Knowledge Graph) 时空数据挖掘(Spa

1.5k Jan 03, 2023
A collection of GNN-based fake news detection models.

This repo includes the Pytorch-Geometric implementation of a series of Graph Neural Network (GNN) based fake news detection models. All GNN models are implemented and evaluated under the User Prefere

SafeGraph 251 Jan 01, 2023
Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

README Code for Two-stage Identifier: "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022. For details of the model a

Yongliang Shen 45 Nov 29, 2022
Google AI 2018 BERT pytorch implementation

BERT-pytorch Pytorch implementation of Google AI's 2018 BERT, with simple annotation BERT 2018 BERT: Pre-training of Deep Bidirectional Transformers f

Junseong Kim 5.3k Jan 07, 2023
Chinese NewsTitle Generation Project by GPT2.带有超级详细注释的中文GPT2新闻标题生成项目。

GPT2-NewsTitle 带有超详细注释的GPT2新闻标题生成项目 UpDate 01.02.2021 从网上收集数据,将清华新闻数据、搜狗新闻数据等新闻数据集,以及开源的一些摘要数据进行整理清洗,构建一个较完善的中文摘要数据集。 数据集清洗时,仅进行了简单地规则清洗。

logCong 785 Dec 29, 2022
A CSRankings-like index for speech researchers

Speech Rankings This project mimics CSRankings to generate an ordered list of researchers in speech/spoken language processing along with their possib

Mutian He 19 Nov 26, 2022
An assignment on creating a minimalist neural network toolkit for CS11-747

minnn by Graham Neubig, Zhisong Zhang, and Divyansh Kaushik This is an exercise in developing a minimalist neural network toolkit for NLP, part of Car

Graham Neubig 63 Dec 29, 2022
Python3 to Crystal Translation using Python AST Walker

py2cr.py A code translator using AST from Python to Crystal. This is basically a NodeVisitor with Crystal output. See AST documentation (https://docs.

66 Jul 25, 2022
Translate U is capable of translating the text present in an image from one language to the other.

Translate U is capable of translating the text present in an image from one language to the other. The app uses OCR and Google translate to identify and translate across 80+ languages.

Neelanjan Manna 1 Dec 22, 2021
Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module.

Import Subtitles for Blender VSE Addon for adding subtitle files to blender VSE as Text sequences. Using pysub2 python module. Supported formats by py

4 Feb 27, 2022
translate using your voice

speech-to-text-translator Usage translate using your voice description this project makes translating a word easy, all you have to do is speak and...

1 Oct 18, 2021
Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Neural G2P to portuguese language Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written for

fluz 11 Nov 16, 2022
DELTA is a deep learning based natural language and speech processing platform.

DELTA - A DEep learning Language Technology plAtform What is DELTA? DELTA is a deep learning based end-to-end natural language and speech processing p

DELTA 1.5k Dec 26, 2022
Lightweight utility tools for the detection of multiple spellings, meanings, and language-specific terminology in British and American English

Breame ( British English and American English) Breame is a lightweight Python package with a number of utility tools to aid in the detection of words

Charles 8 Oct 10, 2022
aMLP Transformer Model for Japanese

aMLP-japanese Japanese aMLP Pretrained Model aMLPとは、Liu, Daiらが提案する、Transformerモデルです。 ざっくりというと、BERTの代わりに使えて、より性能の良いモデルです。 詳しい解説は、こちらの記事などを参考にしてください。 この

tanreinama 13 Aug 11, 2022
초성 해석기 based on ko-BART

초성 해석기 개요 한국어 초성만으로 이루어진 문장을 입력하면, 완성된 문장을 예측하는 초성 해석기입니다. 초성: ㄴㄴ ㄴㄹ ㅈㅇㅎ 예측 문장: 나는 너를 좋아해 모델 모델은 SKT-AI에서 공개한 Ko-BART를 이용합니다. 데이터 문장 단위로 이루어진 아무 코퍼스나

Dawoon Jung 29 Oct 28, 2022
Awesome Treasure of Transformers Models Collection

💁 Awesome Treasure of Transformers Models for Natural Language processing contains papers, videos, blogs, official repo along with colab Notebooks. 🛫☑️

Ashish Patel 577 Jan 07, 2023
Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Pytorch-version BERT-flow: One can apply BERT-flow to any PLM within Pytorch framework.

Ubiquitous Knowledge Processing Lab 59 Dec 01, 2022
Fast, DB Backed pretrained word embeddings for natural language processing.

Embeddings Embeddings is a python package that provides pretrained word embeddings for natural language processing and machine learning. Instead of lo

Victor Zhong 212 Nov 21, 2022
A framework for training and evaluating AI models on a variety of openly available dialogue datasets.

ParlAI (pronounced “par-lay”) is a python framework for sharing, training and testing dialogue models, from open-domain chitchat, to task-oriented dia

Facebook Research 9.7k Jan 09, 2023