Ukrainian TTS (text-to-speech) using Coqui TTS

Overview

title: Ukrainian TTS
emoji: 🐸
colorFrom: green
colorTo: green
sdk: gradio
app_file: app.py
pinned: false

Ukrainian TTS 📢🤖

Ukrainian TTS (text-to-speech) using Coqui TTS.

Trained on the M-AILABS Ukrainian dataset using the sumska voice.

Link to online demo -> https://huggingface.co/spaces/robinhad/ukrainian-tts

Support

If you like my work, please support -> SUPPORT LINK

Example

test.mp4

How to use:

  1. Run pip install -r requirements.txt.
  2. Download the model from the "Releases" tab.
  3. Launch as a one-time command:
tts --text "Text for TTS" \
    --model_path path/to/model.pth.tar \
    --config_path path/to/config.json \
    --out_path folder/to/save/output.wav

Alternatively, launch a web server:

tts-server --model_path path/to/model.pth.tar \
    --config_path path/to/config.json
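
The steps above can be sketched in Python as a pre-flight check followed by building the same one-time `tts` command. This is a minimal sketch, not the project's own tooling: the file names are assumed from the v3.0.0 release assets listed on this page, the folder `path/to/model_dir` is hypothetical, and `extra_args` is a hypothetical pass-through for anything else the CLI needs. The issues under Comments report that a multi-speaker model also needs a speaker argument; check `tts --help` for the exact flag rather than relying on this sketch.

```python
from pathlib import Path
from typing import List, Sequence

# Assumption: file names follow the v3.0.0 release assets listed on this page.
REQUIRED_FILES = ("config.json", "model.pth", "speakers.pth")


def missing_files(model_dir: Path, required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return the names in `required` that are absent from `model_dir`."""
    return [name for name in required if not (model_dir / name).is_file()]


def build_tts_command(text: str, model_dir: Path, out_path: Path,
                      extra_args: Sequence[str] = ()) -> List[str]:
    """Build the argv list for the one-time `tts` command shown above."""
    return [
        "tts",
        "--text", text,
        "--model_path", str(model_dir / "model.pth"),
        "--config_path", str(model_dir / "config.json"),
        "--out_path", str(out_path),
        *extra_args,  # hypothetical pass-through, e.g. a speaker argument
    ]


if __name__ == "__main__":
    model_dir = Path("path/to/model_dir")  # hypothetical download folder
    absent = missing_files(model_dir)
    if absent:
        print("Missing files:", ", ".join(absent))
    else:
        command = build_tts_command("Text for TTS", model_dir, Path("output.wav"))
        print(" ".join(command))  # pass `command` to subprocess.run to synthesize
```

Returning the command as a list keeps the text argument intact without shell quoting, which matters for sentences that contain spaces or apostrophes.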

How to train:

  1. Refer to the "Nervous beginner guide" in the Coqui TTS docs.
  2. Use the config.json from this repo instead of the provided one.

Attribution

Code for app.py taken from https://huggingface.co/spaces/julien-c/coqui

Comments
  • Error with file: speakers.pth

    FileNotFoundError: [Errno 2] No such file or directory: '/home/user/Soft/Python/mamba1/TTS/vits_mykyta_latest-September-12-2022_12+38AM-829e2c24/speakers.pth'

    opened by akirsoft 4
  • doc: fix examples in README

    Problem

    The one-time snippet does not work as is: it complains that no speaker is defined.

     > initialization of speaker-embedding layers.
     > Text: Перевірка мікрофона
     > Text splitted to sentences.
    ['Перевірка мікрофона']
    Traceback (most recent call last):
      File "/home/serg/.local/bin/tts", line 8, in <module>
        sys.exit(main())
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/bin/synthesize.py", line 350, in main
        wav = synthesizer.tts(
      File "/home/serg/.local/lib/python3.8/site-packages/TTS/utils/synthesizer.py", line 228, in tts
        raise ValueError(
    ValueError:  [!] Look like you use a multi-speaker model. You need to define either a `speaker_name` or a `speaker_wav` to use a multi-speaker model.
    

    Also, speakers.pth must be downloaded.

    Fix

    Just a few documentation changes:

    • make instructions on what to download from Releases more precise
    • add --speaker_id argument with one of the speakers
    opened by seriar 2
  • One vowel words in the end of the sentence aren't stressed

    Input:

    
    Бобер на березі з бобренятами бублики пік.

    Боронила борона по боронованому полю.

    Ішов Прокіп, кипів окріп, прийшов Прокіп - кипить окріп, як при Прокопі, так і при Прокопі і при Прокопенятах.

    Сидить Прокоп — кипить окроп, Пішов Прокоп — кипить окроп. Як при Прокопові кипів окроп, Так і без Прокопа кипить окроп.
    

    Result:

    
    Боб+ер н+а березі з бобрен+ятами б+ублики пік.

    Борон+ила борон+а п+о борон+ованому п+олю.

    Іш+ов Пр+окіп, кип+ів окр+іп, прийш+ов Пр+окіп - кип+ить окр+іп, +як пр+и Пр+окопі, т+ак +і пр+и Пр+окопі +і пр+и Прокопенятах.

    Сид+ить Прок+оп — кип+ить окроп, Піш+ов Прок+оп — кип+ить окроп. +Як пр+и Пр+окопові кип+ів окроп, Т+ак +і б+ез Пр+окопа кип+ить окроп.
    opened by robinhad 0
  • Error import StressOption

    Traceback (most recent call last):
      File "/home/user/Soft/Python/mamba1/test.py", line 1, in <module>
        from ukrainian_tts.tts import TTS, Voices, StressOption
    ImportError: cannot import name 'StressOption' from 'ukrainian_tts.tts'

    opened by akirsoft 0
  • Vits improvements

    from TTS.tts.models.vits import VitsArgs

    # Decoder settings that mirror the HiFi-GAN V3 configuration.
    vitsArgs = VitsArgs(
        resblock_type_decoder="2",
        upsample_rates_decoder=[8, 8, 4],
        upsample_kernel_sizes_decoder=[16, 16, 8],
        upsample_initial_channel_decoder=256,
        resblock_kernel_sizes_decoder=[3, 5, 7],
        resblock_dilation_sizes_decoder=[[1, 2], [2, 6], [3, 12]],
    )
    
    opened by robinhad 0
  • Model improvement checklist

    • [x] Add Ukrainian accentor - https://github.com/egorsmkv/ukrainian-accentor
    • [ ] Fine-tune from existing checkpoint (e.g. VITS Ljspeech)
    • [ ] Try to increase fft_size, hop_length to match sample_rate accordingly
    • [ ] Include more dataset samples into model
    opened by robinhad 0
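
The checklist item about matching fft_size and hop_length to sample_rate comes down to simple arithmetic. The sketch below is an illustration only: the baseline of 1024 and 256 samples at 22050 Hz and the 44100 Hz target are assumed values, not taken from this repository's config.json, and keeping the window and hop durations constant in time is one possible scaling rule, not necessarily the one the author intends.

```python
def frame_ms(n_samples: int, sample_rate: int) -> float:
    """Duration in milliseconds of `n_samples` at `sample_rate`."""
    return 1000.0 * n_samples / sample_rate


def scale_to_rate(n_samples: int, old_rate: int, new_rate: int) -> int:
    """Scale a window or hop size so it spans about the same time at `new_rate`."""
    return round(n_samples * new_rate / old_rate)


if __name__ == "__main__":
    # Hypothetical baseline and target, chosen only for the illustration.
    base_rate, base_fft, base_hop = 22050, 1024, 256
    new_rate = 44100

    new_fft = scale_to_rate(base_fft, base_rate, new_rate)
    new_hop = scale_to_rate(base_hop, base_rate, new_rate)
    print("fft_size:", new_fft, "hop_length:", new_hop)
    print("hop duration before:", frame_ms(base_hop, base_rate), "ms")
    print("hop duration after: ", frame_ms(new_hop, new_rate), "ms")
```

When the two rates are not in a simple ratio, the rounded sizes will not be powers of two, so in practice the values may need adjusting by hand.
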
Releases(v4.0.0)
  • v4.0.0(Dec 10, 2022)

  • v3.0.0(Sep 14, 2022)

    This is a release of the Ukrainian TTS model and checkpoint. The model is licensed under the GNU GPL v3. This release supports stress marks: place a + sign before the stressed vowel. The model was trained for 280 000 steps by @robinhad. Kudos to @egorsmkv for providing the dataset for this model, to @proger for providing alignment scripts, and to @dchaplinsky for the Dmytro voice.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/190852232-34956a1d-77a9-42b9-b96d-39d0091e3e34.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/190852238-366782c1-9472-45fc-8fea-31346242f927.mp4

    Dmytro (male):

    https://user-images.githubusercontent.com/5759207/190852251-db105567-52ba-47b5-8ec6-5053c3baac8c.mp4

    Olha (female):

    https://user-images.githubusercontent.com/5759207/190852259-c6746172-05c4-4918-8286-a459c654eef1.mp4

    Lada (female):

    https://user-images.githubusercontent.com/5759207/190852270-7aed2db9-dc08-4a9f-8775-07b745657ca1.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(12.07 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.97 MB)
    speakers.pth(495 bytes)
  • v2.0.0(Jul 10, 2022)

    This is a release of the Ukrainian TTS model and checkpoint using the voice (7 hours) from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marks: place a + sign before the stressed vowel. The model was trained for 140 000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta and Olena datasets.

    Example:

    Test sentence:

    К+ам'ян+ець-Под+ільський - м+істо в Хмельн+ицькій +області Укра+їни, ц+ентр Кам'ян+ець-Под+ільської міськ+ої об'+єднаної територі+альної гром+ади +і Кам'ян+ець-Под+ільського рай+ону.
    

    Mykyta (male):

    https://user-images.githubusercontent.com/5759207/178158485-29a5d496-7eeb-4938-8ea7-c345bc9fed57.mp4

    Olena (female):

    https://user-images.githubusercontent.com/5759207/178158492-8504080e-2f13-43f1-83f0-489b1f9cd66b.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(9.97 KB)
    model-inference.pth(329.95 MB)
    model.pth(989.72 MB)
    optimized.pth(329.95 MB)
    speakers.pth(431 bytes)
  • v2.0.0-beta(May 8, 2022)

    This is a beta release of the Ukrainian TTS model and checkpoint using the voice (7 hours) from the Mykyta dataset. The model is licensed under the GNU GPL v3. This release supports stress marks: place a + sign before the stressed vowel. The model was trained for 150 000 steps by @robinhad. Kudos to @egorsmkv for providing the Mykyta dataset.

    Example:

    https://user-images.githubusercontent.com/5759207/167305810-2b023da7-0657-44ac-961f-5abf1aa6ea7d.mp4

    Source code(tar.gz)
    Source code(zip)
    config.json(8.85 KB)
    LICENSE(34.32 KB)
    model-inference.pth(317.15 MB)
    model.pth(951.32 MB)
    tts_output.wav(1.11 MB)
  • v1.0.0(Jan 14, 2022)

  • v0.0.1(Oct 14, 2021)
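
The release notes above describe stress support as a + sign placed before a vowel. A minimal sketch of handling that notation in plain Python follows; the helper names are hypothetical and are not part of the ukrainian_tts package, and the vowel set is an assumption based on the Ukrainian alphabet. It shows how to strip the marks and how to spot a + that is not followed by a vowel.

```python
from typing import List

# Assumption: the ten Ukrainian vowel letters, in both cases.
UKRAINIAN_VOWELS = set("аеєиіїоуюяАЕЄИІЇОУЮЯ")


def strip_stress(text: str) -> str:
    """Remove every '+' stress mark, returning plain text."""
    return text.replace("+", "")


def misplaced_marks(text: str) -> List[int]:
    """Return indexes of '+' signs that are not directly followed by a vowel."""
    return [
        i for i, ch in enumerate(text)
        if ch == "+" and (i + 1 >= len(text) or text[i + 1] not in UKRAINIAN_VOWELS)
    ]


if __name__ == "__main__":
    sample = "м+істо в Хмельн+ицькій +області Укра+їни"  # from the test sentence
    print(strip_stress(sample))
    print(misplaced_marks(sample))
```

Note that `strip_stress` removes every + sign, so text that uses + for another purpose (for example arithmetic) would need separate handling.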
