glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Overview

Glow-Speak

glow-speak is a fast, local, neural text to speech system that uses eSpeak-ng as a text/phoneme front-end.

Installation

git clone https://github.com/rhasspy/glow-speak.git
cd glow-speak/

python3 -m venv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install --upgrade setuptools wheel
pip3 install -f 'https://synesthesiam.github.io/prebuilt-apps/' -r requirements.txt

python3 setup.py develop
glow-speak --version

Voices

The following languages/voices are supported:

  • German
    • de_thorsten
  • Chinese
    • cmn_jing_li
  • Greek
    • el_rapunzelina
  • English
    • en-us_ljspeech
    • en-us_mary_ann
  • Spanish
    • es_tux
  • Finnish
    • fi_harri_tapani_ylilammi
  • French
    • fr_siwis
  • Hungarian
    • hu_diana_majlinger
  • Italian
    • it_riccardo_fasol
  • Korean
    • ko_kss
  • Dutch
    • nl_rdh
  • Russian
    • ru_nikolaev
  • Swedish
    • sv_talesyntese
  • Swahili
    • sw_biblia_takatifu
  • Vietnamese
    • vi_vais1000

Usage

Download Voices

glow-speak-download de_thorsten

Command-Line Synthesis

glow-speak -v en-us_mary_ann 'This is a test.' --output-file test.wav

HTTP Server

glow-speak-http-server --debug

Visit http://localhost:5002

Socket Server

Start the server:

glow-speak-socket-server --voice en-us_mary_ann --socket /tmp/glow-speak.sock

From a separate terminal:

echo 'This is a test.' | bin/glow-speak-socket-client --socket /tmp/glow-speak.sock | xargs aplay

Lines from client to server are synthesized, and the path to the WAV file is returned (usually in /tmp).

You might also like...
End-to-End Speech Processing Toolkit
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.0.1 1.1.0 1.2.0 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 ubuntu18/python3.8/pip ubuntu18

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform tasks on automatic speech recogniti

Athena is an open-source implementation of end-to-end speech processing engine.

Athena is an open-source implementation of end-to-end speech processing engine. Our vision is to empower both industrial application and academic research on end-to-end models for speech processing. To make speech processing available to everyone, we're also releasing example implementation and recipe on some opensource dataset for various tasks (Automatic Speech Recognition, Speech Synthesis, Voice Conversion, Speaker Recognition, etc).

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.
Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

🤗 Contributing to OpenSpeech 🤗 OpenSpeech provides reference implementations of various ASR modeling papers and three languages recipe to perform ta

 SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation

SHAS: Approaching optimal Segmentation for End-to-End Speech Translation In this repo you can find the code of the Supervised Hybrid Audio Segmentatio

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

CRNN paper:An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition 1. create your ow

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks

Official PyTorch code for ClipBERT, an efficient framework for end-to-end learning on image-text and video-text tasks. It takes raw videos/images + text as inputs, and outputs task predictions. ClipBERT is designed based on 2D CNNs and transformers, and uses a sparse sampling strategy to enable efficient end-to-end video-and-language learning.

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge This is an implementation of the paper,

Comments
  • AssertionError on web interface (only) - and Raspberry Pi Bullseye test

    AssertionError on web interface (only) - and Raspberry Pi Bullseye test

    Hi Micheal,

    great work again! :smiley:

    I just saw this repository and thought I'd give it a try on my freshly installed Raspberry Pi 4 with 32bit Raspberry Pi OS Bullseye (Debian 11). Installation almost finished without errors! :partying_face: ... I just had to fix one thing: sudo apt-get install libatlas-base-dev After 15min I was already generating audio :grin: :+1:

    When I tested en mary_ann and thorsten_de via the web interface I got this error as soon as my test sentence ended with a question mark:

    DEBUG:glow-speak:ɪ_z ð_ɪ_s ɐ_n_ˈʌ_ð_ɚ t_ˈɛ_s_t? .
    ERROR:glow_speak.http_server:
    Traceback (most recent call last):
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1490, in full_dispatch_request
        result = await self.dispatch_request(request_context)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/quart/app.py", line 1536, in dispatch_request
        return await self.ensure_async(handler)(**request_.view_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 484, in app_say
        wav_bytes = await text_to_wav(text, voice, **tts_args)
      File "/home/pi/glow-speak/glow_speak/http_server.py", line 323, in text_to_wav
        text_ids = text_to_ids(
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 110, in text_to_ids
        text_ids = phonemes2ids(
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 190, in phonemes2ids
        maybe_extend_ids(sub_phoneme, word_ids, append_list=False)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/phonemes2ids/__init__.py", line 108, in maybe_extend_ids
        maybe_ids = missing_func(phoneme)
      File "/home/pi/glow-speak/glow_speak/__init__.py", line 59, in guess_ids
        typing.List[Phoneme], guess_phonemes(phoneme, self.to_phonemes)
      File "/home/pi/glow-speak/.venv/lib/python3.9/site-packages/gruut_ipa/accent.py", line 159, in guess_phonemes
        assert dist_split is not None
    AssertionError
    

    Maybe some encoding error when reading the web input?

    Speed seems pretty good, comparable to Larynx I'd say :+1: and I noticed the pronunciations have been improved for German :clap: :sunglasses:

    opened by fquirin 0
Owner
Rhasspy
Offline voice assistant
Rhasspy
Sentiment Analysis Project using Count Vectorizer and TF-IDF Vectorizer

Sentiment Analysis Project This project contains two sentiment analysis programs for Hotel Reviews using a Hotel Reviews dataset from Datafiniti. The

Simran Farrukh 0 Mar 28, 2022
Various Algorithms for Short Text Mining

Short Text Mining in Python Introduction This package shorttext is a Python package that facilitates supervised and unsupervised learning for short te

Kwan-Yuet 466 Dec 06, 2022
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Dec 30, 2022
Nested Named Entity Recognition

Nested Named Entity Recognition Training Dataset: CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark url: https://tianchi.aliyun.

8 Dec 25, 2022
SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering.

SEJE is a prototype for the paper Learning Text-Image Joint Embedding for Efficient Cross-Modal Retrieval with Deep Feature Engineering. Contents Inst

0 Oct 21, 2021
Script to generate VAD dataset used in Asteroid recipe

About the dataset LibriVAD is an open source dataset for voice activity detection in noisy environments. It is derived from LibriSpeech signals (clean

11 Sep 15, 2022
SciBERT is a BERT model trained on scientific text.

SciBERT is a BERT model trained on scientific text.

AI2 1.2k Dec 24, 2022
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.

English|简体中文 ERNIE是百度开创性提出的基于知识增强的持续学习语义理解框架,该框架将大数据预训练与多源丰富知识相结合,通过持续学习技术,不断吸收海量文本数据中词汇、结构、语义等方面的知识,实现模型效果不断进化。ERNIE在累积 40 余个典型 NLP 任务取得 SOTA 效果,并在 G

5.4k Jan 03, 2023
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.5k Dec 05, 2022
Japanese NLP Library

Japanese NLP Library Back to Home Contents 1 Requirements 1.1 Links 1.2 Install 1.3 History 2 Libraries and Modules 2.1 Tokenize jTokenize.py 2.2 Cabo

Pulkit Kathuria 144 Dec 27, 2022
Training code for Korean multi-class sentiment analysis

KoSentimentAnalysis Bert implementation for the Korean multi-class sentiment analysis 왜 한국어 감정 다중분류 모델은 거의 없는 것일까?에서 시작된 프로젝트 Environment: Pytorch, Da

Donghoon Shin 3 Dec 02, 2022
Tools for curating biomedical training data for large-scale language modeling

Tools for curating biomedical training data for large-scale language modeling

BigScience Workshop 242 Dec 25, 2022
Abhijith Neil Abraham 2 Nov 05, 2021
MPNet: Masked and Permuted Pre-training for Language Understanding

MPNet MPNet: Masked and Permuted Pre-training for Language Understanding, by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu, is a novel pre-tr

Microsoft 228 Nov 21, 2022
Huggingface Transformers + Adapters = ❤️

adapter-transformers A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models adapter-transformers is an extension of

AdapterHub 1.2k Jan 09, 2023
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN

artificial intelligence cosmic love and attention fire in the sky a pyramid made of ice a lonely house in the woods marriage in the mountains lantern

Phil Wang 2.3k Jan 01, 2023
Implementation of some unbalanced loss like focal_loss, dice_loss, DSC Loss, GHM Loss et.al

Implementation of some unbalanced loss for NLP task like focal_loss, dice_loss, DSC Loss, GHM Loss et.al Summary Here is a loss implementation reposit

121 Jan 01, 2023
A python framework to transform natural language questions to queries in a database query language.

__ _ _ _ ___ _ __ _ _ / _` | | | |/ _ \ '_ \| | | | | (_| | |_| | __/ |_) | |_| | \__, |\__,_|\___| .__/ \__, | |_| |_| |___/

Machinalis 1.2k Dec 18, 2022
ADCS cert template modification and ACL enumeration

Purpose This tool is designed to aid an operator in modifying ADCS certificate templates so that a created vulnerable state can be leveraged for privi

Fortalice Solutions, LLC 78 Dec 12, 2022
Python package for performing Entity and Text Matching using Deep Learning.

DeepMatcher DeepMatcher is a Python package for performing entity and text matching using deep learning. It provides built-in neural networks and util

461 Dec 28, 2022