Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Overview

logo

Pypi version Python3 version MIT License total stats download stats / month discord


Malaya-Speech is a Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow.

Documentation

Proper documentation is available at https://malaya-speech.readthedocs.io/

Installing from the PyPI

CPU version

$ pip install malaya-speech

GPU version

$ pip install malaya-speech[gpu]

Only Python 3.6.0 and above and Tensorflow 1.15.0 and above are supported.

We recommend to use virtualenv for development. All examples tested on Tensorflow version 1.15.4, 1.15.5, 2.4.1 and 2.5.

Features

  • Age Detection, detect age in speech using Finetuned Speaker Vector.
  • Speaker Diarization, diarizing speakers using Pretrained Speaker Vector.
  • Emotion Detection, detect emotions in speech using Finetuned Speaker Vector.
  • Force Alignment, generate a time-aligned transcription of an audio file using RNNT.
  • Gender Detection, detect genders in speech using Finetuned Speaker Vector.
  • Language Detection, detect hyperlocal languages in speech using Finetuned Speaker Vector.
  • Multispeaker Separation, Multispeaker separation using FastSep on 8k Wav.
  • Noise Reduction, reduce multilevel noises using STFT UNET.
  • Speaker Change, detect changing speakers using Finetuned Speaker Vector.
  • Speaker overlap, detect overlap speakers using Finetuned Speaker Vector.
  • Speaker Vector, calculate similarity between speakers using Pretrained Speaker Vector.
  • Speech Enhancement, enhance voice activities using Waveform UNET.
  • SpeechSplit Conversion, detailed speaking style conversion by disentangling speech into content, timbre, rhythm and pitch using PyWorld and PySPTK.
  • Speech-to-Text, End-to-End Speech to Text for Malay, Mixed (Malay, Singlish and Mandarin) and Singlish using RNNT and Wav2Vec2 CTC.
  • Super Resolution, Super Resolution 4x for Waveform.
  • Text-to-Speech, Text to Speech for Malay and Singlish using Tacotron2, FastSpeech2 and FastPitch.
  • Vocoder, convert Mel to Waveform using MelGAN, Multiband MelGAN and Universal MelGAN Vocoder.
  • Voice Activity Detection, detect voice activities using Finetuned Speaker Vector.
  • Voice Conversion, Many-to-One, One-to-Many, Many-to-Many, and Zero-shot Voice Conversion.
  • Hybrid 8-bit Quantization, provide hybrid 8-bit quantization for all models to reduce inference time up to 2x and model size up to 4x.

Pretrained Models

Malaya-Speech also released pretrained models, simply check at malaya-speech/pretrained-model

References

If you use our software for research, please cite:

@misc{Malaya, Speech-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow,
  author = {Husein, Zolkepli},
  title = {Malaya-Speech},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya-speech}}
}

Acknowledgement

Thanks to KeyReply for sponsoring private cloud to train Malaya-Speech models, without it, this library will collapse entirely.

logo
You might also like...
ExKaldi-RT: An Online Speech Recognition Extension Toolkit of Kaldi

ExKaldi-RT is an online ASR toolkit for Python language. It reads realtime streaming audio and do online feature extraction, probability computation, and online decoding.

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models
IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models. Everything is pure Python and PyTorch based to keep it as simple and beginner-friendly, yet powerful as possible.

Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech tagging and word segmentation.

text to speech toolkit. 好用的中文语音合成工具箱,包含语音编码器、语音合成器、声码器和可视化模块。
text to speech toolkit. 好用的中文语音合成工具箱,包含语音编码器、语音合成器、声码器和可视化模块。

ttskit Text To Speech Toolkit: 语音合成工具箱。 安装 pip install -U ttskit 注意 可能需另外安装的依赖包:torch,版本要求torch=1.6.0,=1.7.1,根据自己的实际环境安装合适cuda或cpu版本的torch。 ttskit的

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.
PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

HuggingSound HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools. I have no intention of building a very complex tool here.

Code for ACL 2022 main conference paper "STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation".

STEMM: Self-learning with Speech-Text Manifold Mixup for Speech Translation This is a PyTorch implementation for the ACL 2022 main conference paper ST

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Tensorflow Implementation of A Generative Flow for Text-to-Speech via Monotonic Alignment Search

Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models.

Tevatron Tevatron is a simple and efficient toolkit for training and running dense retrievers with deep language models. The toolkit has a modularized

Releases(1.3.0)
  • 1.3.0(Sep 18, 2022)

    1. Added GPT2 LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/gpt2-lm.html
    2. Added Mask LM combined with pyctcdecoder, https://malaya-speech.readthedocs.io/en/latest/masked-lm.html
    3. Added Transducer with GPT2 LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
    4. Added Transducer with Mask LM beam decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm-gpt2.html
    5. Added GPT2 LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-gpt2.html
    6. Added Mask LM CTC decoder, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode-mlm.html
    7. Added Squeezeformer transducer models.
    8. Added End-to-End FastSpeech2 STT models, no longer required a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-e2e-fastspeech2.html
    9. Added End-to-End VITS STT models, no longer required a vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-vits.html
    10. Added Neural Vocoder Super Resolution models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-tfgan.html
    11. Added super resolution diffusion models, https://malaya-speech.readthedocs.io/en/latest/load-super-resolution-audio-diffusion.html
    12. Added HMM speaker diarization, https://malaya-speech.readthedocs.io/en/latest/load-diarization-clustering-hmm.html
    Source code(tar.gz)
    Source code(zip)
  • 1.2.7(Jun 13, 2022)

    1. Added Speech-to-Text HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
    2. Added Force Alignment HuggingFace using Mesolitica finetuned models, https://huggingface.co/mesolitica, https://malaya-speech.readthedocs.io/en/latest/stt-huggingface.html
    3. Added Text-to-Speech LightSpeech, https://arxiv.org/abs/2102.04040, https://malaya-speech.readthedocs.io/en/latest/tts-lightspeech-model.html
    4. Now Transducer LM support multi-languages.
    Source code(tar.gz)
    Source code(zip)
  • 1.2.6(May 6, 2022)

    1. Use HuggingFace as backend repository.
    2. Added yasmin and osman speakers for TTS Tacotron2, https://malaya-speech.readthedocs.io/en/latest/tts-tacotron2-model.html
    3. Added yasmin and osman speakers for TTS FastSpeech2, https://malaya-speech.readthedocs.io/en/latest/tts-fastspeech2-model.html
    4. Added yasmin and osman speakers for TTS GlowTTS, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
    5. Use yasmin and osman speakers for long text TTS, https://malaya-speech.readthedocs.io/en/latest/tts-long-text.html
    Source code(tar.gz)
    Source code(zip)
  • 1.2.5(Mar 20, 2022)

  • 1.2.4(Mar 1, 2022)

    1. Added malay language pretrained BEST-RQ models, https://github.com/huseinzol05/malaya-speech/tree/master/pretrained-model/stt/best_rq
    2. Added BEST-RQ STT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html#List-available-CTC-model
    Source code(tar.gz)
    Source code(zip)
  • 1.2.2(Dec 29, 2021)

  • 1.2.1(Dec 2, 2021)

    1. Added more KenLM models, included Malay + Singlish, https://malaya-speech.readthedocs.io/en/latest/ctc-language-model.html
    2. Improved ASR CTC models, Hubert-Conformer-Large achieved 12.8% WER-LM, 3.8% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html
    3. Added CTC Decoders interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-ctc-decoders.html
    4. Added pyctcdecode interface for ASR CTC models, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model-pyctcdecode.html
    5. Improved ASR RNNT models, large-conformer achieved 14.8% WER-LM, 5.9% CER-LM, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model.html
    6. Added KenLM support for ASR RNNT models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html
    7. Added ASR RNNT for 2 mixed languages, Malay and Singlish, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-lm.html#
    8. Added ASR RNNT for 3 mixed languages, Malay, Singlish and Mandarin, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-3mixed.html
    9. Added GlowTTS Text-to-Speech, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-model.html
    10. Added GlowTTS Text-to-Speech Multispeakers, https://malaya-speech.readthedocs.io/en/latest/tts-glowtts-multispeaker-model.html
    11. Added HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-vocoder.html
    12. Added Universal HiFiGAN Vocoder, https://malaya-speech.readthedocs.io/en/latest/load-universal-hifigan.html
    Source code(tar.gz)
    Source code(zip)
  • 1.2(Oct 2, 2021)

    1. Added HuBERT, https://malaya-speech.readthedocs.io/en/latest/load-stt-ctc-model.html, new SOTA on Malay CER.
    2. Improved Singlish TTS model, now supported Universal MelGAN as vocoder, https://malaya-speech.readthedocs.io/en/latest/tts-singlish.html
    3. Added Force Alignment module, now you can generate a time-aligned for your transcription, https://malaya-speech.readthedocs.io/en/latest/force-alignment.html
    4. Improved Mixed STT Transducer models, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html
    5. Add new Mixed STT SOTA models, called conformer-stack-mixed, 2% better than other Mixed STT models, no paper produced, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-mixed.html#List-available-RNNT-model
    6. Add Singlish STT Transducer models, thanks to Singapore National Speech Corpus for the dataset, https://www.imda.gov.sg/programme-listing/digital-services-lab/national-speech-corpus, https://malaya-speech.readthedocs.io/en/latest/load-stt-transducer-model-singlish.html
    Source code(tar.gz)
    Source code(zip)
  • 1.1.1(Jun 29, 2021)

    1. Improved Bahasa Speech-to-Text, Large Conformer beat Google Speech-to-Text accuracy.
    2. Improved Mixed (malay and singlish) Speech-to-Text.
    3. Added real time Mixed (malay and singlish) Speech-to-Text documentation, https://malaya-speech.readthedocs.io/en/latest/realtime-asr-mixed.html
    Source code(tar.gz)
    Source code(zip)
  • 1.1(Jun 1, 2021)

  • 1.0(Apr 18, 2021)

Owner
HUSEIN ZOLKEPLI
I really love to fart and korek hidung.
HUSEIN ZOLKEPLI
Input english text, then translate it between languages n times using the Deep Translator Python Library.

mass-translator About Input english text, then translate it between languages n times using the Deep Translator Python Library. How to Use Install dep

2 Mar 04, 2022
Convolutional Neural Networks for Sentence Classification

Convolutional Neural Networks for Sentence Classification Code for the paper Convolutional Neural Networks for Sentence Classification (EMNLP 2014). R

Yoon Kim 2k Jan 02, 2023
Code for "Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments".

Code for "Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments".

Yu Zhang 50 Nov 08, 2022
scikit-learn wrappers for Python fastText.

skift scikit-learn wrappers for Python fastText. from skift import FirstColFtClassifier df = pandas.DataFrame([['woof', 0], ['meow', 1]], colu

Shay Palachy 233 Sep 09, 2022
apple's universal binaries BUT MUCH WORSE (PRACTICAL SHITPOST) (NOT PRODUCTION READY)

hyperuniversality investment opportunity: what if we could run multiple architectures in a single file, again apple universal binaries, but worse how

luna 2 Oct 19, 2021
HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

HiFi DeepVariant + WhatsHap workflow Workflow steps align HiFi reads to reference with pbmm2 call small variants with DeepVariant, using two-pass meth

William Rowell 2 May 14, 2022
A benchmark for evaluation and comparison of various NLP tasks in Persian language.

Persian NLP Benchmark The repository aims to track existing natural language processing models and evaluate their performance on well-known datasets.

Mofid AI 68 Dec 19, 2022
This is a project built for FALLABOUT2021 event under SRMMIC, This project deals with NLP poetry generation.

FALLABOUT-SRMMIC 21 POETRY-GENERATION HINGLISH DESCRIPTION We have developed a NLP(natural language processing) model which automatically generates a

7 Sep 28, 2021
Protein Language Model

ProteinLM We pretrain protein language model based on Megatron-LM framework, and then evaluate the pretrained model results on TAPE (Tasks Assessing P

THUDM 77 Dec 27, 2022
Skipgram Negative Sampling in PyTorch

PyTorch SGNS Word2Vec's SkipGramNegativeSampling in Python. Yet another but quite general negative sampling loss implemented in PyTorch. It can be use

Jamie J. Seol 287 Dec 14, 2022
💛 Code and Dataset for our EMNLP 2021 paper: "Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes"

Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes Official PyTorch implementation and EmoCause evaluatio

Hyunwoo Kim 50 Dec 21, 2022
Voice Assistant inspired by Google Assistant, Cortana, Alexa, Siri, ...

author: @shival_gupta VoiceAI This program is an example of a simple virtual assitant It will listen to you and do accordingly It will begin with wish

Shival Gupta 1 Jan 06, 2022
Ecco is a python library for exploring and explaining Natural Language Processing models using interactive visualizations.

Visualize, analyze, and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BER

Jay Alammar 1.6k Dec 25, 2022
Legal text retrieval for python

legal-text-retrieval Overview This system contains 2 steps: generate training data containing negative sample found by mixture score of cosine(tfidf)

Nguyễn Minh Phương 22 Dec 06, 2022
Sentello is python script that simulates the anti-evasion and anti-analysis techniques used by malware.

sentello Sentello is a python script that simulates the anti-evasion and anti-analysis techniques used by malware. For techniques that are difficult t

Malwation 62 Oct 02, 2022
Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR

Speech_38_ru_commands Recognition of 38 speech commands in russian. Based on Yandex Cup 2021 ML Challenge: ASR Программа умеет распознавать 38 ключевы

Andrey 9 May 05, 2022
CYGNUS, the Cynical AI, combines snarky responses with uncanny aggression.

New & (hopefully) Improved CYGNUS with several API updates, user updates, and online/offline operations added!!!

Simran Farrukh 0 Mar 28, 2022
Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis

MLP Singer Official implementation of MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. Audio samples are available on our demo page.

Neosapience 103 Dec 23, 2022
Words_And_Phrases - Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours

Words_And_Phrases Just a repo for useful words and phrases that might come handy in some scenarios. Feel free to add yours Abbreviations Abbreviation

Subhadeep Mandal 1 Feb 01, 2022