Grapheme-to-phoneme (G2P) conversion is the process of generating pronunciation for words based on their written form.

Last update: Nov 16, 2022

Overview

Neural G2P for the Portuguese language

Grapheme-to-phoneme (G2P) conversion is the process of generating the pronunciation of words from their written form. It plays an essential role in natural language processing, text-to-speech synthesis, and automatic speech recognition systems. This project was adapted from https://github.com/hajix/G2P.

Dependencies

The following libraries are used:
pytorch
tqdm
matplotlib

Install dependencies using pip:

pip3 install -r requirements.txt

Dataset

The dataset was taken from http://www.portaldalinguaportuguesa.org/, with some entries added by me to improve coverage of common everyday words in Brazilian Portuguese. Some ambiguities were also resolved, since this dataset is intended to reflect a specific speaker bias: the dictionary based on São Paulo speakers was chosen.

As in https://github.com/hajix/G2P, on which this implementation is based, you can easily provide and use your own language-specific pronunciation dictionary for training G2P. More details about data preparation and contribution can be found in resources.
Feel free to provide resources for other languages.
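
As a rough illustration of what preparing a custom pronunciation dictionary could involve, the sketch below turns dictionary lines into grapheme/phoneme training pairs and builds symbol vocabularies. The tab-separated layout, the space-separated phonemes, and the names parse_lexicon and build_vocab are all assumptions made for this example, not the format used by this project; check resources for the actual layout.

```python
def parse_lexicon(lines):
    """Turn 'word<TAB>p1 p2 ...' lines into (graphemes, phonemes) pairs.

    The line format is hypothetical; adapt it to the real dictionary.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        word, _, pron = line.partition("\t")
        if not pron:
            continue  # skip entries without a pronunciation
        pairs.append((list(word), pron.split()))
    return pairs


def build_vocab(sequences):
    """Map every distinct symbol to an integer index, in sorted order."""
    symbols = sorted({sym for seq in sequences for sym in seq})
    return {sym: idx for idx, sym in enumerate(symbols)}


# Two made-up entries, used only to exercise the functions.
sample = ["casa\tk a z a", "olá\to l a"]
pairs = parse_lexicon(sample)
grapheme_vocab = build_vocab(g for g, _ in pairs)
phoneme_vocab = build_vocab(p for _, p in pairs)
```

Keeping separate vocabularies for graphemes and phonemes lets the encoder and decoder each use their own embedding table.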

Attention Model

Both a plain encoder-decoder seq2seq model and an attention model can handle the G2P problem. Here we train an attention-based model. The encoder takes a sequence of graphemes and produces a state at each timestep. These encoder states are used during attention decoding: the decoder attends to the appropriate encoder states (according to its own state) and produces phonemes.
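
The attention step described above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes dot-product scoring; the scoring function actually used by this project is not stated here, and the names softmax and attend are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return attention weights and the resulting context vector.

    Assumes dot-product scoring between the decoder state and
    each encoder state (one state per input grapheme).
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy values: three encoder timesteps with two-dimensional states.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 2.0]
weights, context = attend(decoder_state, encoder_states)
```

The weights form a distribution over input positions, and the context vector is the weighted average of encoder states that the decoder would combine with its own state to predict the next phoneme.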

Train

To start training the model, run:

python train.py

You can also use TensorBoard to check the training loss:

tensorboard --logdir log --bind_all

Training parameters can be found in config.py.

Inference

To get the pronunciation of a word or sentence:

# PT-BR example
python inference.py --sentence 'olá, vamos testar esse projeto.'
o|l|a| |,| |v|a|m|ʊ|s| |t|e|s|t|a| |e|s|i| |p|ɾ|o|ʒ|e|t|ʊ| |.
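
The output above separates phonemes with "|" and marks word boundaries with a single space token. A small sketch for regrouping that string into per-word phoneme lists follows; the helper name split_phoneme_output is hypothetical and not part of inference.py.

```python
def split_phoneme_output(line):
    """Group a '|'-delimited phoneme string into one list per word.

    A token consisting of a single space is treated as a word boundary.
    """
    words, current = [], []
    for token in line.split("|"):
        if token == " ":
            if current:
                words.append(current)
                current = []
        else:
            current.append(token)
    if current:
        words.append(current)
    return words


output = "o|l|a| |,| |v|a|m|ʊ|s| |t|e|s|t|a| |e|s|i| |p|ɾ|o|ʒ|e|t|ʊ| |."
words = split_phoneme_output(output)
```

Punctuation marks come back as their own single-token groups, so they can be filtered out or kept depending on what the downstream system expects.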

You can also visualize the attention weights using --visualize:

# PT-BR example
python inference.py --visualize --sentence 'olá, vamos testar esse projeto.'
o|l|a| |,| |v|a|m|ʊ|s| |t|e|s|t|a| |e|s|i| |p|ɾ|o|ʒ|e|t|ʊ| |.

Owner

fluz