A Flask application to predict the speech emotion of any .wav file.

Last update: Dec 15, 2021

Overview

This is a speech emotion recognition app. It lets you train a modular MLP model on the RAVDESS dataset and then use that model in a Flask application to predict the speech emotion of any .wav file.

REQS:

To download the RAVDESS speech emotion recognition data, go to: https://drive.google.com/file/d/1wWsrN2Ep7x6lWqOXfr4rpKGYrJhWc8z7/view

To install all dependencies, open a terminal and run:

. ./install_deps.sh

This should create your venv and populate it with all necessary dependencies.

MODEL:

A multilayer perceptron model to detect the emotion of .wav files. To create and edit the model, see create_model.py. Once create_model.py is adjusted to your liking (emotions_to_observe and the path to the sound data), simply run:

python3 create_model.py

to create the model.model binary file and test the accuracy of your model.
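As a rough illustration of how a list like emotions_to_observe can map onto the dataset: RAVDESS file names are seven two-digit fields separated by hyphens, and the third field is the emotion code. The sketch below filters file paths by emotion label. The function names are hypothetical, and whether create_model.py uses these exact label strings is an assumption, so check the script before relying on them.

```python
from pathlib import Path

# Emotion codes from the RAVDESS file-name convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral",
    "02": "calm",
    "03": "happy",
    "04": "sad",
    "05": "angry",
    "06": "fearful",
    "07": "disgust",
    "08": "surprised",
}


def emotion_from_filename(path):
    """Return the emotion label encoded in a RAVDESS file name, or None."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        # Not a RAVDESS-style name such as 03-01-06-01-02-01-12.wav
        return None
    return RAVDESS_EMOTIONS.get(fields[2])


def select_files(paths, emotions_to_observe):
    """Keep (path, label) pairs whose label is in emotions_to_observe."""
    wanted = set(emotions_to_observe)
    selected = []
    for p in paths:
        label = emotion_from_filename(p)
        if label in wanted:
            selected.append((str(p), label))
    return selected
```

For example, select_files(paths, ["happy", "sad"]) would keep only the recordings whose third field is 03 or 04.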

APP:

Once the model.model binary is created, you can spin up the Flask application (ToneCheck). To do so, run:

. ./start_flask.sh

By default, the app runs on localhost:5000. The emotions available for prediction correspond to the emotions_to_observe variable you edited inside create_model.py (and are therefore the ones stored in the model binary file).
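Before submitting a recording to the app, it can help to confirm that the file is a readable .wav. The sketch below uses only Python's standard wave module and is not part of ToneCheck. The function name describe_wav is hypothetical, and the wave module is intended for uncompressed PCM files, so other encodings may raise an error.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM .wav file using the standard library."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate_hz": rate,
            # Duration is the frame count divided by frames per second.
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling describe_wav("recording.wav") returns a small dictionary. A wave.Error at this step suggests the file is not plain PCM audio and may need converting first.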

Owner

Aryan Vijaywargia
