EmoBERT-MLOps

The goal of this repository is to build an end-to-end MLOps pipeline based on the MLOps course from Made with ML. The project differs from the course in its design, tools, and frameworks, with the aim of practicing the material and offering a different angle and implementation.

This project uses a BERT model for emotion classification and is based on the GoEmotions dataset.

Content list

TODO

Dataset description

Taken from https://ai.googleblog.com/2021/10/goemotions-dataset-for-fine-grained.html

In “GoEmotions: A Dataset of Fine-Grained Emotions”, we describe GoEmotions, a human-annotated dataset of 58k Reddit comments extracted from popular English-language subreddits and labeled with 27 emotion categories. As the largest fully annotated English language fine-grained emotion dataset to date, we designed the GoEmotions taxonomy with both psychology and data applicability in mind. In contrast to the basic six emotions, which include only one positive emotion (joy), our taxonomy includes 12 positive, 11 negative, 4 ambiguous emotion categories and 1 “neutral”, making it widely suitable for conversation understanding tasks that require a subtle differentiation between emotion expressions.
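As a rough illustration of how labels from this taxonomy might be turned into training targets, here is a minimal sketch. It assumes each comment can carry more than one emotion label, so targets are encoded as multi-hot vectors. The `LABELS` list is only an illustrative subset of the taxonomy, and `to_multi_hot` is a hypothetical helper, not code from this repository.

```python
from typing import Iterable, List

# Illustrative subset only; the full taxonomy has 27 emotions plus "neutral".
LABELS: List[str] = ["admiration", "anger", "joy", "sadness", "neutral"]
LABEL_TO_INDEX = {name: index for index, name in enumerate(LABELS)}


def to_multi_hot(emotions: Iterable[str]) -> List[int]:
    """Return a 0/1 vector with a 1 at the position of every given emotion."""
    vector = [0] * len(LABELS)
    for emotion in emotions:
        # Raises KeyError if the emotion is not in LABELS.
        vector[LABEL_TO_INDEX[emotion]] = 1
    return vector


# A comment annotated with both "joy" and "admiration".
example_target = to_multi_hot(["joy", "admiration"])
```

A classifier trained on targets like these would typically emit one independent score per emotion instead of a single softmax over all classes.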

Model description

TODO

End-to-end MLOps pipeline of a BERT model for emotion classification.

Owner

Dimitre Oliveira
