A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

Last update: Jan 07, 2023

Overview

pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Transformer-based library for SocialNLP classification tasks.

Currently supports:

Sentiment Analysis (Spanish, English)
Emotion Analysis (Spanish, English)

Install it with pip install pysentimiento and start using it:

from pysentimiento import SentimentAnalyzer, EmotionAnalyzer
analyzer = SentimentAnalyzer(lang="es")

analyzer.predict("Qué gran jugador es Messi")
# returns SentimentOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})
analyzer.predict("Esto es pésimo")
# returns SentimentOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})
analyzer.predict("Qué es esto?")
# returns SentimentOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})

analyzer.predict("jejeje no te creo mucho")
# returns SentimentOutput(output=NEG, probas={NEG: 0.587, NEU: 0.408, POS: 0.005})

# Emotion analysis in English
emotion_analyzer = EmotionAnalyzer(lang="en")

emotion_analyzer.predict("yayyy")
# returns EmotionOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})
emotion_analyzer.predict("fuck off")
# returns EmotionOutput(output=anger, probas={anger: 0.798, surprise: 0.055, fear: 0.040, disgust: 0.036, joy: 0.028, others: 0.023, sadness: 0.019})

You can also use the pretrained models directly with the transformers library.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("finiteautomata/beto-sentiment-analysis")

model = AutoModelForSequenceClassification.from_pretrained("finiteautomata/beto-sentiment-analysis")
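
The snippet above only loads the tokenizer and the model. As a minimal sketch of how a prediction could then be obtained, the code below runs one text through the model and turns the logits into label probabilities with a softmax. The helper names (softmax, label_probabilities, predict_with_transformers) are illustrative and not part of pysentimiento. The label names are read from the model's own id2label config rather than hard-coded, and torch is assumed to be installed alongside transformers.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_probabilities(logits, id2label):
    """Map each class index to its label name and probability."""
    probs = softmax(logits)
    return {id2label[i]: p for i, p in enumerate(probs)}


def predict_with_transformers(text, model_name="finiteautomata/beto-sentiment-analysis"):
    """Hypothetical helper: returns (top label, probabilities) for one text."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    # Tokenize a single text and run it through the model without gradients.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probas = label_probabilities(logits, model.config.id2label)
    return max(probas, key=probas.get), probas
```

Calling predict_with_transformers("Qué gran jugador es Messi") downloads the model on first use. Since the preprocessor described in the next section is presented as suited to these models, it is probably worth passing the text through preprocess_tweet before tokenizing it.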

Preprocessing

pysentimiento features a tweet preprocessor specially suited for tweet classification with transformer-based models.

from pysentimiento.preprocessing import preprocess_tweet

# Replaces user handles and URLs with special tokens
preprocess_tweet("@perezjotaeme debería cambiar esto http://bit.ly/sarasa") # "@usuario debería cambiar esto url"

# Shortens repeated characters
preprocess_tweet("no entiendo naaaaaaaadaaaaaaaa", shorten=2) # "no entiendo naadaa"

# Normalizes laughter
preprocess_tweet("jajajajaajjajaajajaja no lo puedo creer ajajaj") # "jaja no lo puedo creer jaja"

# Handles hashtags
preprocess_tweet("esto es #UnaGenialidad")
# "esto es una genialidad"

# Handles emojis
preprocess_tweet("🎉🎉", lang="en")
# 'emoji party popper emoji emoji party popper emoji'
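
To make some of these transformations concrete, here is a rough regex-based sketch of three of them: URL replacement, handle replacement, and shortening of repeated characters. It is not pysentimiento's implementation. The function name normalize_tweet_sketch, the patterns, and the default tokens are assumptions chosen to reproduce the example outputs shown above.

```python
import re


def normalize_tweet_sketch(text, user_token="@usuario", url_token="url", shorten=2):
    """Illustrative approximation only; use preprocess_tweet for real work."""
    # Replace URLs first so their contents are not touched by later steps.
    text = re.sub(r"https?://\S+", url_token, text)
    # Replace user handles such as @perezjotaeme with a generic token.
    text = re.sub(r"@\w+", user_token, text)
    # Collapse any character repeated more than `shorten` times in a row.
    text = re.sub(r"(.)\1{%d,}" % shorten, lambda m: m.group(1) * shorten, text)
    return text


print(normalize_tweet_sketch("no entiendo naaaaaaaadaaaaaaaa"))  # no entiendo naadaa
```

The real preprocessor also handles laughter, hashtags and emojis, which this sketch leaves untouched.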

Trained models so far

Check CLASSIFIERS.md for details on the reported performance of each model.

Spanish models

English models

Instructions for developers

First, download the TASS 2020 data to data/tass2020 (registration is required to download the dataset).

Labels must be placed under data/tass2020/test1.1/labels

Run script to train models

Check TRAIN_EVALUATE.md

Upload models to the Hugging Face Model Hub

Check the "Model sharing and upload" instructions in the Hugging Face docs.

License

pysentimiento is an open-source library. However, please be aware that the models are trained with third-party datasets and are subject to their respective licenses, many of which allow non-commercial use only:

TASS Dataset license (License for Sentiment Analysis in Spanish, Emotion Analysis in Spanish & English)
SemEval 2017 Dataset license (Sentiment Analysis in English)

Citation

If you use pysentimiento in your work, please cite this paper:

@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

TODO:

Upload some other models
Train in other languages

Suggestions and bugfixes

Please use the repository issue tracker to report bugs and make suggestions (new models, other datasets, other languages, etc.).
