Package for extracting emotions from social media text. Tailored for financial data.

Overview

EmTract: Extracting Emotions from Social Media Text Tailored for Financial Contexts

EmTract is a tool that extracts emotions from social media text. It incorporates key aspects of social media data (e.g., non-standard phrases, emojis and emoticons), and uses cutting edge natural language processing (NLP) techniques to learn latent representations, such as word order, word usage, and local context, to predict the emotions.

Details on the model and text processing are in the appendix of EmTract: Investor Emotions and Market Behavior.

User Guide

Installation

Before being able to use the package python3 must be installed. We also recommend using a virtual environment so that the tool runs with the same dependencies with which it was developed. Instruction on how to set up a virtual environment can be found here.

Once basic requirements are setup, follow these instructions:

  1. Clone the repository: git clone https://github.com/dvamossy/EmTract.git
  2. Navigate into repository: cd EmTract
  3. (Optional) Create and activate virtual environment:
    python3 -m venv venv
    source venv/bin/activate
    
  4. Run ./install.sh. This will install python requirements and also download our model files

Usage

Our package should be run with the following command:

python3 -m emtract.inference [args]

Where args are the following:

  • --model_type: can be twitter or stocktwits. Default is stocktwits
  • --interactive: Run in interactive mode
  • --input_file/-i: input to use for predictions (only for non interactive mode)
  • --output_file/-o: output location for predictions(only for non interactive mode)

Output

For each input (i.e., text), EmTract outputs probabilities (they sum to 1!) corresponding to seven emotional states: neutral, happy, sad, anger, disgust, surprise, fear. It also labels the text by computing the argmax of the probabilities.

Modes

Our tool can be run in 2 execution modes.

Interactive mode allows the user to input a tweet and evaluate it in real time. This is great for exploratory analysis.

python3 -m emtract.inference --interactive

The other mode is intended for automating predictions. Here an input file must be specified that will be used as the prediction input. This file must be a csv or text file with 1 column. This column should have the messages/text to predict with.

python3 -m emtract.inference -i tweets_example.csv -o predictions.csv

Model Types

Our models leverage GloVe Embeddings with Bidirectional GRU architecture.

We trained our emotion models with 2 different data sources. One from Twitter, and another from StockTwits. The Twitter training data comes from here; it is available at data/twitter_emotion.csv. The StockTwits training data is explained in the paper.

One of the key concerns using emotion packages is that it is unknown how well they transfer to financial text data. We alleviate this concern by hand-tagging 10,000 StockTwits messages. These are available at data/hand_tagged_sample.parquet.snappy; they were not included during training any of our models. We use this for testing model performance, and alternative emotion packages (notebooks/Alternative Packages.ipynb).

We found our StockTwits model to perform best on the hand-tagged sample, and therefore it is used as the default for predictions.

Alternative Models

We also have an implementation of DistilBERT in notebooks/Alternative Models.ipynb on the Twitter data; which can be easily extended to any other state-of-the-art models. We find marginal performance gains on the hand-tagged sample, which comes at the cost of far slower inference.

Citation

If you use EmTract in your research, please cite us as follows:

Domonkos Vamossy and Rolf Skog. EmTract: Investor Emotions and Market Behavior https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3975884, 2021.

Contributing and Feedback

This project welcomes contributions and suggestions.

Our goal is to provide a unified framework for extracting emotions from financial social media text. Particularly useful for research on emotions in financial contexts would be labeling financial social media text. We plan to upload sample text upon request.

Sparse-dense operators implementation for Paddle

Sparse-dense operators implementation for Paddle This module implements coo, csc and csr matrix formats and their inter-ops with dense matrices. Feel

北海若 3 Dec 17, 2022
PyTorch GPU implementation of the ES-RNN model for time series forecasting

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm A GPU-enabled version of the hybrid ES-RNN model by Slawek et al that won the M4 time-series

Kaung 305 Jan 03, 2023
OpenLT: An open-source project for long-tail classification

OpenLT: An open-source project for long-tail classification Supported Methods for Long-tailed Recognition: Cross-Entropy Loss Focal Loss (ICCV'17) Cla

Ming Li 37 Sep 15, 2022
Code for ACL 2019 Paper: "COMET: Commonsense Transformers for Automatic Knowledge Graph Construction"

To run a generation experiment (either conceptnet or atomic), follow these instructions: First Steps First clone, the repo: git clone https://github.c

Antoine Bosselut 575 Jan 01, 2023
Removing Inter-Experimental Variability from Functional Data in Systems Neuroscience

Removing Inter-Experimental Variability from Functional Data in Systems Neuroscience This repository is the official implementation of [https://www.bi

Eulerlab 6 Oct 09, 2022
A Python package to process & model ChEMBL data.

insilico: A Python package to process & model ChEMBL data. ChEMBL is a manually curated chemical database of bioactive molecules with drug-like proper

Steven Newton 0 Dec 09, 2021
A custom DeepStack model that has been trained detecting ONLY the USPS logo

This repository provides a custom DeepStack model that has been trained detecting ONLY the USPS logo. This was created after I discovered that the Deepstack OpenLogo custom model I was using did not

Stephen Stratoti 9 Dec 27, 2022
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton (AAAI'22)

ShuttleNet: Position-aware Rally Progress and Player Styles Fusion for Stroke Forecasting in Badminton (AAAI 2022) Official code of the paper ShuttleN

Wei-Yao Wang 11 Nov 30, 2022
Air Quality Prediction Using LSTM

AirQualityPredictionUsingLSTM In this Repo, i present to you the winning solution of smart gujarat hackathon 2019 where the task was to predict the qu

Deepak Nandwani 2 Dec 13, 2022
Code release for SLIP Self-supervision meets Language-Image Pre-training

SLIP: Self-supervision meets Language-Image Pre-training What you can find in this repo: Pre-trained models (with ViT-Small, Base, Large) and code to

Meta Research 621 Dec 31, 2022
Multi Camera Calibration

Multi Camera Calibration 'modules/camera_calibration/app/camera_calibration.cpp' is for calculating extrinsic parameter of each individual cameras. 'm

7 Dec 01, 2022
Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis

Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis Requirements python 3.7 pytorch-gpu 1.7 numpy 1.19.4 pytorch_

12 Oct 29, 2022
Create UIs for prototyping your machine learning model in 3 minutes

Note: We just launched Hosted, where anyone can upload their interface for permanent hosting. Check it out! Welcome to Gradio Quickly create customiza

Gradio 11.7k Jan 07, 2023
Pytorch-Swin-Unet-V2 - a modified version of Swin Unet based on Swin Transfomer V2

Swin Unet V2 Swin Unet V2 is a modified version of Swin Unet arxiv based on Swin

Chenxu Peng 26 Dec 03, 2022
A neuroanatomy-based augmented reality experience powered by computer vision. Features 3D visuals of the Atlas Brain Map slices.

Brain Augmented Reality (AR) A neuroanatomy-based augmented reality experience powered by computer vision that features 3D visuals of the Atlas Brain

Yasmeen Brain 10 Oct 06, 2022
Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022)

Blockwise Sequential Model Learning Code for 'Blockwise Sequential Model Learning for Partially Observable Reinforcement Learning' (AAAI 2022) For ins

2 Jun 17, 2022
[NeurIPS'21] "AugMax: Adversarial Composition of Random Augmentations for Robust Training" by Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Animashree Anandkumar, and Zhangyang Wang.

[NeurIPS'21] "AugMax: Adversarial Composition of Random Augmentations for Robust Training" by Haotao Wang, Chaowei Xiao, Jean Kossaifi, Zhiding Yu, Animashree Anandkumar, and Zhangyang Wang.

VITA 112 Nov 07, 2022
Occlusion robust 3D face reconstruction model in CFR-GAN (WACV 2022)

Occlusion Robust 3D face Reconstruction Yeong-Joon Ju, Gun-Hee Lee, Jung-Ho Hong, and Seong-Whan Lee Code for Occlusion Robust 3D Face Reconstruction

Yeongjoon 31 Dec 19, 2022
Code accompanying the paper Shared Independent Component Analysis for Multi-subject Neuroimaging

ShICA Code accompanying the paper Shared Independent Component Analysis for Multi-subject Neuroimaging Install Move into the ShICA directory cd ShICA

8 Nov 07, 2022