Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures.

Overview

NLP_0-project

Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures1. We are a "democratic" and collaborative group of five, and I mentioned our names based on our initial work division below 😄 .

Here is the outline of our project:

Data collection.

@LeiyuanHuo, jyang130, FanFanShark, xdc1999, gaojiamin1116

  • Based on file data-WRDS-list.csv, write a web-scraping algorithm to download all 10-Ks (html format) these companies filed to the SEC within 2010 to 2022 at Historical EDGAR documents, and rename them data-10K-COMPNAME-Year.html.
  • Parse html files to extract Business and MD&A sections.

Text Processing: feature extraction2

  • Part of Speech Tagging (POS) (mainly this method) to get product name, descriptions. Store these for each company.
  • Named Entity Recognition (NER) (also mainly this method) to get mentioned competitor names. Store these for each company.
  • Product texts: BoW and tf-idf for each company's product(s), and hopefully we have a term-product matrix then.
  • Competitor texts: definitely BoW, as we care about the frequency of being mentioned.
  • ‼️ We also need to combine sector and firm size/market power into competitor texts and re-count.

Text Processing: feature transformation and representation2

  • Term-product matrix: calculate cosine similarity scores for products pairwise; use score threshold to cluster products into similar groups.
  • Term-product matrix: directly apply clustering method (e.g., KMeans clustering) to product vectors, and cluster them.

Econometric Analysis and Hypothesis Testing2

  • Multivariate regression: DV is profitability (e.g., sales, revenue, Tobin's q), IV is competition measures (one from similar product count, one from mentions as competitors), also include relevant control variables.
  • Cross-section portfolios: our competition measures are cross-sectional (one for each year), so we can create long-short portfolios for both measures, and examine stock return effects.

Footnotes

  1. Two papers inspired this project. Citations: Eisdorfer, A., Froot, K., Ozik, G., & Sadka, R. (2021). Competition Links and Stock Returns. The Review of Financial Studies, The Review of financial studies, 2021-12-20. && Hoberg, G., & Phillips, G. (2016). Text-Based Network Industries and Endogenous Product Differentiation. The Journal of Political Economy, 124(5), 1423-1465.

  2. Text processing processes are based on MFIN7036 Lecture_Notes and a review paper. Citation: Marty, T., Vanstone, B., & Hahn, T. (2020). News media analytics in finance: A survey. Accounting and Finance (Parkville), 60(2), 1385-1434. 2 3

Code for "Learning Graph Cellular Automata"

Learning Graph Cellular Automata This code implements the experiments from the NeurIPS 2021 paper: "Learning Graph Cellular Automata" Daniele Grattaro

Daniele Grattarola 37 Oct 26, 2022
DeepLab-ResNet rebuilt in TensorFlow

DeepLab-ResNet-TensorFlow This is an (re-)implementation of DeepLab-ResNet in TensorFlow for semantic image segmentation on the PASCAL VOC dataset. Fr

Vladimir 1.2k Nov 04, 2022
Wide Residual Networks (WideResNets) in PyTorch

Wide Residual Networks (WideResNets) in PyTorch WideResNets for CIFAR10/100 implemented in PyTorch. This implementation requires less GPU memory than

Jason Kuen 296 Dec 27, 2022
Code for training and evaluation of the model from "Language Generation with Recurrent Generative Adversarial Networks without Pre-training"

Language Generation with Recurrent Generative Adversarial Networks without Pre-training Code for training and evaluation of the model from "Language G

Amir Bar 253 Sep 14, 2022
Detectron2 is FAIR's next-generation platform for object detection and segmentation.

Detectron2 is Facebook AI Research's next generation software system that implements state-of-the-art object detection algorithms. It is a ground-up r

Facebook Research 23.3k Jan 08, 2023
(CVPR 2022) Energy-based Latent Aligner for Incremental Learning

Energy-based Latent Aligner for Incremental Learning Accepted to CVPR 2022 We illustrate an Incremental Learning model trained on a continuum of tasks

Joseph K J 37 Jan 03, 2023
Lipschitz-constrained Unsupervised Skill Discovery

Lipschitz-constrained Unsupervised Skill Discovery This repository is the official implementation of Seohong Park, Jongwook Choi*, Jaekyeom Kim*, Hong

Seohong Park 17 Dec 18, 2022
Highly comparative time-series analysis

〰️ hctsa 〰️ : highly comparative time-series analysis hctsa is a software package for running highly comparative time-series analysis using Matlab (fu

Ben Fulcher 569 Dec 21, 2022
Generalized Data Weighting via Class-level Gradient Manipulation

Generalized Data Weighting via Class-level Gradient Manipulation This repository is the official implementation of Generalized Data Weighting via Clas

18 Nov 12, 2022
Attentive Implicit Representation Networks (AIR-Nets)

Attentive Implicit Representation Networks (AIR-Nets) Preprint | Supplementary | Accepted at the International Conference on 3D Vision (3DV) teaser.mo

29 Dec 07, 2022
Demo project for real time anomaly detection using kafka and python

kafkaml-anomaly-detection Project for real time anomaly detection using kafka and python It's assumed that zookeeper and kafka are running in the loca

Rodrigo Arenas 36 Dec 12, 2022
The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization".

Codebase for learning control flow in transformers The official repository for our paper "The Neural Data Router: Adaptive Control Flow in Transformer

Csordás Róbert 24 Oct 15, 2022
All materials of Cassandra Event, Udyam'22

Cassandra 2022 Workspace Workshop Materials Workshop-1 Workshop-2 Workshop-3 Workshop-4 Assignments Assignment-1 Assignment-2 Assignment-3 Resources P

36 Dec 31, 2022
Referring Video Object Segmentation

Awesome-Referring-Video-Object-Segmentation Welcome to starts ⭐ & comments 💹 & sharing 😀 !! - 2021.12.12: Recent papers (from 2021) - welcome to ad

Explorer 57 Dec 11, 2022
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation

This is the official PyTorch implementation of the ALBEF paper [Blog]. This repository supports pre-training on custom datasets, as well as finetuning on VQA, SNLI-VE, NLVR2, Image-Text Retrieval on

Salesforce 805 Jan 09, 2023
Official Implementation of 'UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers' ICLR 2021(spotlight)

UPDeT Official Implementation of UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers (ICLR 2021 spotlight) The

hhhusiyi 96 Dec 22, 2022
Graph WaveNet apdapted for brain connectivity analysis.

Graph WaveNet for brain network analysis This is the implementation of the Graph WaveNet model used in our manuscript: S. Wein , A. Schüller, A. M. To

4 Dec 17, 2022
LeViT a Vision Transformer in ConvNet's Clothing for Faster Inference

LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference This repository contains PyTorch evaluation code, training code and pretrained

Facebook Research 504 Jan 02, 2023
Implementation of paper "DeepTag: A General Framework for Fiducial Marker Design and Detection"

Implementation of paper DeepTag: A General Framework for Fiducial Marker Design and Detection. Project page: https://herohuyongtao.github.io/research/

Yongtao Hu 46 Dec 12, 2022
lightweight python wrapper for vowpal wabbit

vowpal_porpoise Lightweight python wrapper for vowpal_wabbit. Why: Scalable, blazingly fast machine learning. Install Install vowpal_wabbit. Clone and

Joseph Reisinger 163 Nov 24, 2022