NLP_0-project

Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures¹. We are a "democratic" and collaborative group of five; our names are listed below according to our initial division of work 😄.

Here is the outline of our project:

Data collection.

@LeiyuanHuo, jyang130, FanFanShark, xdc1999, gaojiamin1116

Based on the file data-WRDS-list.csv, write a web-scraping script that downloads all 10-Ks (HTML format) these companies filed with the SEC from 2010 to 2022, using EDGAR's historical documents, and renames them data-10K-COMPNAME-Year.html.
Parse the HTML files to extract the Business (Item 1) and MD&A (Item 7) sections.
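The data-collection step could look roughly like the sketch below, which uses only the Python standard library. It assumes that each row of data-WRDS-list.csv gives a company name and a CIK, and that filings are located through the SEC's JSON submissions endpoint (`https://data.sec.gov/submissions/CIK##########.json`, with a 10-digit zero-padded CIK), whose `filings.recent` arrays list each filing's form type, filing date, accession number and primary document. The helper names, the section-heading regexes, and the choice of filing year as "Year" are hypothetical. The actual HTTP download is left out so the helpers can be tested offline; when fetching, the SEC asks automated clients to send a descriptive User-Agent header.

```python
import re
from html.parser import HTMLParser

# Archive URL of a filing's primary document (accession number without dashes).
ARCHIVE = "https://www.sec.gov/Archives/edgar/data/{cik}/{acc}/{doc}"


def tenk_urls(submissions, start=2010, end=2022):
    """Return (filing_year, url) pairs for 10-Ks in a parsed submissions JSON dict.

    Assumption: only filings["recent"] is read; older filings that EDGAR keeps
    in the extra files listed under filings["files"] are not handled here.
    """
    cik = int(submissions["cik"])
    recent = submissions["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"],
               recent["accessionNumber"], recent["primaryDocument"])
    out = []
    for form, date, acc, doc in rows:
        year = int(date[:4])  # hypothetical choice: use the filing year as "Year"
        if form == "10-K" and start <= year <= end:
            out.append((year, ARCHIVE.format(cik=cik, acc=acc.replace("-", ""), doc=doc)))
    return out


def make_filename(company, year):
    """data-10K-COMPNAME-Year.html, with runs of non-alphanumerics replaced by '_'."""
    name = re.sub(r"[^A-Za-z0-9]+", "_", company).strip("_")
    return f"data-10K-{name}-{year}.html"


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace (including non-breaking spaces)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


# Hypothetical (start, end) heading patterns; real filings vary and need tuning.
SECTIONS = {
    "business": (r"item\s*1\.?\s*business",
                 r"item\s*1a\.?\s*risk\s*factors|item\s*2\.?\s*properties"),
    "mdna": (r"item\s*7\.?\s*management", r"item\s*7a\.?|item\s*8\.?\s*financial"),
}


def extract_section(text, name):
    """Return the longest start..end span; the table-of-contents hit is short."""
    start_pat, end_pat = SECTIONS[name]
    end_re = re.compile(end_pat, re.I)
    best = ""
    for m in re.finditer(start_pat, text, re.I):
        end = end_re.search(text, m.end())
        chunk = text[m.start():end.start() if end else len(text)]
        if len(chunk) > len(best):
            best = chunk
    return best.strip()
```

In use, one would fetch each company's submissions JSON (e.g. with `urllib.request`), download every URL returned by `tenk_urls`, save it under `make_filename(...)`, and then run `extract_section(html_to_text(html), "business")` and `extract_section(..., "mdna")`. Keeping the longest match is a simple way to skip the table of contents, which repeats the item headings.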

Text Processing: feature extraction²

Part-of-speech (POS) tagging (our main method) to extract product names and descriptions. Store these for each company.
Named entity recognition (NER) (also a main method) to extract the names of mentioned competitors. Store these for each company.
Product texts: bag-of-words (BoW) and tf-idf representations of each company's product(s), which should give us a term-product matrix.
Competitor texts: BoW, since we care about how often each competitor is mentioned.
‼️ We also need to incorporate sector and firm size/market power into the competitor texts and recount the mentions.
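The extraction and counting steps above can be sketched with the standard library alone. The POS tags (Penn Treebank style) and NER labels are assumed to come from an off-the-shelf tagger such as NLTK's `pos_tag` or spaCy, so the example starts from already-tagged tokens; spaCy-style `ORG` labels are assumed for companies. The noun-phrase rule (a run of adjectives and nouns that ends in a noun), the function names and the toy inputs are all hypothetical. The tf-idf weighting uses idf = log(N / df), one of several common variants (scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed idf by default).

```python
import math
from collections import Counter


def noun_phrases(tagged):
    """Candidate product names: runs of JJ*/NN* tokens that end in a noun."""
    phrases, current = [], []
    for word, tag in list(tagged) + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith(("JJ", "NN")):
            current.append((word, tag))
            continue
        # Drop trailing adjectives; keep the run only if a noun remains.
        while current and not current[-1][1].startswith("NN"):
            current.pop()
        if current:
            phrases.append(" ".join(w.lower() for w, _ in current))
        current = []
    return phrases


def competitor_counts(entities):
    """BoW over NER output: how often each organisation is mentioned."""
    return Counter(text for text, label in entities if label == "ORG")


def tfidf(docs):
    """docs: {firm: [terms]} -> {firm: {term: tf * log(N / df)}}."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    out = {}
    for firm, terms in docs.items():
        tf = Counter(terms)
        total = sum(tf.values())
        out[firm] = {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
    return out
```

Each firm's `tfidf` dict is one row of the term-product matrix; terms shared by every firm get weight 0, so generic words drop out automatically.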

Text Processing: feature transformation and representation²

Term-product matrix: compute pairwise cosine similarity scores between products, and use a score threshold to group similar products.
Term-product matrix: alternatively, apply a clustering method (e.g., k-means) directly to the product vectors.
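The threshold approach can be sketched as follows, assuming each firm's product text is already a sparse tf-idf vector stored as a `{term: weight}` dict. Firm pairs whose cosine similarity exceeds the threshold are linked; linked firms are merged into groups with union-find, and the number of links per firm gives a similar-product count. The threshold of 0.5, the function names and the toy vectors are illustrative assumptions, not calibrated choices. For the k-means alternative, scikit-learn's `KMeans` could be applied to the dense tf-idf matrix instead.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_groups(vectors, threshold=0.5):
    """Link firm pairs with cosine > threshold; return (groups, link counts)."""
    parent = {f: f for f in vectors}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving keeps trees shallow
            f = parent[f]
        return f

    links = {f: 0 for f in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) > threshold:
            links[a] += 1
            links[b] += 1
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for f in vectors:
        groups.setdefault(find(f), []).append(f)
    return sorted(sorted(g) for g in groups.values()), links
```

Union-find makes grouping transitive: if A is similar to B and B to C, all three end up in one group even when A and C are dissimilar. The per-firm link count does not have this chaining problem, which may make it the more natural input for a firm-level competition measure.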

Econometric Analysis and Hypothesis Testing²

Multivariate regression: the dependent variable (DV) is profitability (e.g., sales, revenue, Tobin's q); the independent variables (IVs) are the two competition measures (one based on the count of similar products, one based on mentions as a competitor), plus relevant control variables.
Cross-sectional portfolios: our competition measures vary across firms (one value per firm each year), so we can form long-short portfolios sorted on each measure and examine their effect on stock returns.
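A minimal sketch of the portfolio test, assuming a panel of firm-year records that carry a competition measure and the subsequent stock return. The field names (`year`, `measure`, `ret`), equal weighting and tercile sorts are hypothetical choices. Each year firms are sorted on the measure, and the long-short return is the average return of the top group minus that of the bottom group; running this once per measure gives one series per measure, whose mean can then be tested.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def long_short_returns(records, n_groups=3):
    """Yearly equal-weighted top-minus-bottom returns from sorting on 'measure'.

    records: iterable of dicts with keys 'year', 'measure', 'ret' (hypothetical).
    Years with fewer than n_groups firms are skipped.
    """
    by_year = defaultdict(list)
    for r in records:
        by_year[r["year"]].append(r)
    out = {}
    for year, rows in sorted(by_year.items()):
        if len(rows) < n_groups:
            continue
        rows = sorted(rows, key=lambda r: r["measure"])
        size = len(rows) // n_groups
        low = [r["ret"] for r in rows[:size]]
        high = [r["ret"] for r in rows[-size:]]
        out[year] = mean(high) - mean(low)
    return out


def t_stat(series):
    """Plain t-statistic of the mean (no autocorrelation adjustment); needs >= 2 values."""
    vals = list(series)
    return mean(vals) / (stdev(vals) / math.sqrt(len(vals)))
```

The sign convention (long high-measure firms, short low-measure firms) is arbitrary; flipping it only flips the sign of the result. A serious test would also use value weights and risk-adjusted returns, which this sketch leaves out.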

¹ Two papers inspired this project. Citations: Eisdorfer, A., Froot, K., Ozik, G., & Sadka, R. (2021). Competition Links and Stock Returns. The Review of Financial Studies, 2021-12-20; Hoberg, G., & Phillips, G. (2016). Text-Based Network Industries and Endogenous Product Differentiation. The Journal of Political Economy, 124(5), 1423-1465.
² The text-processing steps are based on the MFIN7036 lecture notes and a review paper. Citation: Marty, T., Vanstone, B., & Hahn, T. (2020). News media analytics in finance: A survey. Accounting and Finance, 60(2), 1385-1434.
