Sentiment-Analysis and EDA on the IMDB Movie Review Dataset

Last update: Jan 12, 2022

This work explores and studies different approaches used for Sentiment Analysis (e.g. Bag of Words, TF-IDF, Word Embeddings). It also applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g. tree-based algorithms, linear models and Support Vector Machines).

Author: Nikolas Petrou, MSc in Data Science

Technical-Report and Code Availability

The complete text and analysis of the work is available in the EDA-and-Sentiment-Analysis-on IMDB-Dataset.pdf file.
The implementation and code of the project are located in the Implementation-Python Files folder.

Overview

The goal of this work is to explore and study different approaches used for Sentiment Analysis (e.g. Bag of Words, TF-IDF, Word Embeddings). In addition, the work applies and compares different classification algorithms for Sentiment Analysis tasks in Natural Language Processing (e.g. tree-based algorithms, linear models and Support Vector Machines).
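To make the difference between Bag of Words and TF-IDF concrete, here is a minimal from-scratch sketch in plain Python. It is an illustration only, not the project's implementation: the function names, the whitespace tokenizer, and the unsmoothed formula tf * log(N / df) are assumptions chosen for brevity. Library implementations typically use smoothed variants of this weighting, so their numbers will differ.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors), one count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * log(N / df), unsmoothed."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for vec in counts if vec[j] > 0) for j in range(len(vocab))]
    vectors = []
    for vec in counts:
        total = sum(vec) or 1  # guard against empty documents
        vectors.append(
            [(c / total) * math.log(n_docs / df[j]) for j, c in enumerate(vec)]
        )
    return vocab, vectors


# Toy reviews, invented for illustration only.
reviews = ["a great great movie", "a terrible movie"]
vocab, bow = bag_of_words(reviews)
_, weights = tf_idf(reviews)
```

In this toy example the words "a" and "movie" occur in both reviews, so their TF-IDF weight is zero, while "great" and "terrible" keep a positive weight. Bag of Words, by contrast, counts every word equally regardless of how common it is across reviews.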

Dataset

For this work, a large dataset of movie reviews was used: the publicly available Internet Movie Database (IMDB) review dataset.

The data can be obtained from Kaggle or directly from Stanford.
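As a rough sketch of how the reviews could be read once downloaded, the snippet below assumes the data has been extracted into a directory with one `pos` and one `neg` subfolder of `.txt` files, one review per file. That layout, the function name `load_reviews`, and the 1/0 label encoding are assumptions made for illustration; adjust them to whatever layout your copy of the data actually has.

```python
from pathlib import Path


def load_reviews(split_dir):
    """Read reviews from <split_dir>/pos and <split_dir>/neg.

    Returns a list of (text, label) pairs, with label 1 for positive
    and 0 for negative reviews. The folder names are assumptions.
    """
    samples = []
    for label_name, label in (("pos", 1), ("neg", 0)):
        folder = Path(split_dir, label_name)
        # sorted() keeps the order deterministic across runs.
        for path in sorted(folder.glob("*.txt")):
            samples.append((path.read_text(encoding="utf-8"), label))
    return samples
```

A call such as `load_reviews("path/to/train")` (hypothetical path) would then return the labelled reviews as a plain Python list, ready for preprocessing.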

Methodology

An abstract methodology scheme of the work is illustrated in the following Figure.

In summary, the initial questions were first set with respect to the dataset used. Subsequently, data scraping and data collection were performed. After the data preprocessing steps, different analyses were employed in order to better understand the data. Finally, during the final analysis, different methodologies and models were used to classify the textual data by sentiment. It is important to note that the whole process followed a cyclical scheme.
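The preprocessing and exploratory steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the report: the cleaning rules (strip HTML tags, lowercase, keep letters only), the function names, and the toy samples are all hypothetical.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]+>")         # HTML remnants such as <br />
NON_ALPHA_RE = re.compile(r"[^a-z\s]")  # anything except letters and whitespace


def preprocess(text):
    """Strip HTML tags, lowercase, drop non-letters, and split into tokens."""
    text = TAG_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text.lower())
    return text.split()


def top_words_by_label(samples, k=3):
    """Count tokens separately per sentiment label; return the k most common."""
    counters = {}
    for text, label in samples:
        counters.setdefault(label, Counter()).update(preprocess(text))
    return {label: counter.most_common(k) for label, counter in counters.items()}


# Toy samples, invented for illustration only.
samples = [
    ("Great film!<br /><br />Great cast.", "positive"),
    ("Boring plot, boring pacing.", "negative"),
]
summary = top_words_by_label(samples, k=1)
```

Comparing the most frequent tokens per sentiment label is one simple exploratory check before any classifier is trained. A real analysis would likely also remove stop words, which this sketch leaves in.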
