Sentiment Analysis Project

This project contains two sentiment analysis programs for hotel reviews, built on the Hotel Reviews dataset from Datafiniti. The text features used to train the machine learning models are built with CountVectorizer (in the countvectorizer.py program) and TF-IDF Vectorizer (in the tdidf.py program). Run the programs separately to compare the two vectorizers' implementation and accuracy. TF-IDF often performs somewhat better because it down-weights words that appear in many reviews, but the difference depends on the data and the model.
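The following illustrative sketch makes the difference between the two vectorizers concrete. It is not taken from countvectorizer.py or tdidf.py. It runs on three made-up review snippets. CountVectorizer keeps raw counts, so every word in a review weighs the same. TfidfVectorizer scales each count by inverse document frequency, so a word that appears in every review (such as "the") gets less weight than a rarer word (such as "room"). The helper weight is hypothetical, and the snippets are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Three made-up review snippets; "the" and "was" appear in every one,
# "hotel" in two, and "room" in only one.
reviews = [
    "the hotel room was clean",
    "the hotel staff was rude",
    "the breakfast was great",
]

count_vec = CountVectorizer()
counts = count_vec.fit_transform(reviews)   # raw term counts per review

tfidf_vec = TfidfVectorizer()
weights = tfidf_vec.fit_transform(reviews)  # counts scaled by IDF, rows L2-normalised


def weight(matrix, vectorizer, doc, term):
    """Hypothetical helper: value of one term in one document of a sparse matrix."""
    return matrix[doc, vectorizer.vocabulary_[term]]


# In the first review, every word has count 1, but TF-IDF ranks room > hotel > the.
for term in ["the", "hotel", "room"]:
    print(term, weight(counts, count_vec, 0, term), round(weight(weights, tfidf_vec, 0, term), 3))
```

Only the weighting changes between the two approaches. Both vectorizers tokenize the same way by default, so they produce the same vocabulary.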

System Requirements

Install the required packages with pip (scikit-learn provides the sklearn module):

pip install pandas numpy scikit-learn

The programs use the following imports:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier

Usage (description of actions performed)

1. Import the dataset.
2. Delete rows with null values.
3. Take a 30% representative sample to keep run time and memory use manageable.
4. Add a sentiment column.
5. Define the input training features and labels.
6. Split the dataset into training and testing sets.
7. Vectorize the text data (using CountVectorizer or TF-IDF Vectorizer).
8. Train the models:
 -  Logistic Regression (linear classification)
 -  Support Vector Machine (separates the classes with a maximum-margin line or hyperplane; kernels allow non-linear boundaries)
 -  K Nearest Neighbor (classifies each review by a majority vote among its k nearest training examples)
9. Print the accuracy score, confusion matrix, and true positive and true negative rates for all three models.
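The steps above can be sketched as one pipeline. This is a minimal sketch, not the author's code, and several details are assumptions:

 -  the column names reviews.text and reviews.rating
 -  the labelling rule (a rating of 4 or more counts as positive)
 -  the 75/25 train/test split
 -  the linear SVM kernel and k = 5

A tiny made-up DataFrame stands in for the Datafiniti CSV. The true positive rate is TP / (TP + FN), and the true negative rate is TN / (TN + FP).

```python
import pandas as pd
from sklearn import svm
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed column names; check them against the real Datafiniti CSV.
TEXT_COL = "reviews.text"
RATING_COL = "reviews.rating"


def prepare(df, frac=0.3, seed=42):
    """Steps 2-5: drop nulls, sample, add labels, return features and labels."""
    df = df.dropna(subset=[TEXT_COL, RATING_COL])
    df = df.sample(frac=frac, random_state=seed)
    # Assumed labelling rule: rating >= 4 is positive (1), otherwise negative (0).
    df = df.assign(sentiment=(df[RATING_COL] >= 4).astype(int))
    return df[TEXT_COL], df["sentiment"]


def rates(y_true, y_pred):
    """True positive rate and true negative rate from a 2x2 confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    return tpr, tnr


def run(df, vectorizer, frac=0.3, seed=42):
    """Steps 6-9 for one vectorizer; returns the metrics for each model."""
    X, y = prepare(df, frac, seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Fit the vocabulary (and IDF weights) on the training text only.
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "SVM": svm.SVC(kernel="linear"),  # assumed kernel
        "KNN": KNeighborsClassifier(n_neighbors=5),  # assumed k
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train_vec, y_train)
        pred = model.predict(X_test_vec)
        tpr, tnr = rates(y_test, pred)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "confusion": confusion_matrix(y_test, pred, labels=[0, 1]),
            "tpr": tpr,
            "tnr": tnr,
        }
    return results


def make_toy_reviews():
    """Made-up stand-in for the hotel reviews CSV; one row has a null text."""
    texts = ["great clean room and friendly staff"] * 10 + ["dirty room and rude staff"] * 10 + [None]
    ratings = [5] * 10 + [1] * 10 + [3]
    return pd.DataFrame({TEXT_COL: texts, RATING_COL: ratings})


if __name__ == "__main__":
    # frac=1.0 because the toy data is tiny; the programs sample 30%.
    for name, r in run(make_toy_reviews(), TfidfVectorizer(), frac=1.0).items():
        print(name, r["accuracy"], r["tpr"], r["tnr"])
        print(r["confusion"])
```

To try it on the real data, load the CSV with pd.read_csv and adjust TEXT_COL, RATING_COL and the labelling rule to match the dataset. Passing a CountVectorizer() instead of TfidfVectorizer() to run gives the other variant.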

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Owner

Simran Farrukh