IMDB Success Predictor

This project scrapes custom IMDB data for 10,000 movies and shows released between 2020 and 2021, sorted by number of votes. It then fine-tunes a pre-trained DistilBERT Transformer using transfer learning and saves the fine-tuned model so it can be reused later.

Stack

DistilBERT Transformer
TensorFlow
NumPy and pandas
Selenium, BeautifulSoup4 and requests

Metrics

Accuracy achieved: 81.3492%
ROC AUC score achieved: 0.7217
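The two metrics above measure different things. Accuracy counts thresholded predictions that match the true label, while ROC AUC is threshold-free and measures how well the predicted probabilities rank successful titles above unsuccessful ones, so the two numbers need not agree. A minimal pure-Python sketch of both, on hypothetical toy labels and probabilities (not the project's data) and assuming a 0.5 decision threshold:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels (1 = success) and model probabilities, not project data.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
# Assumed 0.5 threshold for turning probabilities into binary predictions.
y_pred = [1 if p >= 0.5 else 0 for p in probs]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, probs)
print(f"accuracy={acc:.4f}  roc_auc={auc:.4f}")
```

scikit-learn's `sklearn.metrics.accuracy_score` and `roc_auc_score` compute the same quantities; the pairwise loop here is quadratic and is only meant to make the definition concrete.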

Installation

1) Ensure Python and Jupyter Notebook are installed. Optionally, a Conda environment can be used.

Install the required modules using

pip install -r requirements.txt 

or, in a Conda environment, conda install --file requirements.txt

or !pip install -r requirements.txt in Google Colab.
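A quick way to confirm the installation succeeded is to check that each package in the Stack can be found on Python's import path. This is a sketch: the import names are assumptions on my part, in particular `transformers`, since the README names DistilBERT but not the library that provides it. `bs4` is the import name under which BeautifulSoup4 installs. `check_modules` is a hypothetical helper.

```python
import importlib.util

# Import names for the stack listed above. "transformers" is an assumption:
# the README names DistilBERT but not the library that provides it.
REQUIRED_MODULES = [
    "tensorflow",
    "transformers",
    "numpy",
    "pandas",
    "selenium",
    "bs4",  # BeautifulSoup4 installs under the import name "bs4"
    "requests",
]


def check_modules(names):
    """Map each top-level module name to True if it can be found on the import path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:<14} {'ok' if found else 'MISSING'}")
```

`find_spec` locates a module without importing it, so the check is fast even for heavy packages such as TensorFlow.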

Selenium requires browser-specific drivers. Guides for Chrome and Firefox are linked below. This step can be skipped when the notebook is run on Google Colab.
Chrome: https://chromedriver.chromium.org/getting-started
Firefox: https://www.lambdatest.com/blog/selenium-firefox-driver-tutorial/

Training

1) (Optional) Run the IMDB Web scraper notebook. This regenerates the CSV file and the imdb_movies pickle file, both of which are already provided.
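The scraper's output step can be sketched with the standard library: the same list of records is written once as CSV and once as a pickle, so later notebooks can reload it without scraping again. The field names and the `save_records`, `load_pickle` and `demo` helpers are hypothetical, and whether the notebook pickles a plain list or a pandas DataFrame is not stated.

```python
import csv
import pickle
import tempfile
from pathlib import Path

# Hypothetical record layout; the notebook's actual column names are not documented.
FIELDS = ["title", "description", "rating", "votes"]


def save_records(records, csv_path, pickle_path):
    """Write scraped records to a CSV file and pickle the same list to a second file."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)
    with open(pickle_path, "wb") as f:
        pickle.dump(records, f)


def load_pickle(pickle_path):
    """Reload the pickled records without re-running the scraper."""
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


def demo():
    """Round-trip one hypothetical record through both files inside a temp directory."""
    rows = [{"title": "Example Title", "description": "A short plot summary.",
             "rating": 7.4, "votes": 12345}]
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "sample.csv"
        pickle_path = Path(tmp) / "imdb_movies"  # same extensionless name as the repo file
        save_records(rows, csv_path, pickle_path)
        header = csv_path.read_text(encoding="utf-8").splitlines()[0]
        return load_pickle(pickle_path) == rows, header


if __name__ == "__main__":
    print(demo())
```

The demo writes into a temporary directory so it cannot overwrite the provided imdb_movies file.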

2) Run the training notebook, IMDB_Movie_Analysis.ipynb, in an environment with GPU acceleration. Here it was run on Google Colab, where an Nvidia Tesla T4 or Nvidia Tesla K80 GPU is allocated.
```
Training Time: Roughly 20-25 mins
Epochs: 10
Training Batch Size: 8
Max Sequence Length: 512 tokens
```
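The training settings above translate into simple arithmetic: each description is truncated or padded to 512 tokens, and one epoch takes ceil(N / 8) optimizer steps for N training examples. A minimal sketch, assuming padding id 0 and right-padding; the helper names are hypothetical. Unlike a real tokenizer (for example a Hugging Face tokenizer called with `truncation=True, padding="max_length", max_length=512`), it does not reserve room for special tokens such as [CLS] and [SEP].

```python
import math

MAX_LEN = 512  # max sequence length from the training setup above
BATCH_SIZE = 8
EPOCHS = 10
PAD_ID = 0  # assumption: padding token id 0, as in the uncased BERT/DistilBERT vocabulary


def truncate_and_pad(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Cut a token-id sequence to max_len, then right-pad it.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens, 0 for padding.
    """
    ids = list(token_ids[:max_len])
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


def total_steps(num_examples, batch_size=BATCH_SIZE, epochs=EPOCHS):
    """Optimizer steps for a full run; a final partial batch still counts as one step."""
    return math.ceil(num_examples / batch_size) * epochs


if __name__ == "__main__":
    print(truncate_and_pad([5, 6, 7], max_len=6))
    print(total_steps(8000))
```

For example, with a hypothetical 8,000 training examples (the README does not give the train/validation split), `total_steps(8000)` gives 1,000 steps per epoch and 10,000 steps over 10 epochs.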
Training creates a Movie_prediction_model directory containing a config.json file (provided) and a tf_model.h5 file (not provided due to space constraints).

Usage

1) Ensure the trained model has been saved inside the Movie_prediction_model directory.
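Before running the classifier, it helps to verify that both files from the Training step are present. The config.json plus tf_model.h5 layout appears to match what Hugging Face `transformers` writes with `save_pretrained` for TensorFlow models, but that is an assumption; this sketch only checks for the files and does not load the model. `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path

# File names taken from the Training section: config.json ships with the repo,
# tf_model.h5 is produced by training and is not in the repository.
REQUIRED_FILES = ("config.json", "tf_model.h5")


def missing_model_files(model_dir="Movie_prediction_model"):
    """Return the expected model files absent from model_dir (an empty list means ready)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Model not ready, missing:", ", ".join(missing))
    else:
        print("Model directory looks complete.")
```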

2) Run the Python script with python DistilBERT_Movie_Classifier.py
Enter the description of the movie or TV show you want a prediction for. The script outputs a binary prediction of success based on IMDB ratings.
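A binary classifier of this kind typically produces two raw scores (logits) per input, and the prediction is the class with the higher softmax probability. A minimal sketch of that final step, assuming class 0 means not a success and class 1 means success. The README does not state the label mapping, the label names, or the rating threshold that defines success, so `LABELS`, `softmax` and `predict_label` here are hypothetical.

```python
import math

# Assumed class order: index 0 = not a success, index 1 = success.
LABELS = ("Not a success", "Success")


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return (label, probability of that label) for a two-class logit pair.

    On an exact tie, the first class (index 0) is returned.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    print(predict_label([-1.2, 0.8]))
```

For example, `predict_label([-1.2, 0.8])` returns `"Success"` with a probability of about 0.88, because the softmax of a two-class pair depends only on the difference between the logits.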

Repository: Gautam-Diwan/IMDB-Success-Predictor

Folders and files

Latest commit

History

Repository files navigation

IMDB Success Predictor

Stack

Metrics

Installation

Training

Usage

About

Resources

Stars

Watchers

Forks

Languages