A custom IMDB dataset scraped for 2020-2021, and a custom DistilBERT model trained on it to predict the probability of movie success

Last update: Jan 18, 2022

Related tags

Deep Learning IMDB-Success-Predictor

Overview

IMDB Success Predictor

The project involves web scraping custom IMDB data from 2020 to 2021 for 10,000 movies and shows, sorted by number of votes; fine-tuning a pre-trained DistilBERT Transformer using transfer learning; and saving the trained model so it can be reused.

Stack

DistilBERT Transformer
TensorFlow
NumPy and pandas
Selenium, BeautifulSoup4 and requests

Metrics

Accuracy achieved: 81.3492%
ROC_AUC_Score achieved: 0.7217
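
The two metrics above measure different things: accuracy scores the thresholded yes/no predictions, while ROC AUC scores how well the predicted probabilities rank successful titles above unsuccessful ones. The following is a minimal pure-Python sketch of both, not the project's evaluation code; the function names and the toy labels and scores are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random positive example
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = successful, 0 = not successful.
y_true = [1, 0, 1, 1, 0, 1]
y_score = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]        # predicted probability of success
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold at 0.5

print("accuracy:", accuracy(y_true, y_pred))
print("roc_auc:", roc_auc(y_true, y_score))
```

Because accuracy depends on the threshold and on the class balance while ROC AUC does not, the two numbers can differ noticeably on the same model.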

Installation

1) Ensure Python and Jupyter Notebook are installed. Optionally, a Conda environment can be used.

2) Install the required modules using

pip install -r requirements.txt

or conda install --file requirements.txt

or !pip install -r requirements.txt for Google Colab.

3) Selenium requires browser-specific drivers. Guides for Chrome and Firefox are listed below. This step is optional if the notebook is run on Google Colab.
Chrome: https://chromedriver.chromium.org/getting-started
Firefox: https://www.lambdatest.com/blog/selenium-firefox-driver-tutorial/
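
After installing, a quick check that the main packages can be imported may save a failed notebook run. This is a small sketch using only the standard library; the module list is an assumption based on the Stack section (with `transformers` assumed as the package that provides DistilBERT), and requirements.txt remains the authoritative list.

```python
from importlib.util import find_spec

# Assumed import names for the packages named in the Stack section.
REQUIRED_MODULES = [
    "tensorflow",
    "transformers",  # assumption: Hugging Face package providing DistilBERT
    "numpy",
    "pandas",
    "selenium",
    "bs4",           # import name of BeautifulSoup4
    "requests",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```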

Training

1) (Optional) Run the IMDB Web scraper. This regenerates the csv file and the imdb_movies pickle file, both of which are already provided.

2) Run the IMDB Web scraper in an environment with GPU acceleration. Here it was run on Google Colab, which allocates an Nvidia Tesla T4 or Nvidia Tesla K80.
```
Training Time: Roughly 20-25 mins
Epochs: 10
Training Batch Size: 8
Max length of each Sentence: 512
```
3) A Movie_prediction_model directory is created containing a config.json file (provided) and a tf_model.h5 file (not provided due to space constraints).
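
The fine-tuning step can be sketched roughly as follows with the Hugging Face `transformers` TensorFlow classes. This is an illustration under stated assumptions, not the notebook's actual code: the `distilbert-base-uncased` checkpoint, the learning rate, and the function names `fine_tune` and `steps_per_epoch` are assumptions, while the epochs, batch size, maximum length, and output directory come from the settings above.

```python
import math

# Settings taken from the Training section.
EPOCHS = 10
BATCH_SIZE = 8
MAX_LENGTH = 512
OUTPUT_DIR = "Movie_prediction_model"

# Assumptions, not stated in the source.
BASE_CHECKPOINT = "distilbert-base-uncased"
LEARNING_RATE = 5e-5


def steps_per_epoch(num_examples, batch_size=BATCH_SIZE):
    """Number of batches needed to see every training example once."""
    return math.ceil(num_examples / batch_size)


def fine_tune(texts, labels):
    """Fine-tune DistilBERT for binary classification and save the result."""
    # Heavy imports stay local so the settings above can be read
    # without TensorFlow installed.
    import tensorflow as tf
    from transformers import (
        DistilBertTokenizerFast,
        TFDistilBertForSequenceClassification,
    )

    texts, labels = list(texts), list(labels)
    tokenizer = DistilBertTokenizerFast.from_pretrained(BASE_CHECKPOINT)
    encodings = tokenizer(
        texts, truncation=True, padding=True, max_length=MAX_LENGTH
    )
    dataset = tf.data.Dataset.from_tensor_slices((dict(encodings), labels))
    dataset = dataset.shuffle(len(texts)).batch(BATCH_SIZE)

    model = TFDistilBertForSequenceClassification.from_pretrained(
        BASE_CHECKPOINT, num_labels=2
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(dataset, epochs=EPOCHS)
    # For TensorFlow models this writes config.json and tf_model.h5.
    model.save_pretrained(OUTPUT_DIR)
    return model
```

With a batch size of 8, a hypothetical training split of 8000 descriptions would give `steps_per_epoch(8000)` = 1000 optimizer steps per epoch; the actual split size is not stated here.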

Usage

1) Ensure the model has been created inside the Movie_prediction_model directory.

2) Run the Python file using python DistilBERT_Movie_Classifier.py

3) Enter the description of the movie or TV show you want a prediction for. The output is a binary prediction of success based on IMDB ratings.
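
For readers who want to call the saved model from their own code, the prediction step might look like the sketch below. It is not the contents of DistilBERT_Movie_Classifier.py: the label order (index 1 meaning success), the tokenizer checkpoint, and the helper names `softmax`, `decide`, and `predict` are assumptions made for illustration.

```python
import math

# Assumed label order: index 0 = not successful, index 1 = successful.
LABELS = ("not successful", "successful")


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits):
    """Return (label, probability) for the most likely class."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[idx], probs[idx]


def predict(description,
            model_dir="Movie_prediction_model",
            tokenizer_name="distilbert-base-uncased"):  # assumed checkpoint
    """Load the saved model and classify one description."""
    # Imported here so the helpers above work without TensorFlow installed.
    from transformers import (
        DistilBertTokenizerFast,
        TFDistilBertForSequenceClassification,
    )

    tokenizer = DistilBertTokenizerFast.from_pretrained(tokenizer_name)
    model = TFDistilBertForSequenceClassification.from_pretrained(model_dir)
    inputs = tokenizer(
        description, truncation=True, max_length=512, return_tensors="tf"
    )
    logits = model(**inputs).logits.numpy()[0].tolist()
    return decide(logits)


if __name__ == "__main__":
    # Works without the model: classify a pair of made-up logits.
    print(decide([0.0, 2.0]))
```

Calling `predict("A retired detective returns for one last case.")` would load the model from Movie_prediction_model, so that directory must contain both config.json and tf_model.h5.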

Owner

Gautam Diwan
