Project: Netflix Data Analysis and Visualization with Python

Overview

Project: Netflix Data Analysis and Visualization with Python

MyNetflixDashboard

Table of Contents

  1. General Info
  2. Installation
  3. Demo
  4. Usage and Main Functionalities
  5. Contributing

General Info

This is a compact Data Visualization project I worked on for fun and to deepen my knowledge about visualizations and graphs using python libraries.

From conception and design to every line of code, the entire Dashboard was worked on by myself. During this project, I was able to repeat and deepen what I had previously learned in my Data Science course of study. Especially, I was able to familiarize myself with pandas and work on my data visualization skills, which I greatly enjoied!

The dataset I used for the Netflix data analytics task consists of my personal Netflix data, which I requested through their website. You can get access to your own data through this link. Feel free to download it and use my code to look into your own viewing behaviour :)

Installation

Requirements: Make sure you have Python 3.7+ installed on your computer. You can download the latest version of Python here.

Req. Packages:

  • pandas
  • dash
  • dash_bootstrap_components
  • ploty.express
  • plotly.graph_objects

Demo

Demo_MyNetflixDashboard_komprimiert.mov

Usage and Main Functionalities

Want to know more about your own Netflix behaviour? For test usage you can download your own Netflix data. Just follow this link and Netflix will send you your personal data.

Please also refer to the comments within the code itself to get more information on the functionalities of the program.


0. Preparing the data for analysis

  • This part cleans up the original data and prepares it for analysis.
  • In the process, columns that are not needed are dropped.
  • Time data is converted into appropriate time formats and split into several columns. The days of the week are added.
  • In addition, the titles of the movies/series are split (title, season number, episode name).

1. Analysis

  • This part of the code is about analyzing the data.
  • We find out how many movies or series were watched over the entire period. We also count the total number of hours Netflix was watched.
  • A pie chart is created that shows which days of the week are watched.
  • In addition, the top 10 series that were watched the longest (in terms of total duration) are displayed.
  • A line chart shows Netflix viewing behavior over the years, counting the total number of hours Netflix was watched.

NetflixOverTime

2. Dash App Layout

  • plotly's Dash is now used to create an Interactive Dashboard of Netflix data.
  • The individual graphics and texts are arranged in rows and containers.
  • This part also includes a dropdown menu that the user can interact with.

3. App Callback

  • Here we connect an interactive bar chart to the Dash Components.
  • The chart represents our total annual hours of Netflix watched, grouped by month. The chart is filterable by year.

MonthlyViews

Contributing

Your comments, suggestions, and contributions are welcome. Please feel free to contribute pull requests or create issues for bugs and feature requests.

Owner
Kathrin Hälbich
Data Science Student and PR- & Marketing-Expert
Kathrin Hälbich
CPSPEC is an astrophysical data reduction software for timing

CPSPEC manual Introduction CPSPEC is an astrophysical data reduction software for timing. Various timing properties, such as power spectra and cross s

Tenyo Kawamura 1 Oct 20, 2021
Template for a Dataflow Flex Template in Python

Dataflow Flex Template in Python This repository contains a template for a Dataflow Flex Template written in Python that can easily be used to build D

STOIX 5 Apr 28, 2022
talkbox is a scikit for signal/speech processing, to extend scipy capabilities in that domain.

talkbox is a scikit for signal/speech processing, to extend scipy capabilities in that domain.

David Cournapeau 76 Nov 30, 2022
A distributed block-based data storage and compute engine

Nebula is an extremely-fast end-to-end interactive big data analytics solution. Nebula is designed as a high-performance columnar data storage and tabular OLAP engine.

Columns AI 131 Dec 26, 2022
CINECA molecular dynamics tutorial set

High Performance Molecular Dynamics Logging into CINECA's computer systems To logon to the M100 system use the following command from an SSH client ss

J. W. Dell 0 Mar 13, 2022
Probabilistic Programming in Python: Bayesian Modeling and Probabilistic Machine Learning with Theano

PyMC3 is a Python package for Bayesian statistical modeling and Probabilistic Machine Learning focusing on advanced Markov chain Monte Carlo (MCMC) an

PyMC 7.2k Dec 30, 2022
A Python and R autograding solution

Otter-Grader Otter Grader is a light-weight, modular open-source autograder developed by the Data Science Education Program at UC Berkeley. It is desi

Infrastructure Team 93 Jan 03, 2023
Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods

Cold Brew: Distilling Graph Node Representations with Incomplete or Missing Neighborhoods Introduction Graph Neural Networks (GNNs) have demonstrated

37 Dec 15, 2022
General Assembly's 2015 Data Science course in Washington, DC

DAT8 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (8/18/15 - 10/29/15). Instructor: Kevin Markham (

Kevin Markham 1.6k Jan 07, 2023
Describing statistical models in Python using symbolic formulas

Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design mat

Python for Data 866 Dec 16, 2022
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

About The ROOT system provides a set of OO frameworks with all the functionality needed to handle and analyze large amounts of data in a very efficien

ROOT 2k Dec 29, 2022
Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Python dataset creator to construct datasets composed of OpenFace extracted features and Shimmer3 GSR+ Sensor datas

Gabriele 3 Jul 05, 2022
Port of dplyr and other related R packages in python, using pipda.

Unlike other similar packages in python that just mimic the piping syntax, datar follows the API designs from the original packages as much as possible, and is tested thoroughly with the cases from t

179 Dec 21, 2022
Integrate bus data from a variety of sources (batch processing and real time processing).

Purpose: This is integrate bus data from a variety of sources such as: csv, json api, sensor data ... into Relational Database (batch processing and r

1 Nov 25, 2021
Deep universal probabilistic programming with Python and PyTorch

Getting Started | Documentation | Community | Contributing Pyro is a flexible, scalable deep probabilistic programming library built on PyTorch. Notab

7.7k Dec 30, 2022
API>local_db>AWS_RDS - Disclaimer! All data used is for educational purposes only.

APIlocal_dbAWS_RDS Disclaimer! All data used is for educational purposes only. ETL pipeline diagram. Aim of project By creating a fully working pipe

0 Apr 25, 2022
Pipetools enables function composition similar to using Unix pipes.

Pipetools Complete documentation pipetools enables function composition similar to using Unix pipes. It allows forward-composition and piping of arbit

186 Dec 29, 2022
Data collection, enhancement, and metrics calculation.

l3_data_collection Data collection, enhancement, and metrics calculation. Summary Repository containing code for QuantDAO's JDT data collection task.

Ruiwyn 3 Dec 23, 2022
Statistical & Probabilistic Analysis of Store Sales, University Survey, & Manufacturing data

Statistical_Modelling Statistical & Probabilistic Analysis of Store Sales, University Survey, & Manufacturing data Statistical Methods for Decision Ma

Avnika Mehta 1 Jan 27, 2022
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms

MatrixProfile MatrixProfile is a Python 3 library, brought to you by the Matrix Profile Foundation, for mining time series data. The Matrix Profile is

Matrix Profile Foundation 302 Dec 29, 2022