A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Last update: Mar 28, 2022

Overview

New to Streaming Scraper

An in-progress web scraping project built with Python, R, and SQL.

The scraped data cover movies and TV shows. The goal of the project is to show new-to-streaming titles that arrive on Netflix each month, along with additional details such as critic and audience ratings.

Current stage: Working out how to present the data with R Markdown.

Testing at: https://charlesdungy.github.io/new-to-streaming-scraper/

Future stage: Complete the documentation and code comments.

Description

Data are retrieved from two different data sources: What's on Netflix (WON) and Rotten Tomatoes (RT). RT data are cleaned and transformed with Python, while WON data are cleaned and transformed with R.
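As an illustration of the kind of cleaning the RT data might need, here is a minimal Python sketch. It is not the project's actual code: the field names (`title`, `critic_score`, `audience_score`), the raw score format (a percentage string such as `87%`), and the helper names are all assumptions made for this example.

```python
import re
from typing import Optional


def parse_score(raw: Optional[str]) -> Optional[int]:
    """Turn a scraped score such as ' 87% ' into the integer 87.

    Returns None when the value is missing or holds no percentage.
    """
    if raw is None:
        return None
    match = re.search(r"(\d{1,3})\s*%", raw)
    return int(match.group(1)) if match else None


def normalize_title(raw: str) -> str:
    """Collapse whitespace and lowercase a title so it can act as a matching key."""
    return re.sub(r"\s+", " ", raw).strip().lower()


def clean_record(record: dict) -> dict:
    """Clean one scraped record (hypothetical field names)."""
    return {
        "title": re.sub(r"\s+", " ", record["title"]).strip(),
        "title_key": normalize_title(record["title"]),
        "critic_score": parse_score(record.get("critic_score")),
        "audience_score": parse_score(record.get("audience_score")),
    }


if __name__ == "__main__":
    raw = {"title": "  The  Example Show ", "critic_score": "87%", "audience_score": "- -"}
    print(clean_record(raw))
```

A lowercased `title_key` is one simple way to line up titles across the two sources. Real titles may also need year or punctuation handling, which this sketch leaves out.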

All data are piped into a MySQL database, then retrieved for presentation in R.

Here is a high-level look at the pipeline:

Data Source 1 is WON data. Data Source 2 is RT data.
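The store-then-retrieve step described above can be sketched in Python as follows. To keep the example runnable without a database server, it uses the standard library's `sqlite3` module as a stand-in for MySQL. The table name `titles`, its columns, and the function names are assumptions for illustration, not the project's schema.

```python
import sqlite3


def load_titles(conn: sqlite3.Connection, rows: list) -> None:
    """Create the (hypothetical) titles table if needed and upsert the cleaned rows."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS titles (
            title TEXT PRIMARY KEY,
            critic_score INTEGER,
            audience_score INTEGER
        )
        """
    )
    # INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO or
    # INSERT ... ON DUPLICATE KEY UPDATE for the same purpose.
    conn.executemany(
        "INSERT OR REPLACE INTO titles (title, critic_score, audience_score) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def fetch_titles(conn: sqlite3.Connection) -> list:
    """Read the stored rows back, sorted by title, ready for presentation."""
    cur = conn.execute(
        "SELECT title, critic_score, audience_score FROM titles ORDER BY title"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL database
    load_titles(conn, [("B Show", 90, None), ("A Movie", 75, 80)])
    for row in fetch_titles(conn):
        print(row)
```

With a real MySQL server, the connection would come from a MySQL driver, and the placeholders would typically be `%s` rather than `?`.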

Main Packages/Tools

Python

R

SQL

MySQL

Current Directory Tree

License

MIT

Owner

Charles Dungy

An optimized version of the JD.com Moutai flash-purchase tool

News, full-text, and article metadata extraction in Python 3. Advanced docs:

tweet random sand cat pictures

Web scraped S&P 500 Data from Wikipedia using Pandas and performed Exploratory Data Analysis on the data.

Introduction to WebScraping Workshop - Semcomp 24 Beta

This tool crawls a list of websites and downloads all PDF and office documents

A simple scraper built in Python with the BeautifulSoup library

Basic-html-scraper - A complete how to of web scraping with Python for beginners

Automated Linkedin bot that will improve your visibility and increase your network.

Makes downloading from GitHub with git 1000 times faster for users in China!

Scrape the 42 Intranet's e-learning videos in a single click

Web Content Retrieval for Humans™

A scheduled HITsz epidemic-reporting script based on GitHub Actions, ready to use out of the box

Google Developer Profile Badge Scraper

Incredibly fast crawler designed for OSINT.

A webdriver-based script for reserving Tsinghua badminton courts.

Scraping weather data using Python to receive umbrella reminders

An application that crawls the web page at a given URL, collects all of its words, then sorts and counts them.

Anonymously scrapes onlinesim.ru for new usable phone numbers.

This is a web scraper, built with the Python framework Scrapy, that extracts data from the Deals of the Day section of the Mercado Livre website.