Skip to content

ZainNaqavi/Media-WebScraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Media-WebScraper

This program scrapes information and images for movies and TV shows.

Summary

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

For a given list of media, the program will scrape and save general information, images and any episode information for each media.

General Information (default):

Saved as a .txt file

This will scrape general information:

  • Title
  • Release date
  • Runtime
  • Genre
  • Director
  • Cast
  • Plot description

Additional information saved:

  • Source database used for scrape
  • ID for media in source database
  • Poster image link

Images (default):

Saved as a .jpg file

This will scrape the poster.

Episode Information (if specified):

Saved as a .csv file

This will scrape information for each episode for a TV show:

  • Season number
  • Episode number
  • Episode title
  • Episode air date
  • Episode description

Features:

  • Multithreaded scraping for media in list to greatly improve the time taken when scraping for large media lists.
  • Can generate a media list from folders and files in a specified directory or from user input.
  • Can specify save location for scraped data.
  • Can specify search tags for media list for a more accurate scrape.
  • Can choose to scrape all episode information for a TV show.
  • Can detect if data is already scraped which allows for scraping new media from an already scraped list of media very efficient.
  • Can recover missing scraped files if one or more are missing without rescraping all data.
  • Can retry the scrape before exiting the program if there were any incomplete scrapes (successfully scraped files will not be altered or rescraped).
  • Currently only supports scraping data from IMDb.

Usage:

For more information on the program, read the WebScrape_help text file (this can also be accessed while running the program).

Currently a terminal-based program.

Running the program using python:

  • Requirements: Python 3.2+ (additional libraries: requests, beautifulsoup4)

Running the program from bundled executable file (created using pyinstaller):

  • Requirements: Windows 10
  • Creates a 'temp' folder containing extracted libraries and support files in the same location as the program while running.
    • The temporary files will delete automatically but if the program is closed abruptly, the files will remain.
    • The 'temp' folder can be manually deleted after closing the program.
    • (As of pyinstaller v4.7, a one-file bundled executable will leave any temp '_MEIxxxxxx' folders if the program is force closed)

Updates:

For information on version history, read the HISTORY markdown file.

About

This program scrapes information and images for movies and TV shows.

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages