A simple, configurable and expandable combined shop scraper to minimize the costs of ordering several items

Last update: Dec 13, 2021

Overview

combined-shop-scraper

Features

Define an input file, components.json, listing the components to scrape and their source URLs
Find the cheapest order combination, shipping costs included
Get a price alarm when a single component drops below a defined price
Easily add new shops (basic scraping know-how required). Basic support for notebooksbilliger, cyberport and future-x is included by default

Usage

JSON file definition

By default, the input JSON file is named components.json and must be located in the same folder as scraper.py. This is the basic structure of the file:

{
  "component1": {
    "alarm_price": 260,
    "quantity": 1,
    "urls": [
      "https://www.someshop.com/component1",
      "https://www.someshop.com/component1-alternative",
      "https://www.anothershop.com/component1-alternative"]
  },
  "component2": {
    "urls": [
      "https://www.someshop.com/component2",
      "https://www.anothershop.com/component2",
      "https://www.onemoreshop.com/component2"]
  }
}

The component name and at least one URL are mandatory. You can list several URLs from the same shop for one component if that shop offers alternatives. The quantity of each component defaults to 1; the alarm price is optional.
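The rules above can be sketched in a few lines of Python. This is only an illustration, not the code in scraper.py: the function names load_components and is_below_alarm are hypothetical, and the sketch assumes that a missing alarm_price simply means no alarm is raised.

```python
import json


def load_components(text):
    """Parse a components.json-style string and apply the documented defaults.

    Hypothetical helper: the real scraper.py may load the file differently.
    """
    raw = json.loads(text)
    components = {}
    for name, spec in raw.items():
        urls = spec.get("urls", [])
        if not urls:
            # At least one URL per component is mandatory.
            raise ValueError(f"component {name!r} needs at least one url")
        components[name] = {
            "urls": list(urls),
            "quantity": spec.get("quantity", 1),      # defaults to 1
            "alarm_price": spec.get("alarm_price"),   # optional, None if absent
        }
    return components


def is_below_alarm(price, alarm_price):
    """Return True when a scraped price is below the configured alarm price."""
    # Assumption: no alarm_price configured means no alarm.
    return alarm_price is not None and price < alarm_price
```

With this sketch, a component that only lists urls ends up with quantity 1 and no alarm price, while an entry with an empty urls list is rejected.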

Execution

Run scraper.py from within its folder so that components.json can be found. It prints an overview of the ideal order that minimizes the overall cost. The program runs just once and does not keep tracking prices in the background. As usual with scraping, be gentle and fair, and don't abuse this program.
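The README does not say how the ideal order is computed. One way to picture it is a brute-force search over every choice of one offer per component, adding each shop's shipping cost once. The sketch below makes that assumption; the function cheapest_order, its data layout and the flat per-shop shipping cost are all hypothetical and may differ from what scraper.py actually does.

```python
from itertools import product


def cheapest_order(offers, quantities, shipping):
    """Find the cheapest way to buy every component once, shipping included.

    offers:     {component: [(shop, unit_price), ...]}
    quantities: {component: quantity}, missing entries count as 1
    shipping:   {shop: flat shipping cost}, charged once per shop used
    """
    components = list(offers)
    best_total, best_choice = None, None
    # Try every combination of one offer per component.
    for choice in product(*(offers[c] for c in components)):
        items = sum(
            price * quantities.get(component, 1)
            for component, (_shop, price) in zip(components, choice)
        )
        shops_used = {shop for shop, _price in choice}
        total = items + sum(shipping[shop] for shop in shops_used)
        if best_total is None or total < best_total:
            best_total = total
            best_choice = dict(zip(components, choice))
    return best_total, best_choice


offers = {
    "cpu": [("shop_a", 100.0), ("shop_b", 95.0)],
    "ram": [("shop_a", 50.0), ("shop_b", 52.0)],
}
shipping = {"shop_a": 5.0, "shop_b": 6.0}
total, choice = cheapest_order(offers, {}, shipping)
print(total, choice)
```

In this made-up example, buying both parts from shop_b wins at 153.0: shop_a has the cheaper RAM, but splitting the order would mean paying shipping twice. The search is exponential in the number of components, which is acceptable for a short shopping list.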

Addition of new shops

To add a new shop, edit the file shops.py:

Enter the significant part of the shop URL in the method Shop._get_shops_dict and define a new class (a child of Shop).
Implement the methods _process_soup and get_shipping_cost for the new class. Use the existing classes as a reference for the data you need to scrape.
Add your new URLs to the input file!
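The steps above might look roughly like the sketch below. Only the names Shop, _get_shops_dict, _process_soup and get_shipping_cost come from this README. Everything else is an assumption made for illustration: the base class here is a stand-in rather than the one in shops.py, and the method signatures, the from_url helper, the ExampleShop class, the ".price" selector, the German price format and the flat shipping fee are all hypothetical. Check the existing classes in shops.py for the real interface.

```python
import re
from abc import ABC, abstractmethod


class Shop(ABC):
    """Stand-in for the base class in shops.py; the real one may differ."""

    @staticmethod
    def _get_shops_dict():
        # Maps the significant part of a shop URL to the class handling it.
        return {"exampleshop": ExampleShop}

    @classmethod
    def from_url(cls, url):
        # Hypothetical helper: pick the shop class whose key occurs in the URL.
        for key, shop_class in cls._get_shops_dict().items():
            if key in url:
                return shop_class()
        raise ValueError(f"no shop registered for {url}")

    @abstractmethod
    def _process_soup(self, soup):
        """Extract the item price from the parsed page."""

    @abstractmethod
    def get_shipping_cost(self):
        """Return the shipping cost charged by this shop."""


class ExampleShop(Shop):
    """Hypothetical new shop with German-style prices such as '1.299,00 €'."""

    def _process_soup(self, soup):
        # Assumption: soup is a BeautifulSoup object and the price sits in an
        # element with the CSS class "price".
        tag = soup.select_one(".price")
        if tag is None:
            raise ValueError("price element not found")
        match = re.search(r"\d[\d.]*(?:,\d+)?", tag.get_text(strip=True))
        if match is None:
            raise ValueError("no price in element text")
        # Drop the thousands separator, then turn the decimal comma into a dot.
        return float(match.group().replace(".", "").replace(",", "."))

    def get_shipping_cost(self):
        # Assumption: a flat fee; real shops may depend on the order value.
        return 4.99
```

Keeping the URL-to-class mapping in one place means a new shop touches only shops.py and the input file, which matches the three steps listed above.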

License

See the LICENSE for license details.
