Amazon Scraper: A command-line tool for scraping Amazon product data

Overview

Amazon Product Scraper: 2021

Description

A command-line tool for scraping Amazon product data to CSV or JSON format(s).

Requirements

  • Python 3
  • pip3

Installation

Using git clone (you'll need git installed for this):

git clone https://github.com/scrapewalrus/amazon-scraper-python-2021

Or download and extract the zip file of the project manually

You'll also need to install requirements for the project to run. Locate amazon-product-scraper folder via terminal and type pip install -r requirements.txt:

Usage

To launch the Amazon scraper locate the amazon-product-scraper folder via terminal and type python amazon_scraper.py -k "your keyword". This will start the program.

NOTE: you must declare either -k or --keyword before entering your keyword. It's a required argument.

Example:

amazon_scraper.py - the name of a scraper file.

-k or --keyword - required argument to pass before entering your keyword.

-p or --proxies - optional argument to enable proxies. To avoid getting blocked I highly recommend using proxies. I'm using Residential Proxies from Oxylabs. For highest success rate, I suggest Residential Proxies over Datacenter as they're almost impossible to detect and have the smallest footprint. If you decide to use different proxy provider services keep in mind that you'll have to make some minor adjustments in get-proxies.py file.

-j or --json - optional argument for storing extracted data in .json format. Default output format is .csv.

Example of product data #1: JSON

[
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/Mostly-T-Shirt-Womens-Letter-Printed/dp/B07QN2NQ59/ref=sr_1_3?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-3",
        "PRODUCT_NAME": "I'm Mostly Peace Love and Light Funny T-Shirt Womens Graphic Printed Short Sleeve Tops Tee",
        "PRICE": "$21.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "1,637"
    },
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/YITAN-Women-Graphic-Funny-X-Large/dp/B074QMG4D7/ref=sr_1_4?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-4",
        "PRODUCT_NAME": "YITAN Women's Cute Juniors Tops Teen Girl Tee Funny T Shirt",
        "PRICE": "$12.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "12,281"
    },
    {
        "SOURCE_URL": "https://www.amazon.com/s?k=funny+t+shirt+for+women&page=1",
        "PAGE": 1,
        "KEYWORD": "funny t shirt for women",
        "PRODUCT_LINK": "https://www.amazon.com/DANVOUY-Womens-V-Neck-Doesnt-Definitely/dp/B07V55ZXVS/ref=sr_1_5?dchild=1&keywords=funny+t+shirt+for+women&qid=1627833682&sr=8-5",
        "PRODUCT_NAME": "DANVOUY Womens If My Mouth Doesn't Say It My Face Definitely Will T Shirt",
        "PRICE": "$12.99",
        "PRODUCT_RATING": "4.6",
        "NUMBER_OF_RATINGS": "6,787"
    }]

Example of product data #2: CSV

amazon-csv-product-data-example

Doing set operations on files considered as sets of lines

CLI tool that can be used to do set operations like union on files considering them as a set of lines. Notes It ignores all empty lines with whitespac

Partho 11 Sep 06, 2022
Standalone Tailwind CSS CLI, installable via pip

Standalone Tailwind CSS CLI, installable via pip Use Tailwind CSS without Node.j

Tim Kamanin 144 Dec 22, 2022
A VIM-inspired filemanager for the console

ranger 1.9.3 ranger is a console file manager with VI key bindings. It provides a minimalistic and nice curses interface with a view on the directory

12.6k Dec 30, 2022
Microsoft Azure CLI - Azure Command-Line Interface

A great cloud needs great tools; we're excited to introduce Azure CLI, our next generation multi-platform command line experience for Azure.

Microsoft Azure 3.4k Dec 30, 2022
Tncli - TON smart contract command line interface

Tncli TON smart contract command line interface State Not working, in active dev

Disintar IO 100 Dec 18, 2022
gcptree - Like the unix tree command but for GCP Org Heirarchy

gcptree Like the unix tree command but for GCP Org Heirarchy. For a note on coloring, the org node is green, folders and blue, and projects that are n

Ryan Canty 25 Sep 06, 2022
CLI client for FerrisChat

A CLI Client for @FerrisChat using FerrisWheel

FerrisChat 2 Apr 01, 2022
Salesforce object access auditor

Salesforce object access auditor Released as open source by NCC Group Plc - https://www.nccgroup.com/ Developed by Jerome Smith @exploresecurity (with

NCC Group Plc 90 Sep 19, 2022
Proman is a simple tool for managing projects through cli.

proman proman is a project manager. It helps you manage your projects from a terminal. The features are listed below. Installation Step 1: Download or

Arjun Somvanshi 2 Dec 06, 2021
Lsp Plugin for working with Python virtual environments

py_lsp.nvim What is py_lsp? py_lsp.nvim is a neovim plugin that helps with using the lsp feature for python development. It tackles the problem about

Patrick Haller 55 Dec 27, 2022
Bringing emacs' greatest feature to neovim - Tetris!

nvim-tetris Bringing emacs' greatest feature to neovim - Tetris! This plugin is written in Fennel using Olical's project Aniseed for creating the proj

129 Dec 26, 2022
Command Line (CLI) Application to automate creation of tasks in Redmine, issues on Github and the sync process of them.

Task Manager Automation Tool (TMAT) CLI Command Line (CLI) Application to automate creation of tasks in Redmine, issues on Github and the sync process

Tiamat 5 Apr 12, 2022
Interactive Python interpreter for executing commands within Node.js

Python Interactive Interactive Python interpreter for executing commands within Node.js. This module provides a means of using the Python interactive

Louis Lefevre 2 Sep 21, 2022
Python Command Line Application (CLI) using Typer, SQLModel, Async-PostgrSQL, and FastAPI

pyflycli is a command-line interface application built with Typer that allows you to view flights above your location.

Kevin Zehnder 14 Oct 01, 2022
Gamma ion pump QPC ethernet Python library & CLI utility

Unofficial Gamma ion pump ethernet control CLI utility and library This is a mini Python 3 library and utility that exposes some of the functions of t

2 Jul 18, 2022
Centauro - a command line tool with some network management functionality

Centauro Ferramenta de rede O Centauro é uma ferramenta de linha de comando com

1 Jan 01, 2022
CLI based Crunchyroll Account Checker Proxyless written in python from scratch.

A tool for checking Combolist of Crunchyroll accounts without proxies, It is written in Python from Scratch ,i.e, no external module is used rather than inbuilt Python modules.

Abhijeet 8 Dec 13, 2022
stonky is a simple command line dashboard for monitoring stocks.

stonky is a simple command line dashboard for monitoring stocks.

Jessy Williams 228 Dec 14, 2022
A simple command line dumper written in Python 3.

A simple command line dumper written in Python 3.

ImFatF1sh 1 Oct 10, 2021
Automaton - python script to execute bash command based on changes in size of a file.

automaton python script to execute given command = everytime size of a given file changes,hence everytime a file is modified.(almost) download automa

asrar bhat 1 Jan 03, 2022