Shopee Scraper - A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil

Overview

Shopee Scraper

A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil.

The project was written in Python 3 and uses only three libraries: requests, date and time.

date and time ship with Python's standard library on Linux, macOS and Windows alike (date most likely comes from the datetime module), so they never need to be installed with pip. Only requests is a third-party package that you may need to install.

You can easily install requests using the following command: $ pip install requests

The script relies on Shopee's public API. Shopee renders a dynamic page whose product listings and details are loaded from a JSON response. Because that API is public, it is easier to request the JSON directly and extract the data than to select divs and classes, scroll through the results, or use Selenium to simulate a web browser.
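To illustrate why reading JSON is simpler than parsing HTML, here is a minimal sketch. The payload shape and the field names (items, name, price, stock, sold, location) are assumptions made for illustration and are not Shopee's actual schema. In the real script the payload would come from something like requests.get(url).json(); here a sample string stands in for the response so the example runs offline.

```python
import json

# Hypothetical sample payload; the real API's structure and field names may differ.
SAMPLE_JSON = """
{
  "items": [
    {"name": "Band T-shirt", "price": 49.9, "stock": 12, "sold": 340, "location": "Curitiba"},
    {"name": "Brass connector", "price": 15.0, "stock": 0, "sold": 27}
  ]
}
"""

# Fields of interest (assumed names, chosen to match what the README lists).
FIELDS = ("name", "price", "stock", "sold", "location")


def extract_rows(payload):
    """Return one dict per product, keeping only the fields of interest.

    Missing keys become None instead of raising KeyError.
    """
    return [{field: item.get(field) for field in FIELDS}
            for item in payload.get("items", [])]


rows = extract_rows(json.loads(SAMPLE_JSON))
for row in rows:
    print(row)
```

Each product becomes a plain dictionary, so no CSS selectors, scrolling or browser automation are involved.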

How to use it

  1. First, find the seller's id. It appears in the product link.

Example: https://shopee.com.br/Camisetas-Bandas-Rock-RHCP-Red-Hot-Chili-Peppers-100-Algodao!!-i.409068735.3983196792

  • 409068735 is the seller's id. That's required to run the script.
  • 3983196792 is the product's id
  2. Before running the code, change the path in the script to the directory where you want to save the generated csv file, which will contain all the extracted data.
  • file=open("/YOUR-DIRECTORY/%s-YOUR-FILE-NAME.csv" % data, "a"))
  • The %s- right before the file name is replaced with the date on which the csv was generated. It's recommended to keep it that way so you can keep track of your files.
  3. Using the terminal, go to the script's folder and run:
  • python3 shopee-scraper.py
  • Type in the seller's id you just got from the product link.
  • The script scrapes up to 999 published products at about 1 second per ad, so a full run of 999 products takes roughly 17 minutes.
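The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the script's actual code: the helper names (parse_ids, dated_csv_path, append_rows) are hypothetical, the ISO date format in the file name is assumed, and the CSV columns are invented for the example. Only the "-i.<seller id>.<product id>" link suffix and the dated, append-mode csv file come from the instructions above.

```python
import csv
import re
import tempfile
from datetime import date
from pathlib import Path

# Product links end in "-i.<seller id>.<product id>", as in the example link.
LINK_IDS = re.compile(r"-i\.(\d+)\.(\d+)$")


def parse_ids(product_url):
    """Return (seller_id, product_id) as strings taken from a product link."""
    # Ignore any query string or fragment before matching.
    path = product_url.split("?", 1)[0].split("#", 1)[0].rstrip("/")
    match = LINK_IDS.search(path)
    if match is None:
        raise ValueError(f"no seller/product id found in {product_url!r}")
    return match.group(1), match.group(2)


def dated_csv_path(directory, name, day=None):
    """Build "<directory>/<YYYY-MM-DD>-<name>.csv" (date format is an assumption)."""
    day = day or date.today()
    return Path(directory) / f"{day.isoformat()}-{name}.csv"


def append_rows(path, rows):
    """Append rows to the CSV file, creating it if it does not exist yet."""
    with open(path, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)


url = ("https://shopee.com.br/Camisetas-Bandas-Rock-RHCP-Red-Hot-Chili-"
       "Peppers-100-Algodao!!-i.409068735.3983196792")
seller_id, product_id = parse_ids(url)
print(seller_id, product_id)  # 409068735 3983196792

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out = dated_csv_path(tmp, "YOUR-FILE-NAME", date(2022, 11, 29))
    append_rows(out, [["product_id", "seller_id"], [product_id, seller_id]])
    print(out.name)  # 2022-11-29-YOUR-FILE-NAME.csv
```

Opening the file in append mode ("a"), as the original line does, means that running the scraper twice on the same day adds to the same file rather than overwriting it.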

Why I created this project and who I am

  • I'm a Computer Engineering and Mathematics major in Brazil. I already have a bachelor's degree in Marketing and I'm looking for a Data Engineer or Data Scientist position.
  • I currently work for a small company in Brazil as a commercial manager, and my main role is to increase the online sales of hydraulic and brass connectors for gas and petroleum.
  • I love data and statistics. Finding new possibilities and ways of doing things better and faster through data is fascinating, and quoting Carl Sagan I would say that "it's a pleasure to share a planet and an epoch with you", because humankind doesn't even know yet what we're capable of. AI and machine learning will show us a new world, a new age.
  • I really like the feeling of helping companies make better data-driven decisions on online sales, marketing and purchasing. Solving problems is pretty much the main motivation of any mathematician or engineer.
Owner
Paulo DaRosa
Computer Engineer, Mathematician and Marketer.
An experiment to deploy a serverless infrastructure for a scrapy project.

Serverless Scrapy project This project aims to evaluate the feasibility of an architecture based on serverless technology for a web crawler using scra

José Ferraz Neto 5 Jul 08, 2022
Displays market info for the LUNI token on the Terra Blockchain

LuniBot for Discord Displays market info for the LUNI/LUNA token on the Terra Blockchain (Webscrape method currently scraping CoinMarketCap). Will evo

0 Jan 22, 2022
Python Web Scrapper Project

Web Scrapper A project developed in Python, mainly with Selenium, BeautifulSoup and Pandas. It is a web scraper that pulls a table with the main

Jordan Ítalo Amaral 2 Jan 04, 2022
Web scraper for getting price quotes on items

WebScrapper This web scraper is developed in Python 3.10.0 to search the cyber puerta site for items in its catalog. The program

Jordan Gaona 1 Oct 27, 2021
A Python library for automating interaction with websites.

Home page https://mechanicalsoup.readthedocs.io/ Overview A Python library for automating interaction with websites. MechanicalSoup automatically stor

4.3k Jan 07, 2023
This repo has the source code for the crawler and data crawled from auto-data.net

This repo contains the source code for crawler and crawled data of cars specifications from autodata. The data has roughly 45k cars

Tô Đức Anh 5 Nov 22, 2022
Complete pipeline for crawling online newspaper article.

Complete pipeline for crawling online newspaper article. The articles are stored to MongoDB. The whole pipeline is dockerized, thus the user does not need to worry about dependencies. Additionally, d

newspipe 4 May 27, 2022
This is a module that I had created along with my friend. It's a basic web scraping module

QuickInfo PYPI link : https://pypi.org/project/quickinfo/ This is the library that you've all been searching for, it's built for developers and allows

OneBit 2 Dec 13, 2021
a high-performance, lightweight and human friendly serving engine for scrapy

a high-performance, lightweight and human friendly serving engine for scrapy

Speakol Ads 30 Mar 01, 2022
Examine.com supplement research scraper!

ExamineScraper Examine.com supplement research scraper! Why I want to be able to search pages for a specific term. For example, I want to be able to s

Tyler 15 Dec 06, 2022
Scrapping the data from each page of biocides listed on the BAUA website into a csv file

Scrapping the data from each page of biocides listed on the BAUA website into a csv file

Eric DE MARIA 1 Nov 30, 2021
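Cracking of signatures and CAPTCHAs related to web crawlers

cracking4crawling Cracking of signatures and CAPTCHAs related to web crawlers. Scripts available so far: Xiaohongshu app API signature (shield) (2020.12.02); Xiaohongshu slider (Shumei) verification cracking (2020.12.02); Hainan Airlines app API signature (hnairSign) (2020.12.05). Notes: scripts by target website and app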

XNFA 90 Feb 09, 2021
Shopee Scraper - A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil

Shopee Scraper A web scraper in Python that extracts sales, price, available stock, location and more for a given seller in Brazil. The project was crea

Paulo DaRosa 5 Nov 29, 2022
Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

Scrapy Cluster This Scrapy project uses Redis and Kafka to create a distributed

Hanh Pham Van 0 Jan 06, 2022
Html Content / Article Extractor, web scrapping lib in Python

Python-Goose - Article Extractor Intro Goose was originally an article extractor written in Java that has most recently (Aug2011) been converted to a

Xavier Grangier 3.8k Jan 02, 2023
A python script to extract answers to any question on Quora (Quora+ included)

quora-plus-bypass A python script to extract answers to any question on Quora (Quora+ included) Requirements Python 3.x

Nitin Narayanan 10 Aug 18, 2022
Collection of code files to scrap different kinds of websites.

STW-Collection Scrap The Web Collection; blog posts. This repo contains Scrapy sample code to scrap the following kind of websites: Do you want to lea

Tapasweni Pathak 15 Jun 08, 2022
A web service for scanning media hosted by a Matrix media repository

Matrix Content Scanner A web service for scanning media hosted by a Matrix media repository Installation TODO Development In a virtual environment wit

Brendan Abolivier 5 Dec 01, 2022
Using Python and Pushshift.io to Track stocks on the WallStreetBets subreddit

wallstreetbets-tracker Using Python and Pushshift.io to Track stocks on the WallStreetBets subreddit.

91 Dec 08, 2022
A Scrapper with python

Scrapper-en-python Scraping data means retrieving data in order to process or analyze it. In Python, there are 2 main ways to scrape

Lun4rIum 1 Dec 05, 2021