A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes it fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to group common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.
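As a rough illustration of the asynchronous, browser-free approach described above, the sketch below issues several requests concurrently with `asyncio` instead of driving a browser. It is not Scrapera's actual implementation: `fetch_json`, `fetch_all`, the `limit` parameter, and the example.com endpoints are all hypothetical, and the network call is replaced by a short sleep so the example runs offline.

```python
import asyncio


async def fetch_json(endpoint: str) -> dict:
    """Hypothetical stand-in for an HTTP GET against a public API endpoint."""
    # A real scraper would await an HTTP client call here; the sleep only
    # simulates network latency so the sketch runs without a connection.
    await asyncio.sleep(0.01)
    return {"endpoint": endpoint, "status": "ok"}


async def fetch_all(endpoints, limit=5):
    """Fetch all endpoints concurrently, with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(endpoint):
        async with semaphore:
            return await fetch_json(endpoint)

    # gather returns results in the same order as the input endpoints.
    return await asyncio.gather(*(bounded(e) for e in endpoints))


if __name__ == "__main__":
    # Assumed example URLs, not endpoints used by Scrapera.
    pages = [f"https://example.com/api/items?page={n}" for n in range(1, 4)]
    for result in asyncio.run(fetch_all(pages, limit=2)):
        print(result)
```

Because no browser is launched, the cost per request is just the HTTP round trip, and the semaphore keeps the number of simultaneous requests bounded.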

    DISCLAIMER: Owner or Contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if copyright terms are violated due to any module provided by Scrapera.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

    To use any sub-module, import it, instantiate the scraper, and call its scrape method:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, refer to the test folders in the respective modules.
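The import, instantiate, execute pattern above extends naturally to several URLs. The sketch below relies only on the `scrape(url, quality)` call shown in the Vimeo example; `scrape_many` is a hypothetical helper that is not part of Scrapera, and it assumes that a failed scrape raises an exception, which may not hold for every module.

```python
def scrape_many(scraper, urls, quality):
    """Call scraper.scrape(url, quality) for each URL and collect failures.

    `scraper` is any object exposing scrape(url, quality), such as the
    VimeoScraper instance from the example above.
    Returns a list of (url, exception) pairs for the URLs that failed.
    """
    failed = []
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # assumption: failures surface as exceptions
            failed.append((url, exc))
    return failed


# Possible usage with the scraper from the example above:
#
#   from scrapera.video.vimeo import VimeoScraper
#   failures = scrape_many(VimeoScraper(), ['https://vimeo.com/191955190'], '540p')
#   for url, error in failures:
#       print(f'Could not scrape {url}: {error}')
```

Collecting failures instead of stopping at the first one lets a long batch finish, and the returned list gives concrete cases to include when raising an issue.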

    Contributing

    Scrapera welcomes all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Logo

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
