A universal package of scraper scripts for humans


Logo

Badges: MIT License, version, release, Python

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides access to a variety of scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, removing the heavy browser overhead; this makes Scrapera extremely fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

  • The main aim of this package is to cluster common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.

    DISCLAIMER: The owner and contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if copyright terms are violated due to any module provided by Scrapera.
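
    To make the endpoint-based, browser-free approach described above concrete, here is a minimal sketch of asynchronous scraping from a public JSON API using asyncio and aiohttp. The endpoint URL and response handling are illustrative assumptions only, not part of Scrapera's internal implementation.

    # Hypothetical sketch of browser-free, asynchronous endpoint scraping.
    # The URL below is a placeholder, not an endpoint used by Scrapera.
    import asyncio
    import aiohttp

    async def fetch_json(session, url):
        # One asynchronous GET against a public API endpoint
        async with session.get(url) as resp:
            resp.raise_for_status()
            return await resp.json()

    async def main(urls):
        # Reuse a single session and issue all requests concurrently
        async with aiohttp.ClientSession() as session:
            return await asyncio.gather(*(fetch_json(session, u) for u in urls))

    if __name__ == '__main__':
        results = asyncio.run(main(['https://httpbin.org/json']))
        print(results)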

    Getting Started

    Prerequisites

    Prerequisites can be installed separately through the requirements.txt file, as shown below:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, if you wish to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
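
    To verify the installation, you can check that the top-level package imports cleanly. This assumes only that the scrapera package is importable, as the Usage example below suggests.

    # Quick post-install check; raises ImportError if the installation failed
    import scrapera
    print('Scrapera imported successfully')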

    Usage

    To use any sub-module, you just need to import it, instantiate the scraper, and execute a scrape:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')
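
    Individual scrapes can fail transiently (network errors or endpoint changes), so you may want to wrap the call shown above with basic retry logic. The sketch below is an illustrative assumption; the retry policy and broad exception handling are not part of Scrapera's API.

    # Hypothetical retry wrapper around the VimeoScraper call shown above
    import time
    from scrapera.video.vimeo import VimeoScraper

    def scrape_with_retries(url, quality, attempts=3, delay=2.0):
        scraper = VimeoScraper()
        for attempt in range(1, attempts + 1):
            try:
                return scraper.scrape(url, quality)
            except Exception as exc:  # broad catch only for this sketch
                if attempt == attempts:
                    raise
                print(f'Attempt {attempt} failed ({exc}); retrying in {delay}s')
                time.sleep(delay)

    scrape_with_retries('https://vimeo.com/191955190', '540p')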

    For more examples, please refer to the test folders in the respective modules.

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails in any instance. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Logo

    Contact

    Feel free to reach out for any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements

    Owner
    Helping Machines Learn Better 💻😃