A universal package of scraper scripts for humans

Scrapera

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides access to a variety of scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints. This removes the heavy overhead of driving a browser, which makes Scrapera extremely fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to cluster common scraping tasks so that ML researchers and engineers can focus on their models rather than worrying about data collection.
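
Scrapera's speed comes from fetching public API endpoints concurrently instead of rendering pages in a browser. The sketch below illustrates that general idea using only the standard library. It is not Scrapera's implementation: `fetch_json`, `scrape_all`, and the `max_concurrency` limit are hypothetical names chosen for illustration, and the endpoints are assumed to return JSON. Because the response is structured data rather than rendered HTML, the parsing does not depend on a page's DOM layout.

```python
import asyncio
import json
import urllib.request
from typing import Any, Awaitable, Callable, Iterable


def fetch_json_blocking(url: str, timeout: float = 10.0) -> Any:
    """Fetch a URL with the standard library and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


async def fetch_json(url: str) -> Any:
    # Run the blocking urllib call in a worker thread so many requests overlap.
    return await asyncio.to_thread(fetch_json_blocking, url)


async def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[Any]] = fetch_json,
    max_concurrency: int = 8,
) -> list[Any]:
    """Fetch every URL concurrently, with at most max_concurrency in flight.

    Results are returned in the same order as the input URLs.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> Any:
        # Wait for a free slot before issuing the request.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))
```

For example, `asyncio.run(scrape_all(["https://api.example.com/items/1", "https://api.example.com/items/2"]))` would fetch both endpoints concurrently and return their decoded bodies in input order. These URLs are hypothetical placeholders. The semaphore keeps the number of simultaneous requests bounded so the target server is not flooded.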

DISCLAIMER: The owner and contributors take no responsibility for any misuse of data obtained through Scrapera. If any module provided by Scrapera violates copyright terms, please contact the owner.

Getting Started

Prerequisites

The prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

Installation

Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
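
To confirm that the installation succeeded, you can query the installed version from the package metadata. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). It assumes the distribution is registered under the name `scrapera`, as in the pip command above. `installed_version` is a hypothetical helper, not part of Scrapera.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("scrapera")
    print(f"scrapera {found}" if found else "scrapera is not installed")
```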

Usage

To use any sub-module, import its scraper, instantiate it, and run it:

    from scrapera.video.vimeo import VimeoScraper

    # Create the Vimeo scraper and scrape the given video at 540p resolution
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')
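
If every scraper follows the same pattern (import, instantiate, execute), a shared interface keeps calling code identical across sub-modules. The sketch below is a hypothetical illustration of such an interface. `BaseScraper`, `EchoScraper`, and `run` are invented names, not part of Scrapera's API. The toy scraper records its requests instead of touching the network.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseScraper(ABC):
    """Hypothetical common interface: every scraper exposes scrape()."""

    @abstractmethod
    def scrape(self, *args: Any, **kwargs: Any) -> Any:
        """Fetch data for the given target and return (or save) it."""


class EchoScraper(BaseScraper):
    """Toy scraper that records its requests instead of touching the network."""

    def __init__(self) -> None:
        self.requests: list[tuple[str, str]] = []

    def scrape(self, url: str, quality: str = "540p") -> dict[str, str]:
        # Remember the call so behaviour can be inspected later.
        self.requests.append((url, quality))
        return {"url": url, "quality": quality}


def run(scraper: BaseScraper, *args: Any, **kwargs: Any) -> Any:
    # Callers rely only on scrape(), so any scraper can be swapped in.
    return scraper.scrape(*args, **kwargs)
```

Here, `run(EchoScraper(), 'https://vimeo.com/191955190', '540p')` returns `{'url': 'https://vimeo.com/191955190', 'quality': '540p'}`. A real scraper would fetch and save the video instead.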

For more examples, refer to the test folders in the respective modules.
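
Scraper tests usually need to run without hitting live endpoints. The following is a minimal sketch of that approach, and it is not taken from Scrapera's own test folders. It assumes a JSON API that returns a list of objects with a `title` field, and `fetch_titles` is a hypothetical scraper step. The test uses `unittest.mock.patch` to replace `urllib.request.urlopen` with a fake response, so it runs offline.

```python
import json
import unittest
import urllib.request
from unittest import mock


def fetch_titles(url: str) -> list[str]:
    """Hypothetical scraper step: read a JSON list of items and return their titles."""
    with urllib.request.urlopen(url, timeout=10) as response:
        items = json.loads(response.read().decode("utf-8"))
    return [item["title"] for item in items]


class FetchTitlesTest(unittest.TestCase):
    def test_parses_titles_without_network(self) -> None:
        payload = json.dumps([{"title": "a"}, {"title": "b"}]).encode("utf-8")
        fake_response = mock.MagicMock()
        fake_response.read.return_value = payload
        # urlopen is used as a context manager, so __enter__ must return the fake.
        fake_response.__enter__.return_value = fake_response
        with mock.patch("urllib.request.urlopen", return_value=fake_response):
            self.assertEqual(fetch_titles("https://example.com/api"), ["a", "b"])
```

Save it as a file such as `test_fetch_titles.py` and run `python -m unittest test_fetch_titles`.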

Contributing

Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community! For more guidelines, refer to CONTRIBUTING.

License

Distributed under the MIT License. See LICENSE for more information.

Sponsors

Contact

Feel free to reach out with any issues or requests related to Scrapera.

Darshan Deshpande (Owner) - Email | LinkedIn

Acknowledgements