A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Sponsors
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides access to a variety of scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, removing the heavy browser overhead; this makes it extremely fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

  • The main aim of this package is to cluster common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.

    DISCLAIMER: Neither the owner nor the contributors take any responsibility for misuse of data obtained through Scrapera. Contact the owner if copyright terms are violated by any module provided by Scrapera.

Getting Started

Prerequisites

Prerequisites can be installed separately through the requirements.txt file:

pip install -r requirements.txt
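
The requirements file lives in the repository itself, so this step assumes you are working from a clone of the source (using the GitHub URL from the Installation section below); a typical sequence would be:

git clone https://github.com/DarshanDeshpande/Scrapera.git
cd Scrapera
pip install -r requirements.txt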

Installation

Scrapera is built with Python 3 and can be installed directly with pip:

pip install scrapera

Alternatively, if you wish to install the latest version directly from GitHub, run:

pip install git+https://github.com/DarshanDeshpande/Scrapera.git
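
Either route can be sanity-checked with a quick import. The snippet below is a minimal sketch; it assumes only that the scrapera package and the VimeoScraper class shown in the Usage section are importable after installation:

# Minimal post-install check: confirm the package and one of its
# documented scraper classes can be imported
import scrapera
from scrapera.video.vimeo import VimeoScraper

print("Scrapera import OK:", VimeoScraper.__name__)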

Usage

To use any sub-module, you just need to import, instantiate, and execute it:

from scrapera.video.vimeo import VimeoScraper

# Instantiate the scraper and fetch the video at the requested quality
scraper = VimeoScraper()
scraper.scrape('https://vimeo.com/191955190', '540p')

For more examples, please refer to the individual test folders in the respective modules.
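
The same call can also be driven programmatically, for example over a list of URLs. The sketch below reuses only the VimeoScraper.scrape(url, quality) call shown above; the URL list, the '540p' quality string, and the broad exception handling are illustrative assumptions rather than part of Scrapera's documented API.

from scrapera.video.vimeo import VimeoScraper

# Placeholder list of videos to download; replace with real URLs
urls = [
    'https://vimeo.com/191955190',
]

scraper = VimeoScraper()
for url in urls:
    try:
        # '540p' mirrors the quality argument from the example above
        scraper.scrape(url, '540p')
    except Exception as err:
        # Log and continue if a single download fails
        print(f"Failed to scrape {url}: {err}")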

Contributing

Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails in any instance. Feel free to fork the repository and add your own scrapers to help the community!
For more guidelines, refer to CONTRIBUTING.

License

Distributed under the MIT License. See LICENSE for more information.

Sponsors

Logo

Contact

Feel free to reach out for any issues or requests related to Scrapera.

Darshan Deshpande (Owner) - Email | LinkedIn

Acknowledgements

Owner
Helping Machines Learn Better 💻😃