A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes it fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to gather common scraping tasks in one place so that ML researchers and engineers can focus on their models rather than on the data collection process.
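To illustrate the idea of scraping asynchronously from API endpoints without a browser, here is a minimal sketch. It is not Scrapera's implementation: `fetch_all`, `fake_fetch`, the `limit` parameter, and the example URLs are all hypothetical names chosen for illustration, and the HTTP call is replaced by a stand-in coroutine so that the sketch runs offline.

```python
import asyncio
import json
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical type: any coroutine that takes a URL and returns the response body.
Fetcher = Callable[[str], Awaitable[str]]


async def fetch_all(urls: List[str], fetch: Fetcher, limit: int = 5) -> Dict[str, dict]:
    """Fetch JSON from every URL concurrently, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> Tuple[str, dict]:
        async with semaphore:
            body = await fetch(url)
        return url, json.loads(body)

    # gather() runs all the per-URL coroutines concurrently on one event loop.
    pairs = await asyncio.gather(*(fetch_one(u) for u in urls))
    return dict(pairs)


async def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request (assumption: the endpoint returns JSON)."""
    await asyncio.sleep(0.01)
    return json.dumps({"url": url, "ok": True})


if __name__ == "__main__":
    urls = ["https://example.com/api/a", "https://example.com/api/b"]
    print(asyncio.run(fetch_all(urls, fake_fetch)))
```

In a real scraper, `fake_fetch` would be swapped for a coroutine that performs the HTTP request. Because no browser is started, the cost per request is a network round trip rather than a page render.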

    DISCLAIMER: Owner or Contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if copyright terms are violated due to any module provided by Scrapera.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
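After either install command, one way to confirm that the package is visible to the current interpreter is to ask the standard library for its installed version. This is a small sketch, not part of Scrapera: `installed_version` is a hypothetical helper, and it assumes the distribution is registered under the name `scrapera`, as the pip command above suggests.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("scrapera")  # distribution name assumed from `pip install scrapera`
    print(f"scrapera {found}" if found else "scrapera is not installed")
```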

    Usage

    To use any sub-module, import its scraper class, instantiate it, and call it:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, refer to the test folders inside the respective modules.
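When several items need to be downloaded, the same call can be wrapped in a loop that records failures instead of stopping at the first one. The sketch below is illustrative only: `scrape_many` is a hypothetical helper, and it assumes that a scraper signals failure by raising an exception, which the text above does not confirm. The only API taken from the example above is `scraper.scrape(url, quality)`.

```python
from typing import Iterable, List, Tuple


def scrape_many(scraper, jobs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Call scraper.scrape(url, quality) for each job and collect the failures.

    Returns a list of (url, error description) pairs for the jobs that raised.
    """
    failures = []
    for url, quality in jobs:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # assumption: a failed scrape raises an exception
            failures.append((url, repr(exc)))
    return failures
```

With Scrapera installed, this could be called as `scrape_many(VimeoScraper(), [('https://vimeo.com/191955190', '540p')])`. The returned list gives concrete details to include when raising an issue about a failing scraper.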

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors


    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
