A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes public API endpoints directly and asynchronously, removing the heavy browser overhead; this makes it extremely fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous
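The browser-free, asynchronous approach described above can be sketched as follows. This is an illustration of the general technique, not Scrapera's actual internals; the endpoint URLs and JSON shape are placeholders, and `fetch_json` is stubbed so the sketch runs offline (a real implementation would issue HTTP GETs, e.g. with aiohttp).

```python
import asyncio

async def fetch_json(url: str) -> dict:
    # Placeholder for an HTTP GET against a public API endpoint;
    # a real implementation would use aiohttp and response.json().
    await asyncio.sleep(0)  # yield control, as a real request would
    return {"url": url, "items": ["a", "b"]}

async def scrape_all(urls):
    # Issue all requests concurrently instead of rendering
    # one browser page at a time with Chromedriver.
    return await asyncio.gather(*(fetch_json(u) for u in urls))

results = asyncio.run(scrape_all(["https://example.com/api/1",
                                  "https://example.com/api/2"]))
print(len(results))  # 2
```

Because only the JSON payload is fetched, the scraper is unaffected by changes to the page's DOM.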

The main aim of this package is to cluster common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.

DISCLAIMER: Neither the owner nor the contributors take any responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

Prerequisites can be installed separately via the requirements.txt file as below

    pip install -r requirements.txt

    Installation

Scrapera is built with Python 3 and can be installed directly with pip

    pip install scrapera

Alternatively, if you wish to install the latest version directly from GitHub, run

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git

    Usage

To use any sub-module, you just need to import it, instantiate the scraper, and execute it

from scrapera.video.vimeo import VimeoScraper

# Instantiate the scraper, then pass the video URL and the desired quality
scraper = VimeoScraper()
scraper.scrape('https://vimeo.com/191955190', '540p')

For more examples, please refer to the test folders in the respective modules
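Since scrapers talk to remote endpoints, individual calls can fail transiently. A small retry wrapper (plain Python, not part of Scrapera's API) can make the import-instantiate-execute pattern above more robust:

```python
import time

def scrape_with_retry(scrape_fn, *args, retries=3, delay=1.0, **kwargs):
    # Retry a scraper call a few times before giving up, since remote
    # endpoints may fail transiently. scrape_fn is any callable, e.g.
    # scraper.scrape from the VimeoScraper example above.
    for attempt in range(1, retries + 1):
        try:
            return scrape_fn(*args, **kwargs)
        except Exception:
            if attempt == retries:
                raise  # exhausted all attempts; surface the error
            time.sleep(delay)

# Usage with the example above (requires scrapera installed):
# scrape_with_retry(scraper.scrape, 'https://vimeo.com/191955190', '540p')
```

The wrapper re-raises the last exception once all attempts are exhausted, so genuine failures are still surfaced to the caller.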

    Contributing

Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
For more guidelines, refer to CONTRIBUTING

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors


    Contact

Feel free to reach out for any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
