A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. It scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes Scrapera fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous

The main aim of this package is to gather common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
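
As a rough illustration of the asynchronous, browser-free approach described above, the sketch below fetches several JSON endpoints concurrently using only the standard library. It is not Scrapera's actual implementation: `fetch_json`, `scrape_all`, the `max_concurrency` parameter, and the `example.com` URLs are all hypothetical names chosen for this example.

```python
import asyncio
import json
import urllib.request


def fetch_json(url: str) -> dict:
    """Blocking GET that decodes a JSON response body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


async def scrape_all(urls, fetch=fetch_json, max_concurrency=5):
    """Fetch every URL concurrently, at most `max_concurrency` at a time.

    Results are returned in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            # Run the blocking call in a thread so the event loop stays free.
            return await asyncio.to_thread(fetch, url)

    return await asyncio.gather(*(worker(u) for u in urls))


if __name__ == "__main__":
    # Placeholder endpoints; substitute real public JSON APIs.
    demo_urls = [f"https://example.com/api/items/{i}" for i in range(3)]
    # A stub fetcher keeps the demo offline; drop `fetch=` to hit the network.
    print(asyncio.run(scrape_all(demo_urls, fetch=lambda u: {"url": u})))
```

No browser or driver is started at any point; each request is a plain HTTP call, which is where the speed advantage over browser automation comes from. The semaphore is a design choice of this sketch to keep the number of simultaneous requests bounded.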

    DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
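
To confirm that either install command succeeded, one option is to query the installed distribution's metadata. The helper below is a minimal sketch; `installed_version` is a name made up for this example, and it assumes the distribution is registered as `scrapera`, matching the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "scrapera" is assumed to be the distribution name used by pip.
    version = installed_version("scrapera")
    print(f"scrapera {version}" if version else "scrapera is not installed")
```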

    Usage

    To use any sub-module, you just need to import, instantiate, and execute:

    from scrapera.video.vimeo import VimeoScraper

    # Instantiate the scraper, then pass the video URL and the desired quality
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')

    For more examples, please refer to the individual test folders in the respective modules.

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails in any instance. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors

    Logo

    Contact

    Feel free to reach out with any issues or requests related to Scrapera.

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements
