Screen scraping and web crawling framework

Last update: Jun 21, 2021

Related tags

Web Crawling python crawler framework scraping crawling asyncio

Overview

Pomp

circleci

codecov

Latest PyPI version

python versions

Have wheel

License

Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the hard Twisted dependency.

Features:

Pure python
Only one dependency for Python 2.x - concurrent.futures (backport of package for Python 2.x)
Supports one file applications; Pomps doesn't force a specific project layout or other restrictions.
Pomp is a meta framework like Paste: you may use it to create your own scraping framework.
Extensible networking: you may use any sync or async method.
No parsing libraries in the core; use you preferred approach.
Pomp instances may be distributed and are designed to work with an external queue.

Pomp makes no attempt to accomodate:

redirects
proxies
caching
database integration
cookies
authentication
etc.

If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.

Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.

Owner

Evgeniy Tatarkin

Evgeniy Tatarkin

GitHub Repository https://pomp.readthedocs.org

Nekopoi scraper using python3

Features Scrap from url Todo [+] Search by genre [+] Search by query [+] Scrap from homepage Example # Hentai Scraper from nekopoi import Hent

9 Apr 06, 2022

Get paper names from dblp.org

scraper-dblp Get paper names from dblp.org and store them in a .txt file Useful for a related literature :) Install libraries pip3 install -r requirem

1 Dec 07, 2021

A crawler of doubamovie

豆瓣电影 A crawler of doubamovie 一个小小的入门级scrapy框架的应用，选取豆瓣电影对排行榜前1000的电影数据进行爬取。 spider.py start_requests方法为scrapy的方法，我们对它进行重写。 def start_requests(self):

1 Oct 05, 2021

Explore scraping with BeautifulSoup!

beautifulsoup-scrape Explore scraping with BeautifulSoup! Part One: Start from Shakespeare As my professor is a poet (yes, and he teaches me data and

2 Oct 05, 2022

联通手机营业厅自动做任务、签到、领流量、领积分等。

联通手机营业厅自动完成每日任务，领流量、签到获取积分等，月底流量不发愁。功能沃之树领流量、浇水(12M日流量) 每日签到(1积分+翻倍4积分+第七天1G流量日包) 天天抽奖，每天三次免费机会(随机奖励) 游戏中心每日打卡(连续打卡，积分递增至最高

2k May 06, 2021

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation This repository provides two web crawlers to label domain nam

1 Nov 05, 2021

Scraping script for stats on covid19 pandemic status in Chiba prefecture, Japan

About 千葉県の地域別の詳細感染者統計(Excelファイル) をCSVに変換し、かつ地域別の日時感染者集計値を出力するスクリプトです。 Requirement POSIX互換なシェル, e.g. GNU Bash (1) curl (1) python = 3.8 pandas = 1.1.

1 Nov 29, 2021

This is a python api to scrape search results from a url.

googlescrape Installation Installation is simple! # Stable version pip install googlescrape Examples from googlescrape import client scrapeClient=cli

1 Dec 15, 2022

An application that on a given url, crowls a web page and gets all words, sorts and counts them.

Web-Scrapping-1 An application that on a given url, crowls a web page and gets all words, sorts and counts them. Installation Using the package manage

1 Jan 16, 2022

A pure-python HTML screen-scraping library

Scrapely Scrapely is a library for extracting structured data from HTML pages. Given some example web pages and the data to be extracted, scrapely con

1.8k Dec 31, 2022

Docker containerized Python Flask API that uses selenium to scrape and interact with websites

Docker containerized Python Flask API that uses selenium to scrape and interact with websites

0 Jan 22, 2022

Libextract: extract data from websites

Libextract is a statistics-enabled data extraction library that works on HTML and XML documents and written in Python

499 Dec 09, 2022

New World Market Scraper

Bean Seller A New Worlds market scraper. Deployment This must be installed on Windows as it uses the Windows api to do its stuff Install Prerequisites

4 Sep 21, 2022

:arrow_double_down: Dumb downloader that scrapes the web

You-Get NOTICE: Read this if you are looking for the conventional "Issues" tab. You-Get is a tiny command-line utility to download media contents (vid

46.4k Jan 03, 2023

Web crawling framework based on asyncio.

Web crawling framework for everyone. Written with asyncio, uvloop and aiohttp. Requirements Python3.5+ Installation pip install gain pip install uvloo

2k Jan 05, 2023

Bulk download tool for the MyMedia platform

MyMedia Bulk Content Downloader This is a bulk download tool for the MyMedia platform. USE ONLY WHERE ALLOWED BY THE COPYRIGHT OWNER. NOT AFFILIATED W

3 Oct 14, 2022

Scrapy-based cyber security news finder

Cyber-Security-News-Scraper Scrapy-based cyber security news finder Goal To keep up to date on the constant barrage of information within the field of

2 Nov 01, 2021

A distributed crawler for weibo, building with celery and requests.

A distributed crawler for weibo, building with celery and requests.

4.8k Jan 03, 2023

Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a bot

Aliexpress to telegram post Python script that reads Aliexpress offers urls from a Excel filename (.csv) and post then in a Telegram channel using a b

6 Dec 06, 2022

👁️ Tool for Data Extraction and Web Requests.

httpmapper 👁️ Project • Technologies • Installation • How it works • License Project 🚧 For educational purposes. This is a project that I developed,

15 Dec 05, 2021