A high-performance, lightweight and human-friendly serving engine for Scrapy

Last update: Mar 01, 2022

Overview

scrapy-x (X)

A distributed, scalable and lightweight environment for deploying and running Scrapy spiders/projects hassle-free on commodity hardware. It is also compatible with scrapyd's /schedule.json and /daemonstatus.json endpoints.
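Since scrapy-x advertises scrapyd compatibility, an existing scrapyd client should be able to talk to it. The sketch below builds (but does not send) scrapyd-style requests with the Python standard library. It assumes the default listen port 6800 from the settings and scrapyd's convention of form-encoded `project` and `spider` fields, with the remaining fields passed to the spider as arguments. Whether scrapy-x requires or honours `project` is an assumption, and `schedule_request`, `daemonstatus_request` and the spider name `example` are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: scrapy-x is reachable on the default X_SERVER_LISTEN_PORT.
BASE_URL = "http://localhost:6800"


def schedule_request(project: str, spider: str, **spider_args: str) -> urllib.request.Request:
    """Build a scrapyd-style POST /schedule.json request (form-encoded body)."""
    # scrapyd convention: `project` and `spider`, extra fields become spider arguments.
    form = {"project": project, "spider": spider, **spider_args}
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/schedule.json",
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def daemonstatus_request() -> urllib.request.Request:
    """Build a scrapyd-style GET /daemonstatus.json request."""
    return urllib.request.Request(f"{BASE_URL}/daemonstatus.json", method="GET")
```

With `scrapy x` running, `urllib.request.urlopen(schedule_request("TestCrawler", "example"))` would send the request. scrapyd itself replies with a JSON object containing `status` and `jobid`; whether scrapy-x mirrors that response exactly is not stated here.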

Installation

$ pip install -U git+https://github.com/speakol-ads/scrapy-x.git

Usage

Let's assume that you have a Scrapy project called TestCrawler:

$ cd TestCrawler
$ scrapy x

That is all!

Default Settings

It uses your project's default settings.py file. The available settings and their default values are:

import os  # needed for os.cpu_count() below

# whether to enable debug mode or not
X_DEBUG = True

# the default queue name used by the system;
# it actually serves as a prefix for the internal queues.
# Currently there is only one queue, `X_QUEUE_NAME + '.BACKLOG'`,
# which holds all jobs waiting to be crawled.
X_QUEUE_NAME = 'SCRAPY_X_QUEUE'

# the number of queue workers
# defaults to the number of CPU cores
# adjust it based on your resources & needs
X_QUEUE_WORKERS_COUNT = os.cpu_count()

# the number of web server workers
# (the worker processes uvicorn should spawn)
# defaults to the number of CPU cores
# adjust it based on your resources & needs
X_SERVER_WORKERS_COUNT = os.cpu_count()

# the port the HTTP server should listen on
X_SERVER_LISTEN_PORT = 6800

# the host the HTTP server should listen on
X_SERVER_LISTEN_HOST = '0.0.0.0'

# whether to enable access log or not
X_ENABLE_ACCESS_LOG = True

# redis host
X_REDIS_HOST = 'localhost'

# redis port
X_REDIS_PORT = 6379

# redis db
X_REDIS_DB = 0

# redis password
X_REDIS_PASSWORD = ''

# the maximum time a running task is allowed to take;
# the task is killed once it exceeds this limit
X_TASK_TIMEOUT = 25
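To make the interplay of these settings concrete, here is a small sketch that overlays a project's `X_*` values on the defaults listed above, derives the backlog queue name, and builds a redis:// URL in the format accepted by redis-py's `Redis.from_url`. The helpers `resolve_settings`, `backlog_queue_name` and `redis_url` are hypothetical and only mirror the documented values; they are not how scrapy-x loads its configuration. A real Scrapy project would read its settings through Scrapy's own settings API (e.g. `scrapy.utils.project.get_project_settings()`).

```python
import os
import urllib.parse

# Defaults copied from the list above (hypothetical mirror, not scrapy-x code).
DEFAULTS = {
    "X_DEBUG": True,
    "X_QUEUE_NAME": "SCRAPY_X_QUEUE",
    "X_QUEUE_WORKERS_COUNT": os.cpu_count(),
    "X_SERVER_WORKERS_COUNT": os.cpu_count(),
    "X_SERVER_LISTEN_PORT": 6800,
    "X_SERVER_LISTEN_HOST": "0.0.0.0",
    "X_ENABLE_ACCESS_LOG": True,
    "X_REDIS_HOST": "localhost",
    "X_REDIS_PORT": 6379,
    "X_REDIS_DB": 0,
    "X_REDIS_PASSWORD": "",
    "X_TASK_TIMEOUT": 25,
}


def resolve_settings(project_settings: dict) -> dict:
    """Overlay only the X_* keys from a project's settings onto the defaults."""
    resolved = dict(DEFAULTS)
    resolved.update({k: v for k, v in project_settings.items() if k.startswith("X_")})
    return resolved


def backlog_queue_name(settings: dict) -> str:
    """The backlog queue is documented as X_QUEUE_NAME + '.BACKLOG'."""
    return settings["X_QUEUE_NAME"] + ".BACKLOG"


def redis_url(settings: dict) -> str:
    """Build redis://[:password@]host:port/db from the X_REDIS_* settings."""
    password = settings["X_REDIS_PASSWORD"]
    # Percent-encode the password so characters like '@' don't break the URL.
    auth = f":{urllib.parse.quote(password, safe='')}@" if password else ""
    host, port, db = settings["X_REDIS_HOST"], settings["X_REDIS_PORT"], settings["X_REDIS_DB"]
    return f"redis://{auth}{host}:{port}/{db}"
```

For example, a project that sets only `X_QUEUE_NAME = 'MY_QUEUE'` would end up with a backlog queue named `MY_QUEUE.BACKLOG` while every other value keeps its default.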

Available Endpoints

In addition to the scrapyd core endpoints (schedule.json, daemonstatus.json), the following endpoints are available:

GET /

Returns some information about the engine, such as the available spiders and the backlog queue length.
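A minimal client for this endpoint could look like the following. The response format is not documented here, so this sketch assumes a JSON body and returns it decoded without relying on any particular field names; `engine_info` and its default base URL (built from the default port 6800) are assumptions.

```python
import json
import urllib.request


def engine_info(base_url: str = "http://localhost:6800", timeout: float = 5.0) -> dict:
    """GET / and decode the (assumed JSON) description of the engine."""
    with urllib.request.urlopen(f"{base_url}/", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `engine_info()` against a running `scrapy x` instance returns whatever the server reports; inspect the keys before depending on them.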

GET|POST /run/{spider_name}

Executes the spider named {spider_name} and waits for it to return its result. Any query parameters and JSON POST data are passed to the spider as arguments, just like -a key=value on the command line.
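The mapping from request parameters to spider arguments can be illustrated by translating a query string and a JSON object body into an equivalent `scrapy crawl` command line. This is a hypothetical sketch: it does not run anything, and both the precedence rule (JSON keys override query keys of the same name) and the stringification of JSON values are assumptions rather than documented scrapy-x behaviour.

```python
from __future__ import annotations

import json
from urllib.parse import parse_qsl


def spider_args(query_string: str = "", json_body: str = "") -> list[str]:
    """Turn query params and a JSON object body into `-a key=value` arguments."""
    params = dict(parse_qsl(query_string, keep_blank_values=True))
    if json_body:
        body = json.loads(json_body)
        if not isinstance(body, dict):
            raise ValueError("expected a JSON object as the request body")
        # Assumption: JSON values win over query params; all values become strings.
        params.update({key: str(value) for key, value in body.items()})
    args: list[str] = []
    for key, value in params.items():
        args += ["-a", f"{key}={value}"]
    return args


def crawl_command(spider_name: str, query_string: str = "", json_body: str = "") -> list[str]:
    """The `scrapy crawl` invocation the request would roughly correspond to."""
    return ["scrapy", "crawl", spider_name, *spider_args(query_string, json_body)]
```

For instance, `GET /run/example?page=2&q=books` would correspond to `scrapy crawl example -a page=2 -a q=books`, where `example` is a hypothetical spider name.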

GET|POST /enqueue/{spider_name}

Adds the spider named {spider_name} to the backlog so that it is executed later. Any query parameters and JSON POST data are used as spider arguments.
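Both endpoints can be called from any HTTP client. The sketch below builds the two request types with the standard library: a GET to /enqueue/{spider_name} with the arguments in the query string, and a POST to /run/{spider_name} with a JSON body. The base URL assumes the default host and port, and `enqueue_request`, `run_request` and the spider name `example` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: scrapy-x listens on the default port from the settings.
BASE_URL = "http://localhost:6800"


def enqueue_request(spider_name: str, **params: str) -> urllib.request.Request:
    """GET /enqueue/{spider_name}?key=value...: add the spider to the backlog."""
    name = urllib.parse.quote(spider_name, safe="")
    query = urllib.parse.urlencode(params)
    url = f"{BASE_URL}/enqueue/{name}" + (f"?{query}" if query else "")
    return urllib.request.Request(url, method="GET")


def run_request(spider_name: str, payload: dict) -> urllib.request.Request:
    """POST /run/{spider_name} with a JSON body; the server waits for the result."""
    name = urllib.parse.quote(spider_name, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/run/{name}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send one, pass it to `urllib.request.urlopen(...)` while `scrapy x` is running. Because /run waits for the spider to finish, a client-side timeout comfortably above the configured X_TASK_TIMEOUT is a sensible choice.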

Technologies Used

Author

I'm Mohamed, a software engineer who enjoys writing code in his free time. I speak Python, PHP, Go, Rust and JS.

My Similar Projects

P.S.: star the project if you like it ^_^

Owner

Speakol Ads

Get paper names from dblp.org

This is my CS 20 final assessment.

Simple scraper built in Python with the BeautifulSoup library

Google Developer Profile Badge Scraper

Scraping Thailand COVID-19 data from the DDC's tableau dashboard

Scrape all the media from an OnlyFans account - Updated regularly

Basic-html-scraper - A complete how-to of web scraping with Python for beginners

Scrape Twitter for Tweets

Dex-scrapper - Hobby project for scraping DEX data on VeChain

Proxy scraper. Format: IP | PORT | COUNTRY | TYPE

学习强国 (Xuexi Qiangguo) automation: 100% correct, instant quiz answers, worth 45 points

12306 ticket-grabbing script

Web Scraping images using Selenium and Python

Minecraft Item Scraper

Twitter Scraper

A web scraping pipeline project that retrieves TV and movie data from two sources, then transforms and stores data in a MySQL database.

Web Content Retrieval for Humans™

A Simple Web Scraper made to Extract Download Links from Todaytvseries2.com

A package designed to scrape data from Yahoo Finance.

This is a sports analytics project that combines OOP and web scraping
