A web crawler for recording posts in "sina weibo"

Last update: Aug 20, 2022

Overview

Web Crawler for "sina weibo"

Introduction

This script collects the attributes of posts on "sina weibo". Posts can be recorded from different lists (also called flows or collections), such as search results. The supported lists are described in the "Functions" section.

Functions

Scripts currently available:

Name Description

search.py Searches for a keyword within a specific time interval and records every post in the search results.
Parameters (edit these at the head of the script):
search_string: The string to search for. All posts containing this string are recorded, up to a maximum of 50 pages.
start_time: Only posts published after this time are recorded (accurate to the hour).
end_time: Only posts published before this time are recorded (accurate to the hour).
rest_time: The interval between two consecutive requests, in seconds.
Results are saved in Python pickle format at results/weibo-{search_string}-{start_time}-{end_time}.pkl. The start_time and end_time in the filename are Unix timestamps in seconds.
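
The parameters and the output filename pattern described above can be sketched as follows. This is only an illustration: the README does not state the types the script expects, so the use of timezone-aware datetime values, the example values, and the helper name result_path are all assumptions and are not taken from search.py itself.

```python
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical parameter values; the real script may expect different types.
search_string = "python"
start_time = datetime(2022, 8, 1, 0, tzinfo=timezone.utc)  # hour-level precision
end_time = datetime(2022, 8, 2, 0, tzinfo=timezone.utc)
rest_time = 5  # seconds to wait between two requests


def result_path(search_string, start_time, end_time, root="results"):
    """Build an output path that follows the documented filename pattern.

    result_path is a hypothetical helper, not a function of search.py.
    """
    start_ts = int(start_time.timestamp())  # Unix timestamp in seconds
    end_ts = int(end_time.timestamp())
    return Path(root) / f"weibo-{search_string}-{start_ts}-{end_ts}.pkl"


print(result_path(search_string, start_time, end_time))
```

Using timezone-aware datetime values keeps the timestamps in the filename independent of the machine's local time zone; whether the script does the same is not stated.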


Installation

Run pip install -r requirements.txt.
Find the script you need in the "Functions" section.
Edit the parameters at the head of the script.
Run the script with Python.
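
Once a script has finished, the pickle file it writes can be read back with the standard pickle module. The following is a minimal sketch: the helper names load_posts and latest_result are hypothetical, and the structure of the stored object is not documented here, so inspect it before relying on any particular fields.

```python
import pickle
from pathlib import Path


def load_posts(path):
    """Load whatever object the crawler pickled to `path`."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def latest_result(root="results"):
    """Return the most recently modified weibo-*.pkl file under `root`, or None."""
    files = sorted(Path(root).glob("weibo-*.pkl"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


if __name__ == "__main__":
    path = latest_result()
    if path is None:
        print("No result files found; run a script first.")
    else:
        posts = load_posts(path)
        # The stored type is an unknown here, so print it before using it.
        print(path, type(posts))
```

Only unpickle files that you produced yourself, since loading a pickle from an untrusted source can execute arbitrary code.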
