一些爬虫相关的签名、验证码破解

Overview

cracking4crawling

一些爬虫相关的签名、验证码破解,目前已有脚本:

说明:

脚本按目标网站、App命名归档,每个脚本一般都是可以单独导入使用(除非调用了额外的用于加解密的js文件),使用方法可阅读文档或参考其中的test函数。

使用方法:

小红书

小红书App接口签名(shield)

shield是小红书App接口主要的签名,由path、params、xy_common_params、xy_platform_info、data拼接并加密生成。原始加密在libshield.so中,已用python复现。

from urllib import parse

from xiaohongshu.shield import get_sign

# 对接口路径、url参数、header中的xy-common-params、xy-platform-info、请求的data进行签名
path = '/api/sns/v4/note/user/posted'

params = parse.urlencode({'user_id': '5eeb209d000000000101d84a'})

xy_common_params = parse.urlencode({})
    
xy_platform_info = parse.urlencode({})

data = parse.urlencode({})

# 生成签名
sign = get_sign(path=path, 
                params=params, 
                xy_common_params=xy_common_params, 
                xy_platform_info=xy_platform_info,
                data=data)
print(sign)

小红书滑块(数美)验证破解

小红书使用数美滑块验证码,验证过程(获取验证码配置>获取验证码>提交验证)在数美的服务器(数美使用organization来识别被验证的网站、App)上进行,完成后将通过的rid提交到小红书的接口。

具体实现细节:

  • 协议更新:数美会定期自动更新js和接口参数字段(接口里所有两个字母组成的字段名都会在更新修改),通过"/ca/v1/conf"接口返回的js路径可以判断协议版本(如"/pr/auto-build/v1.0.1-33/captcha-sdk.min.js",表示协议版本号为33),脚本会加载js,并通过匹配确认字段名,用于后续的接口请求。
  • 验证参数:验证主要需要三个参数:位移比率、时间、轨迹,使用opencv中的matchTemplate函数计算距离,并随机生成相应的轨迹。
  • 调用加密:提交验证的主要参数都需要加密,使用DES加密。
  • 加密过程:"/ca/v1/register"接口会返回一个参数k,使用"sshummei"作为key对它解密,结果为加密参数所需的key,再对参数进行加密。

注:当前的验证参数全部按照小红书App调整,用于其他验证(如小红书Web或其他网站、App),可能需要调整其中参数。

from xiaohongshu.shumei_slide_captcha import get_verify

# 表示小红书
organization = 'eR46sBuqF0fdw7KWFLYa'

# rid是验证过程中响应的标示,r是最后提交验证返回的响应
rid, r = get_verify(organization)

print(rid, r)

# riskLevel为PASS说明验证通过
if r['riskLevel'] == 'PASS':
    # 这里需要向小红书提交rid
    # 具体可抓包查看,接口:/api/sns/v1/system_service/slide_captcha_check
    pass

海南航空

海南航空App接口签名(hnairSign)

签名对象主要是请求的data,取common、data下的全部参数,按字典序排序进行拼接(list、dict类型不参与拼接),结尾加上slat,进行HMAC_SHA1加密生成。

注:"/user/"下的接口加签时,会在拼接的内容前加上token,同时HMAC_SHA1加密会使用服务器返回的secret

from hnair.hna_signature

# 对请求的data进行签名
data = {
    'common': {
        # common的内容
    },
    'data': {
        'adultCount': 1,
        'cabins': ['*'],
        'childCount': 0,
        'depDate': '2020-12-09',
        'dstCode': 'PEK',
        'infantCount': 0,
        'orgCode': 'YYZ',
        'tripType': 1,
        'type': 3
    }
}

# /user/ 路径下的接口需要登录,同时加签要传入token、secret(都由服务器返回)
# token = ''
# secret = ''

# 生成签名
sign = get_sign(data=data)
print(sign)
Owner
XNFA
XNFA
Generate a repository with mirror links for DriveDroid app

DriveDroid Repository Generator Generate a repository for the app that allow boot a PC using ISO files stored on your Android phone Check also an offi

Evgeny 11 Nov 19, 2022
Scraping Top Repositories for Topics on GitHub,

0.-Webscrapping-using-python Scraping Top Repositories for Topics on GitHub, Web scraping is the process of extracting and parsing data from websites

Dev Aravind D Satprem 2 Mar 18, 2022
OSTA web scraper, for checking the status of school buses in Ottawa

OSTA-La-Vista OSTA web scraper, for checking the status of school buses in Ottawa. Getting Started Using a Raspberry Pi, download Python 3, and option

1 Jan 28, 2022
Pyrics is a tool to scrape lyrics, get rhymes, generate relevant lyrics with rhymes.

Pyrics Pyrics is a tool to scrape lyrics, get rhymes, generate relevant lyrics with rhymes. ./test/run.py provides the full function in terminal cmd

MisterDK 1 Feb 12, 2022
Create crawler get some new products with maximum discount in banimode website

crawler-banimode create crawler and get some new products with maximum discount in banimode website. این پروژه کوچک جهت یادگیری و کار با ابزار سلنیوم

nourollah rezaei 2 Feb 17, 2022
a Scrapy spider that utilizes Postgres as a DB, Squid as a proxy server, Redis for de-duplication and Splash to render JavaScript. All in a microservices architecture utilizing Docker and Docker Compose

This is George's Scraping Project To get started cd into the theZoo file and run: chmod +x script.sh then: ./script.sh This will spin up a Postgres co

George Reyes 7 Nov 27, 2022
Auto Join: A GitHub action script to automatically invite everyone to the organization who star your repository.

Auto Invite To The Organization By Star A GitHub Action script to automatically invite everyone to your organization that stars your repository. What

Max Base 11 Dec 11, 2022
Audio media crawler for lbry.

Audio media crawler for lbry. Requirements Python 3.8 Poetry 1.1.7 Elasticsearch 7.14.0 Lbry-sdk 0.99.0 Development This project uses poetry as a depe

Hound.fm 4 Dec 03, 2022
A web scraper for nomadlist.com, made to avoid website restrictions.

Gypsylist gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions. nomadlist.com is a website with a lot of information fo

Alessio Greggi 5 Nov 24, 2022
A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

A Python Covid-19 cases tracker that scrapes data off the web and presents the number of Cases, Recovered Cases, and Deaths that occurred because of the pandemic.

Alex Papadopoulos 1 Nov 13, 2021
WebScrapping Project - G1 Latest News

Web Scrapping com Python Esse projeto consiste em um código para o usuário buscar as últimas nóticias sobre um termo qualquer, no site G1. Para esse p

Eduardo Henrique 2 Feb 13, 2022
A Telegram crawler to search groups and channels automatically and collect any type of data from them.

Introduction This is a crawler I wrote in Python using the APIs of Telethon months ago. This tool was not intended to be publicly available for a numb

39 Dec 28, 2022
Raspi-scraper is a configurable python webscraper that checks raspberry pi stocks from verified sellers

Raspi-scraper is a configurable python webscraper that checks raspberry pi stocks from verified sellers.

Louie Cai 13 Oct 15, 2022
Google Developer Profile Badge Scraper

Google Developer Profile Badge Scraper GDev Profile Badge Scraper is a Google Developer Profile Web Scraper which scrapes for specific badges in a use

Siddhant Lad 7 Jan 10, 2022
Find thumbnails and original images from URL or HTML file.

Haul Find thumbnails and original images from URL or HTML file. Demo Hauler on Heroku Installation on Ubuntu $ sudo apt-get install build-essential py

Vinta Chen 150 Oct 15, 2022
中国大学生在线 四史自动答题刷分(现仅支持英雄篇)

中国大学生在线 “四史”学习教育竞答 自动答题 刷分 (现仅支持英雄篇,已更新可用) 若对您有所帮助,记得点个Star 🌟 !!! 中国大学生在线 “四史”学习教育竞答 自动答题 刷分 (现仅支持英雄篇,已更新可用) 🥰 🥰 🥰 依赖 本项目依赖的第三方库: requests 在终端执行以下

XWhite 229 Dec 12, 2022
This script is intended to crawl license information of repositories through the GitHub API.

GithubLicenseCrawler This script is intended to crawl license information of repositories through the GitHub API. Taking a csv file with requirements.

schutera 4 Oct 25, 2022
Anonymously scrapes onlinesim.ru for new usable phone numbers.

phone-scraper Anonymously scrapes onlinesim.ru for new usable phone numbers. Usage Clone the repository $ git clone https://github.com/thomasgruebl/ph

16 Oct 08, 2022
Danbooru scraper with python

Danbooru Version: 0.0.1 License under: MIT License Dependencies Python: = 3.9.7 beautifulsoup4 cloudscraper Example of use Danbooru from danbooru imp

Sugarbell 2 Oct 27, 2022
An helper library to scrape data from TikTok in one line, using the Influencer Hunters APIs.

TikTok Scraper An utility library to scrape data from TikTok hassle-free Go to the website » View Demo · Report Bug · Request Feature About The Projec

6 Jan 08, 2023