download NCERT books using scrapy

Overview

download_ncert_books

download NCERT books using scrapy

NCERT_CLASS_1 NCERT_CLASS_2 NCERT_CLASS_3 NCERT_CLASS_4 NCERT_CLASS_5 NCERT_CLASS_6 NCERT_CLASS_7 NCERT_CLASS_8 NCERT_CLASS_9 NCERT_CLASS_10 NCERT_CLASS_11 NCERT_CLASS_12

Downloading Books:

You can either use the spider by cloning this repo and following the instructions given below
or
You can download the books direcly from the release section or by clicking on the badges above

There are 2 different kind of zips in the release section for every class

  1. Book wise NCERT_CLASS_ClassNo_Subject_BookName.zip : These zips contain the Chapters of the BookName for the Subject of the ClassNo
  2. Books Text Class_ClassNo_Text.zip : These zips contain the text extracted from all the books of the ClassNo

How to use the spider

Initial Setup

git clone https://github.com/nit-in/download_ncert_books.git
cd download_ncert_books
pip install -r requirements.txt

to run the spider

scrapy crawl --nolog ncert

and follow the prompts

for example if you want to download Class 11th Economics Book

 scrapy crawl  --nolog ncert                                                                                                                                      ─╯

Enter the class:        11

Select one the subjects:
Enter 1 for Sanskrit
Enter 2 for Accountancy
Enter 3 for Chemistry
Enter 4 for Mathematics
Enter 5 for Economics
Enter 6 for Psychology
Enter 7 for Geography

and so on ...

Enter subject number:   5

Select one the books:
Enter 1 for Indian Economic Development
Enter 2 for Statistics for Economics
Enter 3 for Sankhyiki
Enter 4 for Bhartiya Airthryavstha Ka Vikas 
Enter 5 for Hindustan Ki Moaashi Tarraqqi(Urdu)
Enter 6 for Shumariyaat Bar-e-Mushiyat(Urdu)

Enter book number:      1

Downloading...  Class: Class11  Subject: Economics      Book: Indian_Economic_Development       Chapters: 10


downloading keec1ps.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec1ps.pdf
downloading keec101.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec101.pdf
downloading keec102.pdf to  /home/user/ncert/Class11/Economics/Indian_Economic_Development/keec102.pdf

			OR 

to download multiple books

enter their numbers separated by commas

e.g. 

Select one the books:
Enter 1 for Indian Economic Development
Enter 2 for Statistics for Economics
Enter 3 for Sankhyiki
Enter 4 for Bhartiya Airthryavstha Ka Vikas 
Enter 5 for Hindustan Ki Moaashi Tarraqqi(Urdu)
Enter 6 for Shumariyaat Bar-e-Mushiyat(Urdu)

Enter book number:      1,2

if you want to see scrapy spider log

scrapy shell ncert
You might also like...
Snowflake database loading utility with Scrapy integration

Snowflake Stage Exporter Snowflake database loading utility with Scrapy integration. Meant for streaming ingestion of JSON serializable objects into S

Scraping news from Ucsal portal with Scrapy.

NewsScraping Esse é um projeto de raspagem das últimas noticias, de 2021, do portal da universidade Ucsal http://noosfero.ucsal.br/institucional Tecno

a Scrapy spider that utilizes Postgres as a DB, Squid as a proxy server, Redis for de-duplication and Splash to render JavaScript. All in a microservices architecture utilizing Docker and Docker Compose

This is George's Scraping Project To get started cd into the theZoo file and run: chmod +x script.sh then: ./script.sh This will spin up a Postgres co

Fundamentus scrapy

Fundamentus_scrapy Baixa informacões que os outros scrapys do fundamentus não realizam. Para iniciar (python main.py), sera criado um arquivo chamado

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo.

Crawler do site Fundamentus.com com o uso do framework scrapy, tanto da aba detalhada como a de resumo. (Todas as infomações)

Scrapy-based cyber security news finder

Cyber-Security-News-Scraper Scrapy-based cyber security news finder Goal To keep up to date on the constant barrage of information within the field of

Scrapy uses Request and Response objects for crawling web sites.

Requests and Responses¶ Scrapy uses Request and Response objects for crawling web sites. Typically, Request objects are generated in the spiders and p

Bigdata - This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

Scrapy Cluster This Scrapy project uses Redis and Kafka to create a distributed

Iptvcrawl - A scrapy project for crawl IPTV playlist

iptvcrawl a scrapy project for crawl IPTV playlist. Dependency Python3 pip insta

Comments
  • Bump requests from 2.26.0 to 2.28.1

    Bump requests from 2.26.0 to 2.28.1

    Bumps requests from 2.26.0 to 2.28.1.

    Release notes

    Sourced from requests's releases.

    v2.28.1

    2.28.1 (2022-06-29)

    Improvements

    • Speed optimization in iter_content with transition to yield from. (#6170)

    Dependencies

    • Added support for chardet 5.0.0 (#6179)
    • Added support for charset-normalizer 2.1.0 (#6169)

    New Contributors

    Full Changelog: https://github.com/psf/requests/blob/main/HISTORY.md#2281-2022-06-29

    v2.28.0

    2.28.0 (2022-06-09)

    Deprecations

    • ⚠️ Requests has officially dropped support for Python 2.7. ⚠️ (#6091)
    • Requests has officially dropped support for Python 3.6 (including pypy3). (#6091)

    Improvements

    • Wrap JSON parsing issues in Request's JSONDecodeError for payloads without an encoding to make json() API consistent. (#6097)
    • Parse header components consistently, raising an InvalidHeader error in all invalid cases. (#6154)
    • Added provisional 3.11 support with current beta build. (#6155)
    • Requests got a makeover and we decided to paint it black. (#6095)

    Bugfixes

    • Fixed bug where setting CURL_CA_BUNDLE to an empty string would disable cert verification. All Requests 2.x versions before 2.28.0 are affected. (#6074)
    • Fixed urllib3 exception leak, wrapping urllib3.exceptions.SSLError with requests.exceptions.SSLError for content and iter_content. (#6057)
    • Fixed issue where invalid Windows registry entires caused proxy resolution to raise an exception rather than ignoring the entry. (#6149)
    • Fixed issue where entire payload could be included in the error message for JSONDecodeError. (#6079)

    New Contributors

    ... (truncated)

    Changelog

    Sourced from requests's changelog.

    2.28.1 (2022-06-29)

    Improvements

    • Speed optimization in iter_content with transition to yield from. (#6170)

    Dependencies

    • Added support for chardet 5.0.0 (#6179)
    • Added support for charset-normalizer 2.1.0 (#6169)

    2.28.0 (2022-06-09)

    Deprecations

    • ⚠️ Requests has officially dropped support for Python 2.7. ⚠️ (#6091)
    • Requests has officially dropped support for Python 3.6 (including pypy3.6). (#6091)

    Improvements

    • Wrap JSON parsing issues in Request's JSONDecodeError for payloads without an encoding to make json() API consistent. (#6097)
    • Parse header components consistently, raising an InvalidHeader error in all invalid cases. (#6154)
    • Added provisional 3.11 support with current beta build. (#6155)
    • Requests got a makeover and we decided to paint it black. (#6095)

    Bugfixes

    • Fixed bug where setting CURL_CA_BUNDLE to an empty string would disable cert verification. All Requests 2.x versions before 2.28.0 are affected. (#6074)
    • Fixed urllib3 exception leak, wrapping urllib3.exceptions.SSLError with requests.exceptions.SSLError for content and iter_content. (#6057)
    • Fixed issue where invalid Windows registry entires caused proxy resolution to raise an exception rather than ignoring the entry. (#6149)
    • Fixed issue where entire payload could be included in the error message for JSONDecodeError. (#6036)

    2.27.1 (2022-01-05)

    Bugfixes

    • Fixed parsing issue that resulted in the auth component being dropped from proxy URLs. (#6028)

    2.27.0 (2022-01-03)

    ... (truncated)

    Commits

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump itemadapter from 0.4.0 to 0.7.0

    Bump itemadapter from 0.4.0 to 0.7.0

    Bumps itemadapter from 0.4.0 to 0.7.0.

    Release notes

    Sourced from itemadapter's releases.

    v0.7.0

    What's Changed

    New Contributors

    Full Changelog: https://github.com/scrapy/itemadapter/compare/v0.6.0...v0.7.0

    v0.6.0

    What's Changed

    Full Changelog: https://github.com/scrapy/itemadapter/compare/v0.5.0...v0.6.0

    v0.5.0

    What's Changed

    Full Changelog: https://github.com/scrapy/itemadapter/compare/v0.4.0...v0.5.0

    Changelog

    Sourced from itemadapter's changelog.

    0.7.0 (2022-08-02)

    ItemAdapter.get_field_names_from_class (#64)

    0.6.0 (2022-05-12)

    Slight performance improvement (#62)

    0.5.0 (2022-03-18)

    Improve performance by removing imports inside functions (#60)

    Commits
    • 0bd037c Bump version: 0.6.0 → 0.7.0
    • 8f3826a Update changelog for 0.7.0
    • 900ae14 ItemAdapter.get_field_names_from_class (#64)
    • 927ee25 Bump version: 0.5.0 → 0.6.0
    • 86f82ea Update changelog for 0.6.0
    • 8f239bc Merge pull request #62 from scrapy/performance
    • 60c9ccc Merge pull request #61 from scrapy/fix-repr
    • 8733014 Replace 'any' ocurrences
    • d66aa62 Remove hardcoded class name in ItemAdapter.repr
    • 1203b5e Bump version: 0.4.0 → 0.5.0
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Bump scrapy from 2.5.0 to 2.7.1

    Bump scrapy from 2.5.0 to 2.7.1

    Bumps scrapy from 2.5.0 to 2.7.1.

    Release notes

    Sourced from scrapy's releases.

    2.7.1

    • Relaxed the restriction introduced in 2.6.2 so that the Proxy-Authentication header can again be set explicitly in certain cases, restoring compatibility with scrapy-zyte-smartproxy 2.1.0 and older
    • Bug fixes

    See the full changelog

    2.7.0

    See the full changelog

    2.6.3

    Makes pip install Scrapy work again.

    It required making changes to support pyOpenSSL 22.1.0. We had to drop support for SSLv3 as a result.

    We also upgraded the minimum versions of some dependencies.

    See the changelog.

    2.6.2

    Fixes a security issue around HTTP proxy usage, and addresses a few regressions introduced in Scrapy 2.6.0.

    See the changelog.

    2.6.1

    Fixes a regression introduced in 2.6.0 that would unset the request method when following redirects.

    2.6.0

    • Security fixes for cookie handling (see details below)
    • Python 3.10 support
    • asyncio support is no longer considered experimental, and works out-of-the-box on Windows regardless of your Python version
    • Feed exports now support pathlib.Path output paths and per-feed item filtering and post-processing

    See the full changelog

    Security bug fixes

    • When a Request object with cookies defined gets a redirect response causing a new Request object to be scheduled, the cookies defined in the original Request object are no longer copied into the new Request object.

      If you manually set the Cookie header on a Request object and the domain name of the redirect URL is not an exact match for the domain of the URL of the original Request object, your Cookie header is now dropped from the new Request object.

      The old behavior could be exploited by an attacker to gain access to your cookies. Please, see the cjvr-mfj7-j4j8 security advisory for more information.

    ... (truncated)

    Changelog

    Sourced from scrapy's changelog.

    Scrapy 2.7.1 (2022-11-02)

    New features

    
    -   Relaxed the restriction introduced in 2.6.2 so that the
        ``Proxy-Authentication`` header can again be set explicitly, as long as the
        proxy URL in the :reqmeta:`proxy` metadata has no other credentials, and
        for as long as that proxy URL remains the same; this restores compatibility
        with scrapy-zyte-smartproxy 2.1.0 and older (:issue:`5626`).
    

    Bug fixes

    
    -   Using ``-O``/``--overwrite-output`` and ``-t``/``--output-format`` options
        together now produces an error instead of ignoring the former option
        (:issue:`5516`, :issue:`5605`).
    
    • Replaced deprecated :mod:asyncio APIs that implicitly use the current event loop with code that explicitly requests a loop from the event loop policy (:issue:5685, :issue:5689).

    • Fixed uses of deprecated Scrapy APIs in Scrapy itself (:issue:5588, :issue:5589).

    • Fixed uses of a deprecated Pillow API (:issue:5684, :issue:5692).

    • Improved code that checks if generators return values, so that it no longer fails on decorated methods and partial methods (:issue:5323, :issue:5592, :issue:5599, :issue:5691).

    Documentation </code></pre> <ul> <li> <p>Upgraded the Code of Conduct to Contributor Covenant v2.1 (:issue:<code>5698</code>).</p> </li> <li> <p>Fixed typos (:issue:<code>5681</code>, :issue:<code>5694</code>).</p> </li> </ul> <p>Quality assurance</p> <pre><code>

    • Re-enabled some erroneously disabled flake8 checks (:issue:5688).

    • Ignored harmless deprecation warnings from :mod:typing in tests (:issue:5686, :issue:5697).

    • Modernized our CI configuration (:issue:5695, :issue:5696).

    &lt;/tr&gt;&lt;/table&gt; </code></pre> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary>

    <ul> <li><a href="https://github.com/scrapy/scrapy/commit/6ded3cf4cd134b615239babe28bb28c3ff524b05"><code>6ded3cf</code></a> Bump version: 2.7.0 → 2.7.1</li> <li><a href="https://github.com/scrapy/scrapy/commit/95880c5de1b1909bf03303fb9c02cddb0508fe1a"><code>95880c5</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/scrapy/scrapy/issues/5701">#5701</a> from scrapy/relnotes-2.7.1</li> <li><a href="https://github.com/scrapy/scrapy/commit/5ec175b8bb08f93c431d7d64d2389b90ec7a1f37"><code>5ec175b</code></a> Small relnotes fixes.</li> <li><a href="https://github.com/scrapy/scrapy/commit/940a73863bf7dcb16b3f2d9f5efb83efe4599712"><code>940a738</code></a> Release notes for 2.7.1.</li> <li><a href="https://github.com/scrapy/scrapy/commit/a95a338eeada7275a5289cf036136610ebaf07eb"><code>a95a338</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/scrapy/scrapy/issues/5599">#5599</a> from tonal/patch-1</li> <li><a href="https://github.com/scrapy/scrapy/commit/9077d0f9b490114f117c668f115240c16afccedf"><code>9077d0f</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/scrapy/scrapy/issues/5698">#5698</a> from pankali/patch-1</li> <li><a href="https://github.com/scrapy/scrapy/commit/76c2cb070e4efe3ae33a4b3d72a5bcac6709f48f"><code>76c2cb0</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/scrapy/scrapy/issues/5697">#5697</a> from iamkaushal/<a href="https://github-redirect.dependabot.com/scrapy/scrapy/issues/5686">#5686</a>_fix</li> <li><a href="https://github.com/scrapy/scrapy/commit/9f45be439de8a3b9a6d201c33e98b408a73c02bb"><code>9f45be4</code></a> Update Code of Conduct to Contributor Covenant v2.1</li> <li><a href="https://github.com/scrapy/scrapy/commit/bd9e482c2f0db92065708c8291be6e8bc1f05218"><code>bd9e482</code></a> added typing.io and typing.re in pytest warning filter to ignore</li> <li><a href="https://github.com/scrapy/scrapy/commit/fd692f309105d917f5f46bd00a88c550d6cc7da3"><code>fd692f3</code></a> Prevent running the -O and -t command-line options together (<a href="https://github-redirect.dependabot.com/scrapy/scrapy/issues/5605">#5605</a>)</li> <li>Additional commits viewable in <a href="https://github.com/scrapy/scrapy/compare/2.5.0...2.7.1">compare view</a></li> </ul> </details>

    <br />

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(class_9)
Owner
coding is a hobby; Not professionally educated in programming; If you find issues or mistake DO tell me ;-)
Automated data scraper for Thailand COVID-19 data

The Researcher COVID data Automated data scraper for Thailand COVID-19 data Accessing the Data 1st Dose Provincial Vaccination Data 2nd Dose Provincia

Porames Vatanaprasan 31 Apr 17, 2022
A high-level distributed crawling framework.

Cola: high-level distributed crawling framework Overview Cola is a high-level distributed crawling framework, used to crawl pages and extract structur

Xuye (Chris) Qin 1.5k Jan 04, 2023
Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Django and Vue.js

Gerapy Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Documentation Documentation

Gerapy 2.9k Jan 03, 2023
Iptvcrawl - A scrapy project for crawl IPTV playlist

iptvcrawl a scrapy project for crawl IPTV playlist. Dependency Python3 pip insta

Zhijun 18 May 05, 2022
An Web Scraping API for MDL(My Drama List) for Python.

PyMDL An API for MyDramaList(MDL) based on webscraping for python. Description An API for MDL to make your life easier in retriving and working on dat

6 Dec 10, 2022
This code will be able to scrape movies from a movie website and also provide download links to newly uploaded movies.

Movies-Scraper You are probably tired of navigating through a movie website to get the right movie you'd want to watch during the weekend. There may e

1 Jan 31, 2022
Meme-videos - Scrapes memes and turn them into a video compilations

Meme Videos Scrapes memes from reddit using praw and request and then converts t

Partho 12 Oct 28, 2022
A web scraper for nomadlist.com, made to avoid website restrictions.

Gypsylist gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions. nomadlist.com is a website with a lot of information fo

Alessio Greggi 5 Nov 24, 2022
The core packages of security analyzer web crawler

Security Analyzer 🐍 A large scale web crawler (considered also as vulnerability scanner tool) to take an overview about security of Moroccan sites Cu

Security Analyzer 10 Jul 03, 2022
A dead simple crawler to get books information from Douban.

Introduction A dead simple crawler to get books information from Douban. Pre-requesites Python 3 Install dependencies from requirements.txt (Optional)

Yun Wang 1 Jan 10, 2022
爱奇艺会员,腾讯视频,哔哩哔哩,百度,各类签到

My-Actions 个人收集并适配Github Actions的各类签到大杂烩 不要fork了 ⭐️ star就行 使用方式 新建仓库并同步代码 点击Settings - Secrets - 点击绿色按钮 (如无绿色按钮说明已激活。直接到下一步。) 新增 new secret 并设置 Secr

280 Dec 30, 2022
Twitter Scraper

Twitter's API is annoying to work with, and has lots of limitations — luckily their frontend (JavaScript) has it's own API, which I reverse–engineered. No API rate limits. No restrictions. Extremely

Tayyab Kharl 45 Dec 30, 2022
An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post

Autoscraper-n-blogger An Automated udemy coupons scraper which scrapes coupons and autopost the result in blogspot post and notifies via Telegram bot

GOKUL A.P 13 Dec 21, 2022
Scraping script for stats on covid19 pandemic status in Chiba prefecture, Japan

About 千葉県の地域別の詳細感染者統計(Excelファイル) をCSVに変換し、かつ地域別の日時感染者集計値を出力するスクリプトです。 Requirement POSIX互換なシェル, e.g. GNU Bash (1) curl (1) python = 3.8 pandas = 1.1.

Conv4Japan 1 Nov 29, 2021
AssistScraper - program for /r/nba to use to find list of all players a player assisted and how many assists each player recieved

AssistScraper - program for /r/nba to use to find list of all players a player assisted and how many assists each player recieved

5 Nov 25, 2021
SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features.

SearchifyX SearchifyX, predecessor to Searchify, is a fast Quizlet, Quizizz, and Brainly webscraper with various stealth features. SearchifyX lets you

28 Dec 20, 2022
CreamySoup - a helper script for automated SourceMod plugin updates management.

CreamySoup/"Creamy SourceMod Updater" (or just soup for short), a helper script for automated SourceMod plugin updates management.

3 Jan 03, 2022
爬虫案例合集。包括但不限于《淘宝、京东、天猫、豆瓣、抖音、快手、微博、微信、阿里、头条、pdd、优酷、爱奇艺、携程、12306、58、搜狐、百度指数、维普万方、Zlibraty、Oalib、小说、招标网、采购网、小红书》

lxSpider 爬虫案例合集。包括但不限于《淘宝、京东、天猫、豆瓣、抖音、快手、微博、微信、阿里、头条、pdd、优酷、爱奇艺、携程、12306、58、搜狐、百度指数、维普万方、Zlibraty、Oalib、小说网站、招标采购网》 简介: 时光荏苒,记不清写了多少案例了。

lx 793 Jan 05, 2023
PaperRobot: a paper crawler that can quickly download numerous papers, facilitating paper studying and management

PaperRobot PaperRobot 是一个论文抓取工具,可以快速批量下载大量论文,方便后期进行持续的论文管理与学习。 PaperRobot通过多个接口抓取论文,目前抓取成功率维持在90%以上。通过配置Config文件,可以抓取任意计算机领域相关会议的论文。 Installation Down

moxiaoxi 47 Nov 23, 2022
Lovely Scrapper

Lovely Scrapper

Tushar Gadhe 2 Jan 01, 2022