A simple website crawler that asks the user for a website address, crawls it, and extracts specific data from the pages it finds.

Last update: Jan 10, 2022

Overview

Website-Crawler-Python

Website-Crawler-Python is a simple website crawler: it asks the user for a website address, crawls it, and extracts specific data from it. After receiving the address, it asks how deep the crawl should go through the links found on that website.

The crawler takes three inputs:

A website address
An integer value for the crawling depth
A user-specified regular expression for finding user-specific data
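
To illustrate how the three inputs could fit together, here is a minimal sketch of a depth-limited crawl. It is not the project's actual code. It assumes that a depth of 0 means "the start page only" and that each extra level follows one more hop of links. The names `crawl`, `LinkParser`, and `fetch` are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup so the example runs without extra packages. The `fetch` argument is any callable that returns a page's HTML as text (or None on failure); an in-memory dictionary is used here instead of real network requests.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, depth, pattern, fetch):
    """Breadth-first crawl, following links at most `depth` hops from start_url.

    `fetch` is a hypothetical callable: url -> HTML text, or None on failure.
    Returns (visited_urls, regex_matches).
    """
    regex = re.compile(pattern)
    visited, matches = [], []
    seen = {start_url}
    frontier = [start_url]
    for level in range(depth + 1):
        next_frontier = []
        for url in frontier:
            html = fetch(url)
            if html is None:
                continue
            visited.append(url)
            matches.extend(m.group(0) for m in regex.finditer(html))
            if level == depth:
                continue  # deepest level reached: do not collect more links
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                absolute = urljoin(url, href)  # resolve relative links
                if absolute not in seen:
                    seen.add(absolute)
                    next_frontier.append(absolute)
        frontier = next_frontier
    return visited, matches


# Stand-in for real pages, so the sketch runs without network access.
pages = {
    "http://example.test/": '<a href="/a">A</a> code 111',
    "http://example.test/a": '<a href="/b">B</a> code 222',
    "http://example.test/b": "code 333",
}
visited, matches = crawl("http://example.test/", 1, r"\d{3}", pages.get)
```

With a depth of 1, only the start page and the pages it links to directly are visited. In a real run, `fetch` could wrap an HTTP client such as Urllib3.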

General tasks:

Finds all Norwegian mobile numbers and saves them to a text file.
Finds all sub-links on the given website and saves them to a text file.
Saves the website's raw HTML code to a text file.
Finds all email addresses and saves them to a text file.
Finds all comments used in the website and saves them to a text file.
Finds the five most used words and prints them to the terminal.
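
Several of these tasks come down to regular-expression searches over the downloaded HTML. The sketch below shows one way they might look; it is not the project's own code. The patterns are assumptions: Norwegian mobile numbers are taken to be 8 digits starting with 4 or 9, optionally prefixed with +47 or 0047, and the email pattern is deliberately simple. All function names and the `save_lines` helper are hypothetical.

```python
import re
from collections import Counter

# Assumption: 8 digits starting with 4 or 9, optional +47 / 0047 prefix.
MOBILE_RE = re.compile(r"(?<!\d)(?:(?:\+47|0047)[ ]?)?([49]\d{7})(?!\d)")
# Assumption: a simplified email pattern, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[A-Za-z]+")


def find_mobile_numbers(text):
    """Return the 8-digit part of each number that looks like a Norwegian mobile."""
    return [m.group(1) for m in MOBILE_RE.finditer(text)]


def find_emails(text):
    """Return every substring that looks like an email address."""
    return EMAIL_RE.findall(text)


def find_comments(html):
    """Return the text of every HTML comment, stripped of surrounding spaces."""
    return [c.strip() for c in COMMENT_RE.findall(html)]


def top_words(html, n=5):
    """Return the n most common words outside comments and tags."""
    text = TAG_RE.sub(" ", COMMENT_RE.sub(" ", html))
    words = [w.lower() for w in WORD_RE.findall(text)]
    return Counter(words).most_common(n)


def save_lines(path, items):
    """Write one item per line to a text file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(items))


sample = (
    "<!-- contact block --><p>Call +47 91234567 or 41234567. "
    "Mail post@example.no. Call call call now now.</p>"
)
numbers = find_mobile_numbers(sample)
emails = find_emails(sample)
comments = find_comments(sample)
common = top_words(sample, 2)
```

Each result list can then be passed to `save_lines` with a file name of the user's choice, while the output of `top_words` is printed to the terminal.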

This is a Python-based project that depends on the following libraries:

RegEx
Urllib3
BeautifulSoup 4
Counter in Collections

Owner

Faisal Ahmed
