Scraping news from the Ucsal portal with Scrapy.

Last update: Sep 30, 2021

Overview

NewsScraping

This project scrapes the latest news (from 2021) from the Ucsal university portal: http://noosfero.ucsal.br/institucional

Technologies Used:

Scrapy framework

Extracted Data

The project has a single spider that extracts the title, date, and link of each news item and writes the data to a file in JSON format.

Example of an extracted item:

{
"title": "INSCRIÇÕES ABERTAS PARA O PROGRAMA DE MONITORIA SOLIDÁRIA DA GRADUAÇÃO 2021.2",
"date": "18 de Agosto de 2021, 18:34",
"link": "http://noosfero.ucsal.br/institucional/noticias/inscricoes-abertas-para-o-programa-de-monitoria-solidaria-da-graduacao-2021.2"
}
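The "date" field is stored as Portuguese text, so it may need to be converted before sorting or filtering the news items. The sketch below shows one possible way to do that with the standard library only. The helper name `parse_date` and the `MONTHS` table are hypothetical and not part of the project, and the sketch assumes every date follows the same pattern as the example above.

```python
from datetime import datetime

# Portuguese month names mapped to month numbers.
MONTHS = {
    "janeiro": 1, "fevereiro": 2, "março": 3, "abril": 4,
    "maio": 5, "junho": 6, "julho": 7, "agosto": 8,
    "setembro": 9, "outubro": 10, "novembro": 11, "dezembro": 12,
}


def parse_date(text):
    """Parse a string such as '18 de Agosto de 2021, 18:34' into a datetime."""
    date_part, time_part = text.split(",")
    day, month_name, year = [p.strip() for p in date_part.split(" de ")]
    hour, minute = time_part.strip().split(":")
    return datetime(int(year), MONTHS[month_name.lower()], int(day),
                    int(hour), int(minute))


print(parse_date("18 de Agosto de 2021, 18:34").isoformat())
```

A lookup table is used instead of locale-dependent parsing so the result does not depend on which locales are installed on the machine.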

Running the spider:

Go to the spider's directory:

  cd crawler/crawler/spiders

Run the command:

  scrapy crawl noticias

Owner

Crissiano Pires

Software engineering student - Ucsal

GitHub Repository

An optimized version of the JD.com Moutai flash-purchase tool

1.8k Mar 18, 2022

Scrape all the media from an OnlyFans account - Updated regularly

3.2k Dec 29, 2022

Python script that reads Aliexpress offer URLs from an Excel file (.csv) and posts them in a Telegram channel using a bot

6 Dec 06, 2022

Webservice wrapper for hhursev/recipe-scrapers (python library to scrape recipes from websites)

recipe-scrapers-webservice This is a wrapper for hhursev/recipe-scrapers which provides the api as a webservice, to be consumed as a microservice by o

1 Jul 09, 2022

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster

1.1k Jan 06, 2023

Deep Web Miner Python | Spyder Crawler

Web crawler written in Python. This crawler digs down to the third level of internal links and mines the respective data accordingly

17 Jan 24, 2022

Subscrape - A Python scraper for substrate chains

subscrape A Python scraper for substrate chains that uses Subscan. Usage copy co

14 Dec 15, 2022

Scraping web pages to get data

Scraping Data Get public data and save in database This is project use Python How to run a project 1 - Clone the repository 2 - Install beautifulsoup4

2 Nov 01, 2021

12306 ticket-grabbing script

457 Jan 05, 2023

A simple python web scraper.

Dissec A simple python web scraper. It gets a website and its contents and parses them with the help of bs4. Installation To install the requirements,

11 May 06, 2022

This is a Python script that uses Selenium to locate the relevant elements and then triggers click events to automate the browser.

5 Nov 19, 2021

a small library for extracting rich content from urls

A small library for extracting rich content from urls. what does it do? micawber supplies a few methods for retrieving rich metadata about a variety o

588 Dec 27, 2022

A multithreaded tool for searching and downloading images from popular search engines. It is straightforward to set up and run!

🕳️ CygnusX1 Code by Trong-Dat Ngo. Overviews 🕳️ CygnusX1 is a multithreaded tool 🛠️ , used to search and download images from popular search engine

32 Dec 31, 2022