rsscrape
A Python script that extracts news items from RSS feeds and saves each one as a JSON file.
Usage
$ python3 rsscrape.py
[INFO] Found 51 in 'feeds.txt'
[INFO] Requests 51 XMLs content
[INFO] Scrape 10 items
[INFO] Write 1250 json files to './items'
[INFO] 1648 json files in './items'
Running the script generates a directory, ./items, containing the results:
./items
0a1c2b2da6e40ab4e54b8247bbbc1422.json
fc8ddcf4cc0725bfa35564fb19e4a407.json
fe15bf1383c382101984ea4fdc6a33ae.json
...
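The 32-character hexadecimal file names look like MD5 digests, although nothing here states what is actually hashed. The following is a minimal sketch of such a naming scheme, assuming the digest is computed over the item's link. The helpers `item_filename` and `write_item` are hypothetical names and are not taken from rsscrape.py.

```python
import hashlib
import json
from pathlib import Path


def item_filename(key: str) -> str:
    """Return an MD5-based file name for a key (assumption: the item's link)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest() + ".json"


def write_item(item: dict, out_dir: str = "./items") -> Path:
    """Write one item as JSON; a file with the same name is overwritten."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / item_filename(item["link"])
    path.write_text(json.dumps(item, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

With a name derived from the item itself, scraping the same item again rewrites the same file instead of creating a duplicate.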
Each JSON file corresponds to a single RSS item:
<item>
  <title>USA: Corona, war da was?</title>
  <link>https://de.nachrichten.yahoo.com/usa-corona-war-135203870.html</link>
  <pubDate>2021-11-23T13:52:03Z</pubDate>
  <source>ZEIT ONLINE</source>
  <guid>usa-corona-war-135203870.html</guid>
</item>
// f8b40f2bb091e41c53eb35528c433d7f.json
{
  "title": "USA: Corona, war da was?",
  "link": "https://de.nachrichten.yahoo.com/usa-corona-war-135203870.html",
  "pubDate": "2021-11-23T13:52:03Z",
  "source": "ZEIT ONLINE",
  "guid": "usa-corona-war-135203870.html",
  "raw": "<item xmlns:media=\"http://search.yahoo.com/mrss/\"><title>USA: Corona, war da was?</title><link>https://de.nachrichten.yahoo.com/usa-corona-war-135203870.html</link><pubDate>2021-11-23T13:52:03Z</pubDate><source url=\"http://www.zeit.de/index\">ZEIT ONLINE</source><guid isPermaLink=\"false\">usa-corona-war-135203870.html</guid><media:content height=\"86\" url=\"https://s.yimg.com/uu/api/res/1.2/_rdWs7VS_33DY3PJWhkh6Q--~B/aD04MTA7dz0xNDQwO2FwcGlkPXl0YWNoeW9u/https://media.zenfs.com/de/zeit_921/2c35cfd59ae80f62a1ecb89623d2a47f\" width=\"130\"/><media:credit role=\"publishing company\"/></item>"
}
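The mapping from an RSS item to the fields shown above can be sketched with the standard library. This is an illustration only, not the script's actual code: `item_to_dict` and `FIELDS` are hypothetical names, and the sketch assumes that each field is the text of the child element with the same tag and that "raw" holds the item's XML unchanged.

```python
import xml.etree.ElementTree as ET

# Child elements copied into the JSON record (taken from the example above).
FIELDS = ("title", "link", "pubDate", "source", "guid")


def item_to_dict(item_xml: str) -> dict:
    """Map one <item> element to a flat dict; missing fields become ''."""
    element = ET.fromstring(item_xml)
    record = {}
    for field in FIELDS:
        record[field] = (element.findtext(field) or "").strip()
    record["raw"] = item_xml  # keep the original XML alongside the parsed fields
    return record
```

Keeping the raw XML next to the parsed fields means that elements not covered by `FIELDS`, such as media attachments, can still be recovered later.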