Newsemble is an API that provides easy access to the current news for programmatic analysis

Overview

📰 Newsemble 📰


Logo
An API for fetching the current news.

python   Flask   MongoDB  Heroku

GitHub release Visits Badge Stars Badge Fork Badge Github all releases watchers Badge

🔖 About 🔖


Blog Post

Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.
The data is scraped from these news websites every hour, stored in a database on the cloud and whenever requested, the most recent articles are promptly served.
Developers can make use of this API to fetch current data with each article having the following fields:
Headlines, Content, Source, Link and Time.



🗒️ Table of contents

💻 Technologies

Newsemble is created with:

  • Python 3
  • Flask
  • PyMongo
  • BeautifulSoup

📂 File Structure and Description

  • app.py - Flask code for the API
  • scraper.py - Collection of scrapers for the various news sites.
  • db.py - Connecting and Using MongoDB
  • utils.py - Utility Functions
  • scheduler.py - Scheduler
  • Procfile - For Deployment
  • requirements.txt - Python Requirments

🛠️ Pipeline

Newsemble pipeline

🚀 Getting-started

This project can be accessed by using following setup

Links

Links Description
www.newsemble.ml/news Link to fetch all the data from all sources
www.newsemble.ml/news/toi Link to fetch data from Times of India
www.newsemble.ml/news/th Link to fetch data from The Hindu
www.newsemble.ml/news/tie Link to fetch data from The Indian Express
www.newsemble.ml/news/ndtv Link to fetch data from NDTV news
www.newsemble.ml/news/it Link to fetch data from India Today

Request format

$ import requests
$ url = "http://www.newsemble.ml/news/"
$ requests.get(url).json()

Response format

{   
    ‘link’      :  $source_link$,
    ‘content’   :  $content_text$,    
    ‘source’    :  $news_source$,
    ‘title’     :  $headline$, 
    ‘time       :  $date_time_of_article$  
 }

Sample output

image

⚙️ Currently Supported Sites



🙏 Thanks!

All contributions are welcome and appreciated. 👍
If you liked this project, or found it useful in any way, please drop a 🌟 !

✍️ Authors ✍️

✒️ Rishabh Gupta
✒️ Vishal Singhania
✒️ Roshan Kumar

You might also like...
music downloader written in python.   (Uses jiosaavn API)
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Mobile based API for Crunchyroll BETA (and Downloader).

Mobile based API for Crunchyroll BETA (and Downloader). Not restricted on servers and NO CLOUDFLARE

Pypixiv - A fully-typed, asynchronous api wrapper for pixiv

pypixiv this library is a fully-typed, asynchronous api wrapper for pixiv. featu

This project is helps to download contents from Streamtape by utilizing the API

It scrapes Streamtape api and download contents from the site.

A python script that discovers hidden YouTube API clients. Just a research project.

YouTube-Internal-Clients A script that discovers hidden internal clients of the YouTube (Innertube) API using bruteforce methods. The script tries cli

Simple Python script to download images and videos from public subreddits without using Reddit's API 😎
Simple Python script to download images and videos from public subreddits without using Reddit's API 😎

Subreddit Media Downloader Download images and videos from any public subreddit without using Reddit's API Made with ❤ by Nico 💬 About: This script a

This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular intervals.It sends out the most recent news at random!

Nepali-news-notifier This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular in

News-app - This is a news web app for reading news from different sources and topics
News-app - This is a news web app for reading news from different sources and topics

News-app - This is a news web app for reading news from different sources and topics

Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and other financial information. This repository provides an SDK for developing applications to access the NCDS.

Nasdaq Cloud Data Service (NCDS) Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and ot

The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction

windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr

Repo Home WPDrawBot - (Repo, Home, WP) A powerful programmatic 2D drawing application for MacOS X which generates graphics from Python scripts. (graphics, dev, mac)

DrawBot DrawBot is a powerful, free application for macOS that invites you to write Python scripts to generate two-dimensional graphics. The built-in

A curated list of programmatic weak supervision papers and resources
A curated list of programmatic weak supervision papers and resources

A curated list of programmatic weak supervision papers and resources

Learning trajectory representations using self-supervision and programmatic supervision.
Learning trajectory representations using self-supervision and programmatic supervision.

Trajectory Embedding for Behavior Analysis (TREBA) Implementation from the paper: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Y

Programmatic interface to Synapse services for Python

A Python client for Sage Bionetworks' Synapse, a collaborative, open-source research platform that allows teams to share data, track analyses, and collaborate

Django API that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format).

"Casares News" API Api that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format). Usage Consume the articles

This python module can analyse cryptocurrency news for any number of coins given and return a sentiment. Can be easily integrated with a Trading bot to keep an eye on the news.

Python script that analyses news headline or body sentiment and returns the overall media sentiment of any given coin. It can take multiple coins an

NLP project that works with news (NER, context generation, news trend analytics)
NLP project that works with news (NER, context generation, news trend analytics)

СоАвтор СоАвтор – платформа и открытый набор инструментов для редакций и журналистов-фрилансеров, который призван сделать процесс создания контента ма

Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

💰 An Alfred Workflow that provides current price of cryptocurrency
💰 An Alfred Workflow that provides current price of cryptocurrency

Coin Ticker for Alfred Workflow An Alfred Workflow that provides current price and status about cryptocurrency from cryptocompare.com. Supports Alfred

Comments
  • Bump lxml from 4.5.0 to 4.6.5

    Bump lxml from 4.5.0 to 4.6.5

    Bumps lxml from 4.5.0 to 4.6.5.

    Changelog

    Sourced from lxml's changelog.

    4.6.5 (2021-12-12)

    Bugs fixed

    • A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script content through SVG images (CVE-2021-43818).

    • A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script content through CSS imports and other crafted constructs (CVE-2021-43818).

    4.6.4 (2021-11-01)

    Features added

    • GH#317: A new property system_url was added to DTD entities. Patch by Thirdegree.

    • GH#314: The STATIC_* variables in setup.py can now be passed via env vars. Patch by Isaac Jurado.

    4.6.3 (2021-03-21)

    Bugs fixed

    • A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.

    4.6.2 (2020-11-26)

    Bugs fixed

    • A vulnerability (CVE-2020-27783) was discovered in the HTML Cleaner by Yaniv Nizry, which allowed JavaScript to pass through. The cleaner now removes more sneaky "style" content.

    4.6.1 (2020-10-18)

    ... (truncated)

    Commits
    • a9611ba Fix a test in Py2.
    • a3eacbc Prepare release of 4.6.5.
    • b7ea687 Update changelog.
    • 69a7473 Cleaner: cover some more cases where scripts could sneak through in specially...
    • 54d2985 Fix condition in test decorator.
    • 4b220b5 Use the non-depcrecated TextTestResult instead of _TextTestResult (GH-333)
    • d85c6de Exclude a test when using the macOS system libraries because it fails with li...
    • cd4bec9 Add macOS-M1 as wheel build platform.
    • fd0d471 Install automake and libtool in macOS build to be able to install the latest ...
    • f233023 Cleaner: Remove SVG image data URLs since they can embed script content.
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(v1.0)
Owner
Rishabh
CSE Undergrad at IIIT-D.
Rishabh
A tool to download program information from Bugcrowd, for use by researchers to compare programs they are eligible to participate in

Description bcstats is a tool which allows Bugcrowd researchers to download information about all accessible programs (public and private) into a sing

19 Oct 13, 2022
Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

open(LARGE) Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included. Motivation Oftentimes, especi

András Schmelczer 2 Jan 30, 2022
Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included

WV-AMZN-4K-RIPPER Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included For CDM You can Mail :- Denis Trunov 179 Dec 17, 2022

AI Dungeon Catalog Archive Toolkit

AI Dungeon Content Archive Toolkit (AID CAT) AID CAT is a command-line utility that will allow you to download JSON backups of: Your private and publi

Mimi 31 Oct 26, 2022
Aline file downloader automator!

AlineDorker Aline is used for donwloading files with google dorking , dowloading specified links such as dorks. Dependences: python3 installed pip ins

27 Nov 16, 2022
Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery.

ksy-dl Downloads .ksy files and their dependencies straight from the official kaitai-struct format gallery. This tool will: Fetch any of the official

3 Jun 20, 2022
Animoo - Python scraper made with BeautifulSoup4 that scrapes images from /c/.

Animoo - Python scraper made with BeautifulSoup4 that scrapes images from /c/. Features Scrapes 10 pages Scrapes each thread Downloads all the images

aether 1 Dec 29, 2021
Download all games from a public Itch.io Game Jam

Itch Jam Downloader Downloads all games from a public Itch.io Game Jam. What you'll need: Python 3.8+ pip install -r requirements.txt For site mirrori

Dragoon Aethis 19 Dec 07, 2022
Youtube Downloader Telegram Bot 😉

Youtube Dl bot 😉 Prerequisite ffmpeg install dependencies pip3 install -r requirements.txt Setup Bot - Change configuration config.py File - insta

Aryan Vikash 285 Dec 06, 2022
New York Times Front Page Downloader.

TIMETRAVELER New York Times Front Page Downloader. Usage python3 timetraveler.py All data will be saved at ~/timetraveler/ Goals To keep a historica

Daeshon Jones 0 Oct 31, 2021
pubmex.py - a script to get a fancy paper title based on given DOI or PMID

pubmex.py is a script to get a fancy paper title based on given DOI or PMID (can be also combined with macOS Finder)

Marcin Magnus 13 Nov 20, 2022
Stremio addon for fetching videos from your google drive.

stremio-gdrive Instructions: There are two ways to go about: Method 1 is hard and long but might give you better performance and you need to make your

72 Dec 31, 2022
Audio/Video downloader

youtubeDownloader Audio/Video downloader • The project downloads audio/video/both after link is entered • It also shows total size of the file, time l

Tulsi Thakur 1 Nov 16, 2021
A Python script that allows you to download all of an anime's episodes at once.

BitAnime A Python script that allows you to download all of an anime's episodes at once. · Download executable version · About BitAnime BitAnime is a

sh1nobu 17 Aug 10, 2022
🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube

🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube 🎗 Fast And Free Bot No Need To Pay ✅ By SL-Alpha-X-Team ⚡

Official Alpha-X-Team Account 7 Aug 31, 2022
A script that downloads YouTube videos/audio

YouTube-Downloader A script that downloads YouTube videos/audio from youtube. Usage Download the script by executing the following in your terminal :

Debayan Sarkar 2 Jan 04, 2022
Arxiv2Kindle is a simple script written in python that converts LaTeX source downloaded from Arxiv and recompiles it to better fit a Kindle or other similar reading devices.

Arxiv2Kindle is a simple script written in python that converts LaTeX source downloaded from Arxiv and recompiles it to better fit a read

Soumik Rakshit 8 Jul 09, 2022
Music and video downloader, Made with love by Bryan Herrera

Python-Mp3Mp4-Downloader Music and video downloader, Made with love by Bryan Herrera Requirements CHOCOLATELY windows command If your system does not

ርᚱ1ናተᛰ ᚻህᚥተპᚱ 104 Dec 27, 2022
Python module to download all media from a GoFile gallery.

GoFile Downloader Setup First of all, clone this repository : ~$ git clone https://github.com/quatrecentquatre-404/gofile-downloader Second, oh wait..

Quatrecentquatre 61 Jan 01, 2023
YT-Downloader is a Tool to download youtube video.

YT-Downloader YT-Downloader is a Tool to download youtube video.If you are looking for a simple video downloader tool Than This YT-Downloader may be u

Pradip Thapa 7 May 11, 2022