Newsemble is an API that provides easy access to the current news for programmatic analysis

Overview

📰 Newsemble 📰


Logo
An API for fetching the current news.

python   Flask   MongoDB  Heroku

GitHub release Visits Badge Stars Badge Fork Badge Github all releases watchers Badge

🔖 About 🔖


Blog Post

Newsemble is an API that provides easy access to the current news for programmatic analysis. It has been built using Python, BeautifulSoup and MongoDB.
The data is scraped from these news websites every hour, stored in a database on the cloud and whenever requested, the most recent articles are promptly served.
Developers can make use of this API to fetch current data with each article having the following fields:
Headlines, Content, Source, Link and Time.



🗒️ Table of contents

💻 Technologies

Newsemble is created with:

  • Python 3
  • Flask
  • PyMongo
  • BeautifulSoup

📂 File Structure and Description

  • app.py - Flask code for the API
  • scraper.py - Collection of scrapers for the various news sites.
  • db.py - Connecting and Using MongoDB
  • utils.py - Utility Functions
  • scheduler.py - Scheduler
  • Procfile - For Deployment
  • requirements.txt - Python Requirments

🛠️ Pipeline

Newsemble pipeline

🚀 Getting-started

This project can be accessed by using following setup

Links

Links Description
www.newsemble.ml/news Link to fetch all the data from all sources
www.newsemble.ml/news/toi Link to fetch data from Times of India
www.newsemble.ml/news/th Link to fetch data from The Hindu
www.newsemble.ml/news/tie Link to fetch data from The Indian Express
www.newsemble.ml/news/ndtv Link to fetch data from NDTV news
www.newsemble.ml/news/it Link to fetch data from India Today

Request format

$ import requests
$ url = "http://www.newsemble.ml/news/"
$ requests.get(url).json()

Response format

{   
    ‘link’      :  $source_link$,
    ‘content’   :  $content_text$,    
    ‘source’    :  $news_source$,
    ‘title’     :  $headline$, 
    ‘time       :  $date_time_of_article$  
 }

Sample output

image

⚙️ Currently Supported Sites



🙏 Thanks!

All contributions are welcome and appreciated. 👍
If you liked this project, or found it useful in any way, please drop a 🌟 !

✍️ Authors ✍️

✒️ Rishabh Gupta
✒️ Vishal Singhania
✒️ Roshan Kumar

You might also like...
music downloader written in python.   (Uses jiosaavn API)
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Mobile based API for Crunchyroll BETA (and Downloader).

Mobile based API for Crunchyroll BETA (and Downloader). Not restricted on servers and NO CLOUDFLARE

Pypixiv - A fully-typed, asynchronous api wrapper for pixiv

pypixiv this library is a fully-typed, asynchronous api wrapper for pixiv. featu

This project is helps to download contents from Streamtape by utilizing the API

It scrapes Streamtape api and download contents from the site.

A python script that discovers hidden YouTube API clients. Just a research project.

YouTube-Internal-Clients A script that discovers hidden internal clients of the YouTube (Innertube) API using bruteforce methods. The script tries cli

Simple Python script to download images and videos from public subreddits without using Reddit's API 😎
Simple Python script to download images and videos from public subreddits without using Reddit's API 😎

Subreddit Media Downloader Download images and videos from any public subreddit without using Reddit's API Made with ❤ by Nico 💬 About: This script a

This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular intervals.It sends out the most recent news at random!

Nepali-news-notifier This script just scrapes the most recent Nepali news from Kathmandu Post and notifies the user about current events at regular in

News-app - This is a news web app for reading news from different sources and topics
News-app - This is a news web app for reading news from different sources and topics

News-app - This is a news web app for reading news from different sources and topics

Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and other financial information. This repository provides an SDK for developing applications to access the NCDS.

Nasdaq Cloud Data Service (NCDS) Nasdaq Cloud Data Service (NCDS) provides a modern and efficient method of delivery for realtime exchange data and ot

The windML framework provides an easy-to-use access to wind data sources within the Python world, building upon numpy, scipy, sklearn, and matplotlib. Renewable Wind Energy, Forecasting, Prediction

windml Build status : The importance of wind in smart grids with a large number of renewable energy resources is increasing. With the growing infrastr

Repo Home WPDrawBot - (Repo, Home, WP) A powerful programmatic 2D drawing application for MacOS X which generates graphics from Python scripts. (graphics, dev, mac)

DrawBot DrawBot is a powerful, free application for macOS that invites you to write Python scripts to generate two-dimensional graphics. The built-in

A curated list of programmatic weak supervision papers and resources
A curated list of programmatic weak supervision papers and resources

A curated list of programmatic weak supervision papers and resources

Learning trajectory representations using self-supervision and programmatic supervision.
Learning trajectory representations using self-supervision and programmatic supervision.

Trajectory Embedding for Behavior Analysis (TREBA) Implementation from the paper: Jennifer J. Sun, Ann Kennedy, Eric Zhan, David J. Anderson, Yisong Y

Programmatic interface to Synapse services for Python

A Python client for Sage Bionetworks' Synapse, a collaborative, open-source research platform that allows teams to share data, track analyses, and collaborate

Django API that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format).

"Casares News" API Api that scrapes and provides the last news of the city of Carlos Casares by semantic way (RDF format). Usage Consume the articles

This python module can analyse cryptocurrency news for any number of coins given and return a sentiment. Can be easily integrated with a Trading bot to keep an eye on the news.

Python script that analyses news headline or body sentiment and returns the overall media sentiment of any given coin. It can take multiple coins an

NLP project that works with news (NER, context generation, news trend analytics)
NLP project that works with news (NER, context generation, news trend analytics)

СоАвтор СоАвтор – платформа и открытый набор инструментов для редакций и журналистов-фрилансеров, который призван сделать процесс создания контента ма

Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.
Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser or phone. Install it on your server, access it anywhere and enjoy.

Vigilio Your own movie streaming service. Easy to install, easy to use. Download, manage and watch your favorite movies conveniently from your browser

💰 An Alfred Workflow that provides current price of cryptocurrency
💰 An Alfred Workflow that provides current price of cryptocurrency

Coin Ticker for Alfred Workflow An Alfred Workflow that provides current price and status about cryptocurrency from cryptocompare.com. Supports Alfred

Comments
  • Bump lxml from 4.5.0 to 4.6.5

    Bump lxml from 4.5.0 to 4.6.5

    Bumps lxml from 4.5.0 to 4.6.5.

    Changelog

    Sourced from lxml's changelog.

    4.6.5 (2021-12-12)

    Bugs fixed

    • A vulnerability (GHSL-2021-1038) in the HTML cleaner allowed sneaking script content through SVG images (CVE-2021-43818).

    • A vulnerability (GHSL-2021-1037) in the HTML cleaner allowed sneaking script content through CSS imports and other crafted constructs (CVE-2021-43818).

    4.6.4 (2021-11-01)

    Features added

    • GH#317: A new property system_url was added to DTD entities. Patch by Thirdegree.

    • GH#314: The STATIC_* variables in setup.py can now be passed via env vars. Patch by Isaac Jurado.

    4.6.3 (2021-03-21)

    Bugs fixed

    • A vulnerability (CVE-2021-28957) was discovered in the HTML Cleaner by Kevin Chung, which allowed JavaScript to pass through. The cleaner now removes the HTML5 formaction attribute.

    4.6.2 (2020-11-26)

    Bugs fixed

    • A vulnerability (CVE-2020-27783) was discovered in the HTML Cleaner by Yaniv Nizry, which allowed JavaScript to pass through. The cleaner now removes more sneaky "style" content.

    4.6.1 (2020-10-18)

    ... (truncated)

    Commits
    • a9611ba Fix a test in Py2.
    • a3eacbc Prepare release of 4.6.5.
    • b7ea687 Update changelog.
    • 69a7473 Cleaner: cover some more cases where scripts could sneak through in specially...
    • 54d2985 Fix condition in test decorator.
    • 4b220b5 Use the non-depcrecated TextTestResult instead of _TextTestResult (GH-333)
    • d85c6de Exclude a test when using the macOS system libraries because it fails with li...
    • cd4bec9 Add macOS-M1 as wheel build platform.
    • fd0d471 Install automake and libtool in macOS build to be able to install the latest ...
    • f233023 Cleaner: Remove SVG image data URLs since they can embed script content.
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
    • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
    • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
    • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

    You can disable automated security fix PRs for this repo from the Security Alerts page.

    dependencies 
    opened by dependabot[bot] 0
Releases(v1.0)
Owner
Rishabh
CSE Undergrad at IIIT-D.
Rishabh
Neon: an add-on for making it easier to handle component interactions

Neon Neon is an add-on for Lightbulb making it easier to handle component interactions. Installation pip install git+https://github.com/neonjonn/light

Neon Jonn 9 Apr 29, 2022
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Rohn Chatterjee 35 Jul 20, 2022
apkizer is a mass downloader for android applications for all available versions.

apkizer apkizer collects all available versions of an Android application from apkpure.com Purpose Sometimes mobile applications can be useful to dig

Kamil Onur Özkaleli 41 Dec 16, 2022
🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube

🔥 A Bot To Telegram For Download High Qulity Videos & Songs From Youtube 🎗 Fast And Free Bot No Need To Pay ✅ By SL-Alpha-X-Team ⚡

Official Alpha-X-Team Account 7 Aug 31, 2022
Python script to download entire campaign images and navigation.

Squidle campaign downloader Python script to download entire campaign images and navigation. usage: squidle_campaign_downloader.py [-h] [--api-token A

Miquel Massot 2 Nov 17, 2021
The lyrics module of the repository apple-playlist-downloader

This is the lyrics module of the repository apple-playlist-downloader. With this code you can download the .lrc file (time synced lyrics) from yours t

Antoine Bollengier 6 Oct 07, 2022
Discord Nitro Generator + Checker

Discord Nitro Generator + Checker Usage Download the project files and run main.py You will be prompted with 2 questions the first one being the amoun

509 Jan 02, 2023
一个在新番更新后第一时间在dmhy等BT下载站自动下载的小工具.

Anime Track 一个在新番更新后第一时间自动下载的小工具. 可以根据自定义的关键字在dmhy等BT下载站在搜索结果更新时将磁力链发送至aria2实现自动下载. 基本功能包含: 将BT下载站的某个关键字的搜索结果的所有磁力链添加至ARIA2 自动更新aria2 trackers 将已添加的磁力

Sunky 24 Oct 12, 2022
Youtube Downloader is a Graphic User Interface(GUI) that lets users download a Youtube Video or Audio through a URL

Youtube Downloader This Python and Tkinter based GUI allows users to directly download the Best Resolution Videos and Audios from Youtube. Pa-fy Insta

Samarth Kumar 2 Jun 25, 2022
Audio/Video downloader

youtubeDownloader Audio/Video downloader • The project downloads audio/video/both after link is entered • It also shows total size of the file, time l

Tulsi Thakur 1 Nov 16, 2021
A toolkit to automatically crawl the paper list and download paper pdfs of ACL Ahthology.

ACL-Anthology-Crawler A toolkit to automatically crawl the paper list and download paper pdfs of ACL Anthology

Ray GG 9 Oct 09, 2022
A manga download script written in python.

manga-dlp python script to download mangas Description A manga download script written in python. It only supports mangadex.org for now. But support f

Ivan Schaller 15 Nov 28, 2022
Python based Telegram bot. Search and download YouTube video or audio.

Python-Telegram-Youtube-Media-Bot Python based Telegram bot. Search and download YouTube video or audio. Just change settings.py and start TelegramBot

Ahmet Bohur 2 Oct 02, 2022
A Python script that allows you to download all of an anime's episodes at once.

BitAnime A Python script that allows you to download all of an anime's episodes at once. · Download executable version · About BitAnime BitAnime is a

sh1nobu 17 Aug 10, 2022
Downloader Middleware to support Playwright in Scrapy & Gerapy

Gerapy Playwright This is a package for supporting Playwright in Scrapy, also this package is a module in Gerapy. Installation pip3 install gerapy-pla

Gerapy 85 Dec 31, 2022
Used Insta Loader to download high quality images from instagram account

Insta Dp Downloader Project Description: In this project, I have used "Insta Loader" to download high quality images from instagram account. You only

Hassan Shahzad 3 Oct 31, 2022
A Udemy downloader that can download DRM protected videos and non-DRM protected videos.

Udemy Downloader with DRM support NOTE This program is WIP, the code is provided as-is and i am not held resposible for any legal repercussions result

Puyodead1 468 Dec 29, 2022
Python Youtube Video-Playlist Downloader

Youtube-Video-Playlist-Downloader-PyQt5 You can download videos and playlists on YouTube with this script. Script has GUI. Enjoy. Setup git clone http

Yunus Emre Öztürk 2 Jun 06, 2022
python code used to download all images contained in a facebook uid , the uid can be profile,group,fanpage

python code used to download all images contained in a facebook uid , the uid can be profile,group,fanpage

VVHai 2 Dec 21, 2021
Quickly, simply, and asynchronously download NFT's from an Opensea collection

iRightClick Quickly, simply, and asynchronously download NFT's from an Opensea collection. NOTICE This tool is not developed to encourage or facilitat

Setro 34 Dec 30, 2022