Stream-Kafka-ELK-Stack - Weather data streaming using Apache Kafka and Elastic Stack.

Last update: Jan 20, 2022

Overview

Streaming Data Pipeline - Kafka + ELK Stack

Streaming weather data using Apache Kafka and Elastic Stack.

Data source: https://openweathermap.org/api

Objective: Develop a streaming data pipeline that extracts weather data from the OpenWeather API using Apache Kafka, Logstash, Elasticsearch and Kibana (Kafka + ELK Stack).

To summarize, Python was used to develop a Kafka producer that requests weather data from the OpenWeather API every minute and sends it as a message to Apache Kafka. Logstash, acting as a Kafka consumer, reads the data and stores it in Elasticsearch. Kibana reads the data from Elasticsearch to render the dashboard.
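The producer side of this flow can be sketched roughly as follows. This is a minimal illustration, not the code in weather_kfk_producer.py: it assumes the kafka-python package, OpenWeather's current-weather endpoint, a broker on Kafka's default port 9092, and a hypothetical topic name "weather". The function names are made up for the example.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values for illustration; none are taken from the repository.
OPENWEATHER_URL = "https://api.openweathermap.org/data/2.5/weather"
TOPIC = "weather"                     # hypothetical topic name
BOOTSTRAP_SERVERS = "localhost:9092"  # Kafka's default port, assumed


def build_url(city, api_key, units="metric"):
    """Build the request URL for OpenWeather's current-weather endpoint."""
    query = urlencode({"q": city, "appid": api_key, "units": units})
    return f"{OPENWEATHER_URL}?{query}"


def fetch_weather(city, api_key):
    """Request the current weather and return the decoded JSON document."""
    with urlopen(build_url(city, api_key), timeout=10) as response:
        return json.load(response)


def serialize(record):
    """Encode a record as UTF-8 JSON bytes, the form sent to Kafka."""
    return json.dumps(record).encode("utf-8")


def publish_once(producer, fetch, city, api_key, topic=TOPIC):
    """Fetch one reading and send it as a single Kafka message."""
    record = fetch(city, api_key)
    producer.send(topic, serialize(record))
    producer.flush()  # block until the message has been handed to the broker
    return record


def run(city, api_key, interval_seconds=60):
    """Poll the API every `interval_seconds` and publish each reading."""
    from kafka import KafkaProducer  # requires the kafka-python package

    producer = KafkaProducer(bootstrap_servers=BOOTSTRAP_SERVERS)
    try:
        while True:
            publish_once(producer, fetch_weather, city, api_key)
            time.sleep(interval_seconds)
    finally:
        producer.close()
```

Calling `run("London", api_key)` would start the one-minute polling loop. The producer and the fetch function are passed into `publish_once` so that the send step can be exercised without a running broker or network access.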

Kibana Weather Dashboard

Steps:

1. bash elk/start_elastic_docker.sh
2. bash kafka/start_kafka_docker.sh
3. Create a topic using Kafka Manager: localhost:9000
4. Start Logstash (this assumes Logstash is installed locally): $LOGSTASH_HOME/bin/logstash -f $LOGSTASH_HOME/config/pipeline.conf
5. Set the API key in the weather_api_key.ini file, then run the Kafka producer: python3 weather_kfk_producer.py
6. Access Kibana: localhost:5601
7. Create an index pattern; it must match the index name set in pipeline.conf.
8. Develop your dashboard.
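For the API key step, a producer could load the key from weather_api_key.ini along these lines. This is a sketch only: the layout of the file is not shown here, so the section name "openweather" and the option name "key" are hypothetical and should be adjusted to match the actual file.

```python
import configparser


def load_api_key(path="weather_api_key.ini", section="openweather", option="key"):
    """Return the API key stored in an INI file.

    The section and option names are assumptions made for this example.
    """
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError(f"could not read {path}")
    key = parser.get(section, option, fallback="").strip()
    if not key:
        raise ValueError(f"no '{option}' found in section [{section}] of {path}")
    return key
```

Failing early with a clear error when the key is missing is likely easier to diagnose than an HTTP 401 response from the API once the polling loop is already running.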


Owner

Felipe Demenech Vasconcelos
