PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.

Last update: Aug 01, 2022

Objectives

The main objective of this library is to build a PyTorch Dataset from training data stored in Kafka. This is useful when the data is already distributed through Kafka and you want to train a PyTorch model on it. Each Kafka message is expected to be a key:value pair, where the key is the label and the value is the input.

Usage

To use this library, create a TrainingKafkaDataset with a ControlMessage, bootstrap_servers, and a group_id. Once the object has been created and the data has been fetched from Kafka, it can be used like any other PyTorch Dataset, for example by iterating over it with a DataLoader.

ControlMessage is a dictionary whose main keys are topic and input_config.

In topic, provide a comma-separated string in which each entry gives a topic, partition, start offset, and end offset, separated by colons (topic:partition:start:end). In input_config, specify the data types and reshapes of the data fetched from Kafka. This is needed because Kafka stores messages as bytes, so the inputs and labels have to be decoded back into arrays before they reach the model.
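The two ControlMessage fields can be illustrated with a minimal sketch. The helpers parse_topic_spec and decode_bytes are hypothetical names written for this illustration; they are not part of the library, and the library's own parsing and decoding may differ. The sketch assumes that reshape strings are space-separated dimensions and that an empty reshape string means a single scalar value.

```python
import numpy as np


def parse_topic_spec(spec):
    """Split 'topic:partition:start:end,...' into (topic, partition, start, end) tuples."""
    ranges = []
    for entry in spec.split(","):
        topic, partition, start, end = entry.strip().split(":")
        ranges.append((topic, int(partition), int(start), int(end)))
    return ranges


def decode_bytes(raw, dtype, reshape):
    """Decode raw message bytes into a NumPy array of the given dtype and shape.

    Assumption: an empty reshape string describes a scalar (shape ()).
    """
    array = np.frombuffer(raw, dtype=np.dtype(dtype))
    shape = tuple(int(dim) for dim in reshape.split())
    return array.reshape(shape)


# Illustrative values mirroring an MNIST-style configuration.
input_config = {"data_type": "uint8", "label_type": "uint8",
                "data_reshape": "28 28", "label_reshape": ""}
ranges = parse_topic_spec("pytorch_mnist_test:0:0:20000,pytorch:0:20000:50000")
image = decode_bytes(bytes(28 * 28), input_config["data_type"], input_config["data_reshape"])
label = decode_bytes(b"\x07", input_config["label_type"], input_config["label_reshape"])
```

Here image is a 28 x 28 array of zeros and label is a zero-dimensional array holding 7, which is how a message value and key might look after decoding.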

bootstrap_servers and group_id are the usual KafkaConsumer parameters. They are passed directly to the KafkaConsumer instances created inside the object.
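As a rough sketch of how one offset range could be read with such a consumer, the function below assigns a single partition, seeks to the start offset, and collects messages up to the end offset. It is an illustration only, not the library's implementation. It assumes a kafka-python style consumer (assign, seek, iteration yielding messages with offset, key, and value attributes) and treats the end offset as exclusive, which the original text does not specify. The name read_offset_range is hypothetical.

```python
def read_offset_range(consumer, topic_partition, start, end):
    """Collect (key, value) pairs with offsets in [start, end) from one partition.

    Assumption: `end` is exclusive; adjust if the library treats it as inclusive.
    """
    if start >= end:
        return []
    consumer.assign([topic_partition])
    consumer.seek(topic_partition, start)
    records = []
    for message in consumer:
        if message.offset >= end:
            break
        records.append((message.key, message.value))
        if message.offset == end - 1:
            # Stop right after the last wanted offset so we never block on the next poll.
            break
    return records


# Possible usage against a live broker with kafka-python (not executed here):
# from kafka import KafkaConsumer, TopicPartition
# consumer = KafkaConsumer(bootstrap_servers=["localhost:9094"], group_id="sink",
#                          consumer_timeout_ms=5000)
# records = read_offset_range(consumer, TopicPartition("pytorch_mnist_test", 0), 0, 20000)
```

Setting consumer_timeout_ms keeps the loop from waiting indefinitely if the partition holds fewer messages than the requested range.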

Here is an example of creating a TrainingKafkaDataset:

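from torchvision.transforms import ToTensor
# TrainingKafkaDataset comes from this library; import it from wherever the package is installed.

# 'topic' lists three offset ranges, each written as topic:partition:start_offset:end_offset.
kafkaControlMessage = {'topic': 'pytorch_mnist_test:0:0:20000,pytorch:0:20000:50000,pytorch_mnist_test:0:120000:140000',
                'input_config': {'data_type': 'uint8',      # dtype of each input (message value)
                                 'label_type': 'uint8',     # dtype of each label (message key)
                                 'data_reshape': '28 28',   # each input is reshaped to 28 x 28
                                 'label_reshape': ''},      # labels are left without a reshape
                }
bootstrap_server = ["localhost:9094"]
group_id = 'sink'
# ToTensor() is passed as the fourth argument, presumably as the transform applied to each input.
df = TrainingKafkaDataset(kafkaControlMessage, bootstrap_server, group_id, ToTensor())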
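Once created, the dataset can be handed to a DataLoader like any other PyTorch Dataset. The sketch below shows the pattern with a hypothetical in-memory stand-in (InMemoryPairs) so that it runs without a Kafka broker. It assumes a map-style dataset returning (input, label) pairs, which may not match the library's exact behavior.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class InMemoryPairs(Dataset):
    """Hypothetical stand-in for a dataset whose samples were already fetched and decoded."""

    def __init__(self, inputs, labels):
        self.inputs = inputs
        self.labels = labels

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, index):
        return self.inputs[index], self.labels[index]


# 100 fake MNIST-shaped samples; a real run would use the dataset built from Kafka instead.
inputs = torch.zeros(100, 1, 28, 28)
labels = torch.zeros(100, dtype=torch.long)
loader = DataLoader(InMemoryPairs(inputs, labels), batch_size=32, shuffle=True)

# Each iteration yields one batch of inputs and the matching labels.
batch_shapes = [tuple(batch_inputs.shape) for batch_inputs, _ in loader]
```

With 100 samples and a batch size of 32, the loader yields three full batches and a final batch of four samples.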

Examples

The repository includes a folder with a full example of fetching data from Kafka and training a model on the MNIST dataset.

Owner

ERTIS Research Group
