whylogs Workshop

The code from the whylogs workshop in DataTalks.Club on 29 March 2022

whylogs - The open source standard for data logging (Don't forget to give it a star!)

Workshop

In this hands-on workshop, we’ll learn how to set up a system that monitors your data pipelines, checks data quality, and detects changes in your data.

Without data monitoring, you can’t guarantee to your stakeholders that the data they use for analytics and machine learning is trustworthy. A data observability system gives you visibility into the health of your data pipelines, which builds your customers’ trust in your work.

We’ll cover the following:

Introduction to data observability and monitoring
whylogs — the open source standard for data logging
How to monitor batch Python or Spark data pipelines with whylogs
How to monitor Kafka streaming pipelines with whylogs

By the end of this workshop, you’ll be able to set up such a system yourself.
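To make the idea of data logging concrete, here is a minimal sketch in plain Python of what a monitoring system records and checks: a lightweight statistical profile per batch, compared against a reference batch. This is not the whylogs API. The names `profile_column` and `mean_shifted` and the `max_stddevs` threshold are hypothetical, chosen only for illustration; whylogs builds much richer profiles than this.

```python
from statistics import mean, pstdev


def profile_column(values):
    """Summarize one column of a batch: row count, missing values, basic stats."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
        "stddev": pstdev(present) if present else None,
    }


def mean_shifted(reference, current, max_stddevs=3.0):
    """Return True when the current mean is more than `max_stddevs`
    reference standard deviations away from the reference mean.

    The threshold is an arbitrary illustrative default, not a recommendation.
    """
    if reference["mean"] is None or current["mean"] is None:
        return True  # an empty batch cannot be compared; treat it as a problem
    diff = abs(current["mean"] - reference["mean"])
    spread = reference["stddev"] or 0.0
    if spread == 0.0:
        return diff > 0
    return diff / spread > max_stddevs


if __name__ == "__main__":
    # Yesterday's batch is the reference; today's values look very different.
    reference = profile_column([10.0, 12.0, 11.0, 13.0, None])
    today = profile_column([30.0, 31.0, 29.0, 32.0])
    print("missing in reference:", reference["missing"])
    print("mean shifted:", mean_shifted(reference, today))
```

Storing only the profile, rather than the raw rows, is what keeps this kind of monitoring cheap enough to run on every batch.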

Code

This repository contains the files needed for the workshop:

ccloud_lib.py - helper code for connecting to Confluent Cloud
confluent_credentials.txt - configuration template (put your credentials there, but don't commit them!)
producer.py - the code that produces events to Kafka
requirements.txt - all the dependencies for the workshop
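As an illustration of how a credentials file like this can be loaded, here is a minimal sketch. It assumes the file holds one `key=value` setting per line, with `#` starting a comment line; check the template in the repository for the actual layout. The function names `parse_config_lines` and `read_config` are hypothetical and are not taken from ccloud_lib.py.

```python
def parse_config_lines(lines):
    """Turn key=value lines into a dict, skipping blank lines and # comments."""
    conf = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values that contain "=" stay intact.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        conf[key.strip()] = value.strip()
    return conf


def read_config(path):
    """Read a key=value configuration file from disk."""
    with open(path, encoding="utf-8") as fh:
        return parse_config_lines(fh)


if __name__ == "__main__":
    # Placeholder values only; never put real credentials in source code.
    example = [
        "# Kafka connection settings",
        "bootstrap.servers=<your-bootstrap-server>",
        "sasl.username=<your-api-key>",
        "sasl.password=<your-api-secret>",
    ]
    print(sorted(parse_config_lines(example)))
```

Reading the settings at runtime from a file that stays out of version control keeps the secrets out of the code itself.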

Confluent Cloud

For this workshop, you'll need:

A Deepnote account
A Confluent Cloud account (instructions)
