Airflow Operator for running Soda SQL scans

Overview

Soda SQL Airflow Operator

Example Usage

# src/soda/scans/my_scan.yml
# Note that the Airflow rendered templates are accessible (e.g. {{ params.client_id }})
table_name: tmp_{{ params.client_id }}_{{ ds_nodash }}
sql_metrics:
  - sql: |
      SELECT
        SUM(value1) AS staged_value1,
        SUM(value2) AS staged_value2
      FROM tmp_{{ params.client_id }}_{{ ds_nodash }}
  - sql: |
      SELECT
        SUM(value1) AS final_value1,
        SUM(value2) AS final_value2
      FROM final_table
      WHERE
        date = '{{ ds }}'
        AND client_id = {{ params.client_id }}
tests:
  - staged_value1 == final_value1
  - staged_value2 > final_value2
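
# my_airflow_dag.py
import os
from pathlib import Path

from soda_util import build_soda_warehouse, convert_templated_yml_to_dict

# SodaSqlOperator must also be imported from wherever this operator lives in your project.

# Matches where my_scan.yml is saved. The right-hand side must be relative:
# a leading "/" would make pathlib discard the base directory.
SODA_PATH = Path(os.getenv("PYTHON_PATH", "/code/src")) / "soda" / "scans"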

validate_staged_data = SodaSqlOperator(
    task_id="validate_staged_data",
    warehouse=build_soda_warehouse("warehouse_name", "database_name"),  # Could also pass a file path to a yml file
    scan=convert_templated_yml_to_dict(SODA_PATH, "my_scan.yml"),
    params={"client_id": 12345},  # Params are rendered by Airflow and accessible in the yaml file
)

Notes

  • Unlike Soda itself, the Operator does not use a builder pattern to define the warehouse and scan arguments. Instead, the warehouse and scan parameters are instance-checked and the relevant Soda methods are set accordingly. This gives a much simpler API: just pass the arguments to the Operator
  • Because all rendering of Jinja templates is handed over to Airflow, the native Soda templates are not accessible, so always use Airflow templates
  • Soft failures (i.e. the Airflow task doesn't fail, it just alerts) have been implemented, but alerting on soft failures has not. For now, a soft failure simply means the Airflow task passes. Alerting is still to be implemented
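
The instance checking and soft-failure behaviour described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the Operator's actual code: the function names `resolve_yml_source` and `handle_test_failures`, the `soft_fail` flag, and the returned attribute names (such as `scan_yml_dict`) are all assumptions made for the example.

```python
import logging
from pathlib import Path
from typing import Union

log = logging.getLogger(__name__)

YmlSource = Union[str, Path, dict]


def resolve_yml_source(kind: str, source: YmlSource) -> dict:
    """Map a warehouse or scan argument to the setting it would populate.

    `kind` is "warehouse" or "scan". The keys returned here are
    hypothetical names used only to show the dispatch.
    """
    if isinstance(source, dict):
        # Already-parsed YAML (e.g. rendered by Airflow) is passed through.
        return {f"{kind}_yml_dict": source}
    if isinstance(source, (str, Path)):
        # Anything path-like is treated as a YAML file location.
        return {f"{kind}_yml_file": str(source)}
    raise TypeError(
        f"{kind} must be a dict or a file path, got {type(source).__name__}"
    )


def handle_test_failures(failed_tests: list, soft_fail: bool = False) -> bool:
    """Return True if the task should pass; raise on a hard failure."""
    if not failed_tests:
        return True
    if soft_fail:
        # No alerting is wired up in this sketch, so a soft failure is
        # only logged and the task is allowed to pass.
        log.warning("Soda scan had failing tests: %s", failed_tests)
        return True
    raise ValueError(f"Soda scan failed {len(failed_tests)} test(s): {failed_tests}")
```

With this shape, a caller could pass either a dict or a path for each argument without choosing a different method, and a soft failure would show up only in the task log.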
Owner
Todd de Quincey
Data Engineer, Chartered Accountant and all round nice guy (or so I like to think). I believe that quality, simplicity and focus are the keys to success