Airflow Operator for running Soda SQL scans

Overview

Soda SQL Airflow Operator

Airflow Operator for running Soda SQL scans

Example Usage

# src/soda/scans/my_scan.yml
# Note that the Airflow rendered templates are accessible (e.g. {{ params.client_id }})
table_name: tmp_{{ params.client_id }}_{{ ds_nodash }}
sql_metrics:
  - sql: |
      SELECT
        SUM(value1) AS staged_value1,
        SUM(value2) AS staged_value2
      FROM tmp_{{ params.client_id }}_{{ ds_nodash }}
  - sql: |
      SELECT
        SUM(value1) AS final_value1,
        SUM(value2) AS final_value2
      FROM final_table
      WHERE
        date = '{{ ds }}'
        AND client_id = {{ params.client_id }}
tests:
  - staged_value1 == final_value1
  - staged_value2 > final_value2
# my_airflow_dag.py
from pathlib import Path
from soda_util import build_soda_warehouse, convert_templated_yml_to_dict

SODA_PATH = Path(os.getenv("PYTHON_PATH", "/code/src")) / "/soda/scans/"  # Matches where my_scan.yml is saved

validate_staged_data = SodaSqlOperator(
    task_id="validate_staged_data",
    warehouse=build_soda_warehouse("warehouse_name", "database_name"),  # Could also pass a file path to a yml file
    scan=convert_templated_yml_to_dict(SODA_PATH, "my_scan.yml"),
    params={"client_id": 12345},  # Params are rendered by Airflow and accessible in the yaml file
)

Notes

  • Unlike Soda itself, a builder pattern is not used to define the warehouse and scan argument. Rather, the warehouse and scan parameters are instance checked and the relevant Soda methods are set. This provides a much simpler API, where we can just pass in the args to the Operator
  • As we are passing over all rendering of Jinga templates to Airflow, the native Soda templates are not accessible. So always use Airflow templates
  • Soft failures (i.e. the Airflow task doesn't fail, it just alerts) have been implemented, but alerting of soft failures has not. So soft failures will essentially just mean the Airflow task passes. Alerting to be implemented
Owner
Todd de Quincey
Data Engineer, Chartered Accountant and all round nice guy (or so I like to think). I believe that quality, simplicity and focus are the keys to success
Todd de Quincey
Module for remote in-memory Python package/module loading through HTTP/S

httpimport Python's missing feature! The feature has been suggested in Python Mailing List Remote, in-memory Python package/module importing through H

John Torakis 220 Dec 17, 2022
Objetivo: de forma colaborativa pasar de nodos de Dynamo a Python.

ITTI_Ed01_De-nodos-a-python ITTI. EXPERT TRAINING EN AUTOMATIZACIÓN DE PROCESOS BIM: OFFICIAL DE AUTODESK. Edición 1 Enlace al Master Enunciado: Traba

1 Jun 06, 2022
Expose multicam options in the Blender VSE headers.

Multicam Expose multicam options in the Blender VSE headers. Install Download space_sequencer.py and swap it with the one that comes with the Blender

4 Feb 27, 2022
Project 2 for Microsoft Azure on WUT

azure-proj2 Project 2 for Microsoft Azure on WUT Table of contents Team Tematyka projektu Architektura Opis rozwiązania Demo dzałania The Team Krzyszt

1 Dec 07, 2021
py-js: python3 objects for max

Simple (and extensible) python3 externals for MaxMSP

Shakeeb Alireza 39 Nov 20, 2022
Awesome Casino is simple offline casino made on python.

Awesome-Casino Awesome Casino is simple offline casino made on python. I found bug, what can i do? If you find any bug or want to suggest any idea, al

Herman 1 Feb 04, 2022
The git for the Python Story Utility Package library.

PSUP, The Python Story Utility Package Module. PSUP helps making stories or games with options, diverging paths, different endings and so on. You can

Enoki 6 Nov 27, 2022
Buffer overflow example for python

Buffer overflow example for python

Mehmet 1 Jan 04, 2022
Download and archive entire usenet newsgroups over NNTP.

Usenet Archiving Tool This code is for archiving Usenet discussions, not downloading files. Newsgroup posts are saved under the authors name and email

Corey White 2 Dec 23, 2021
This python code will get requests from SET (The Stock Exchange of Thailand) a previously-close stock price and return it in Thai Baht currency using beautiful soup 4 HTML scrapper.

This python code will get requests from SET (The Stock Exchange of Thailand) a previously-close stock price and return it in Thai Baht currency using beautiful soup 4 HTML scrapper.

Andre 1 Oct 24, 2022
This application is made solely for entertainment purposes

Timepass This application is made solely for entertainment purposes helps you find things to do when you're bored ! tells jokes guaranteed to bring on

Omkar Pramod Hankare 2 Nov 24, 2021
Virtual webcam that takes real webcam footage and replaces the background in order to have Virtual Backgrounds in MS Teams for Linux where the feature is unimplemented.

Background Remover The Need It's been good long while since Microsoft first released a Teams version for Linux and yet, one of Teams' coolest features

Dylan Turner 80 Dec 20, 2022
BridgeWalk is a partially-observed reinforcement learning environment with dynamics of varying stochasticity.

BridgeWalk is a partially-observed reinforcement learning environment with dynamics of varying stochasticity. The player needs to walk along a bridge to reach a goal location. When the player walks o

Danijar Hafner 6 Jun 13, 2022
APRS Track Direct is a collection of tools that can be used to run an APRS website

APRS Track Direct APRS Track Direct is a collection of tools that can be used to run an APRS website. You can use data from APRS-IS, CWOP-IS, OGN, HUB

Per Qvarforth 42 Dec 29, 2022
El_Binario - A converter for Binary, Decimal, Hexadecimal and Octal numbers

El_Binario El_Binario es un conversor de números Binarios, Decimales, Hexadecima

2 Jan 28, 2022
A simple but complete exercise to learning Python

ResourceReservationProject This is a simple but complete exercise to learning Python. Task and flow chart We are going to do a new fork of the existin

2 Nov 14, 2022
bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。

bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。 整体设计 Quick start 1. 安装依赖 2. 项目初始

腾讯蓝鲸 96 Dec 15, 2022
HairCLIP: Design Your Hair by Text and Reference Image

Overview This repository hosts the official PyTorch implementation of the paper: "HairCLIP: Design Your Hair by Text and Reference Image". Our single

322 Dec 30, 2022
The Playwright Workshop for TAU: The Homecoming

tau-playwright-workshop This repository contains the instructions and example code for the Playwright workshop for TAU: The Homecoming on December 1,

Pandy Knight 134 Dec 30, 2022
A Unified Framework for Hydrology

Unified Framework for Hydrology The Python package unifhy (Unified Framework for Hydrology) is a hydrological modelling framework which combines inter

Unified Framefork for Hydrology - Community Organisation 6 Jan 01, 2023