🎣 List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

Overview

dbt-pre-commit

pre-commit-dbt

BETA NOTICE: This tool is still in beta and may have some bugs, so please be forgiving!

Goal

Quickly ensure the quality of your dbt projects.

dbt is awesome, but as the number of models, sources, and macros grows, maintaining quality becomes challenging. People often forget to update columns in schema files, add descriptions, or write tests. Besides, as the number of objects grows, dbt slows down, users stop running models and tests (because they want to deploy the feature quickly), and the demands on reviewers increase.

If this is the case, pre-commit-dbt is here to help you!

List of pre-commit-dbt hooks

💡 Click on a hook name to view the details.

Model checks:

Script checks:

Source checks:

Modifiers:

dbt commands:


❗ If you have an idea for a new hook or you found a bug, let us know ❗

Install

For detailed installation and usage instructions, see the pre-commit.com site.

pip install pre-commit

Setup

  1. Create a file named .pre-commit-config.yaml in your dbt root folder.
  2. Add the list of hooks you want to run before every commit. E.g.:
repos:
- repo: https://github.com/offbi/pre-commit-dbt
  rev: v0.1.1
  hooks:
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: dbt-test
  - id: dbt-docs-generate
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
  - id: check-model-has-all-columns
    name: Check columns - mart
    files: ^models/mart
  - id: check-model-columns-have-desc
    files: ^models/mart
  3. Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also run pre-commit run manually after staging the files you want to check, or pre-commit run --all-files to run the hooks against all files (not only the staged ones).
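
The `files` values in the example config are regular expressions that pre-commit matches against each path relative to the repository root. As a rough illustration (a sketch, not pre-commit's actual code; the `select_files` helper and the example paths are made up), the include filter behaves like a `re.search` over the candidate paths:

```python
import re


def select_files(pattern: str, paths: list[str]) -> list[str]:
    """Return the paths that a hook with `files: <pattern>` would receive.

    Hypothetical helper: it mimics only the include filter with re.search
    and ignores `exclude`, `types`, and the other pre-commit filters.
    """
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


# Example paths relative to the repository root (invented for illustration).
staged = [
    "models/core/customers.sql",
    "models/mart/orders.sql",
    "macros/cents_to_dollars.sql",
]

print(select_files(r"^models/core", staged))
print(select_files(r"^models/mart", staged))
```

With these sample paths, the hook scoped to `^models/core` would see only the core model and the one scoped to `^models/mart` only the mart model, which is why the same hook id can be listed twice with different `name` and `files` values.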

Run as a GitHub Action

Unfortunately, you cannot natively use pre-commit-dbt if you are using dbt Cloud. But you can run the checks after you push changes to GitHub.

To do that, create the file .github/workflows/pre-commit.yml:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
    - uses: actions/[email protected]
    - uses: pre-commit/[email protected]

To run only changed files:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
    - uses: actions/[email protected]
    - id: file_changes
      uses: trilom/[email protected]
      with:
        output: ' '
    - uses: pre-commit/[email protected]
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}

Modifiers can run only in a private repository. To enable them, change your .github/workflows/pre-commit.yml to:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
      with:
        fetch-depth: 0
    - uses: actions/[email protected]
    - id: file_changes
      uses: trilom/[email protected]
      with:
        output: ' '
    - uses: pre-commit/[email protected]
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}
        token: ${{ secrets.GITHUB_TOKEN }}

For more information about pre-commit/action, visit https://github.com/pre-commit/action.

Comments
  • add argument support to dbt commands (dbt clean & dbt deps)

    This enhancement will allow the dbt-clean and dbt-deps hooks to take arguments for global and cmd flags. This addresses issue 39 by allowing the project-dir flag to be added to the dbt clean and dbt deps commands. Example:

    - id: dbt-clean
      args: ["--cmd-flags", "++project-dir", "./transform/dbt"]
    opened by Aostojic 8
  • Docker image pull down image action fails

    Describe the bug When hooking to the Docker file, I get the error invalid reference format when the build runs the step that pulls the action image.

    Pull down action image 'offbi:pre-commit-dbt:v1.0.0'
    /usr/bin/docker pull offbi:pre-commit-dbt:v1.0.0
    invalid reference format
    Warning: Docker pull failed with exit code 1, back off 3.275 seconds before retry.

    To Reproduce Steps to reproduce the behavior:

    1. add new file .github/workflows/pre-commit-dbt.yml:
    name: pre-commit
    
    on:
      pull_request:
        branches: [master]
    
    jobs:
      pre-commit:
        runs-on: ubuntu-latest
        steps:
        - uses: actions/[email protected]
        - uses: actions/[email protected]
        - id: file_changes
          uses: trilom/[email protected]
          with:
            output: ' '
        - uses: offbi/pre-commit-dbt@v1.0.0
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    
    2. add the following code to an existing (or new) .pre-commit-config.yaml
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-script-has-no-table-name
        files: ^dbt
    
    3. commit changes and create a pull request to trigger test build.

    Expected behavior Build test completes successfully.

    Version: v1.0.0 (dbt version 0.21.0)

    Additional context I think it's odd that the yml file clearly shows offbi/pre-commit-dbt@v1.0.0 for the step action, but when the docker pull command is passed the syntax is changed to offbi:pre-commit-dbt:v1.0.0 (notice colons in place of slash & @). Nowhere in the documentation does it list how to avoid this. Also note: The "test" completes successfully if the following block is removed (all other steps run correctly):

        - uses: offbi/pre-commit-dbt@v1.0.0
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    

    So it seems that it's this specific docker image causing the issue.

    bug 
    opened by grace-cityblock 6
  • add prj_root argument to dbt commands

    This enhancement lets users benefit from the dbt command hooks when their dbt project is not at the root level of the repo. It works by providing an additional argument to the dbt commands. Example:

    hooks:
    - id: dbt-docs-generate
      files: ^projects/
      args: ["--prj-root","./common/dbt"]
    - id: dbt-deps
      files: ^projects/
      args: ["--prj-root","./common/dbt"]

    In this example, the root of the dbt project is located inside the folder './common/dbt'.

    Addresses issue https://github.com/offbi/pre-commit-dbt/issues/39

    opened by joaobernardopa 5
  • Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json')

    Describe the bug Any hook that tries to read target/manifest.json results in Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json') if dbt project directory (contains target/) != git project root.

    To Reproduce Steps to reproduce the behavior:

    1. Create the following project structure
    my_project/
    ├── .git/
    └── dbt_project/
         ├── dbt_project.yml
         ├── target/
         ├── models/
         └── ...
    
    2. Run any hook that requires the manifest

    Expected behavior I guess the current behavior is expected but there could be an option to specify the dbt project directory.

    Workaround: cp -r dbt_project/target target

    Version: v1.0.0

    bug 
    opened by stumelius 5
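
One way to approach the manifest report is to resolve `target/manifest.json` relative to a configurable dbt project directory instead of the git root. The sketch below is illustrative only: `load_manifest` and its `project_dir` parameter are hypothetical and are not part of pre-commit-dbt.

```python
import json
from pathlib import Path


def load_manifest(project_dir: str = ".") -> dict:
    """Load target/manifest.json from a dbt project directory.

    Hypothetical helper: `project_dir` is an assumed option, not an
    existing pre-commit-dbt argument.
    """
    manifest_path = Path(project_dir) / "target" / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(
            f"Unable to load manifest file: {manifest_path} does not exist. "
            "Point project_dir at the directory that contains target/."
        )
    with manifest_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

For the layout in the report, `load_manifest("dbt_project")` would read `dbt_project/target/manifest.json` without copying the `target` folder to the repository root.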
  • Cannot use the action in a workflow

    Describe the bug Hi there, first of all thanks for the efforts in developing this library. We are trying to include this step in our workflow, but we are facing an issue related to the docker image. It cannot pull it:

    [Screenshot 2021-04-23 at 10 31 04]

    Might the issue be related to the actual image defined in action.yaml ?

    To Reproduce Steps to reproduce the behavior:

    1. follow either the readme example or the snippet as reported in the marketplace

    Example configuration:

          - id: dbt_checks
            uses: offbi/pre-commit-dbt@v1.0.0
            env:
              [ ... ]
            with:
              args: run --files ${{ steps.file_changes.outputs.files }}
    

    Version: v1.0.0

    bug 
    opened by Sh1n 5
  • check-column-desc-are-same fails with a Python error

    Describe the bug When trying to use the check-column-desc-are-same hook, I'm getting the following Python error. Other hooks I've tried so far are working:

    Check column descriptions are same.......................................Failed
    - hook id: check-column-desc-are-same
    - exit code: 1
    
    Traceback (most recent call last):
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/bin/check-column-desc-are-same", line 8, in <module>
        sys.exit(main())
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 80, in main
        return check_column_desc(paths=args.filenames, ignore=args.ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 55, in check_column_desc
        grouped = get_grouped(paths, ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 48, in get_grouped
        sorted(columns, key=lambda x: x.column_name), lambda x: x.column_name
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 29, in get_all_columns
        for item in schemas:
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/utils.py", line 134, in get_model_schemas
        model_name = model.get("name")
    AttributeError: 'str' object has no attribute 'get'
    

    Version: v0.1.1 Python 3.7.9

    bug 
    opened by MartinGuindon 5
  • check-script-has-no-table-name is failing when using lateral flatten

    Describe the bug See updated description of the bug.

    ~~The check-script-has-no-table-name pre-commit hook is confusing CTEs with tables, and fails with code like this:~~

    with source as (
    
        select * from {{ source('stripe', 'payments') }}
    
    ),
    
    renamed as (
    
        select
            id as payment_id,
            order_id,
            payment_method,
    
            --`amount` is currently stored in cents, so we convert it to dollars
            amount / 100 as amount
    
        from source
    
    )
    
    select * from renamed
    

    ~~It reports that "source" and "renamed" are tables even though they are not, even though it looks the same from a code perspective.~~

    ~~I think this hook should perhaps fail only at the presence of schema.table or database.schema.table references, unless we can make this hook smarter by being aware of the CTEs defined in the model.~~

    Version: v0.1.1

    bug 
    opened by MartinGuindon 5
  • Fix Github Action docker reference

    According to Github Actions documentation, we should reference the docker image directly from the project: https://docs.github.com/pt/actions/creating-actions/creating-a-docker-container-action#creating-an-action-metadata-file

    This fixes: https://github.com/offbi/pre-commit-dbt/issues/26

    opened by tlfbrito 3
  • Add check-model-name-contract hook

    There's now a check-model-name-contract hook (similar to check-model-name-contract) that is used to ensure that models follow naming convention.

    To-do before merge

    • [x] Write unit tests for the hook
    opened by stumelius 3
  • check-model-has-properties-file fails on macro with a valid properties yml

    Describe the bug When running the test check-model-has-properties-file with a macro, the test fails with the following error.

    Check the model has properties file......................................Failed
    - hook id: check-model-has-properties-file
    - exit code: 1
    
    macros/grant_select_on_schemas.sql: does not have model properties defined in any .yml file.
    

    The .pre-commit-config.yaml includes the rule:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 607cb07a1918442f5963662a9aa19da8984931e6
      hooks:
      - id: check-model-has-properties-file
    

    And the macro has the following .yml file (the filename is the same as the macro name and is stored within the macros folder):

    
    macros:
      - name: grant_select_on_schemas
        description: "Grants privileges to groups after dbt run"
        docs:
          show: false
    

    Hope we can get this fixed soon as this is a really useful test to include

    bug 
    opened by andrewlee-trouva 3
  • wip: Added hook: check_model_has_tests_by_group

    This PR adds a hook that checks if a model has a sufficient number of tests pulled out of a group of acceptable tests, e.g. this model has 1 of unique, unique_where, or unique_threshold.

    opened by jtalmi 3
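
The idea behind this hook can be sketched as counting how many of a model's tests come from an accepted group. The helper name, its parameters, and the sample test lists below are illustrative assumptions, not the PR's implementation:

```python
def has_tests_from_group(model_tests: list[str], group: set[str],
                         min_count: int = 1) -> bool:
    """Return True if at least `min_count` of the model's tests are in `group`."""
    matched = [name for name in model_tests if name in group]
    return len(matched) >= min_count


# Group of interchangeable uniqueness tests, taken from the PR description.
uniqueness_tests = {"unique", "unique_where", "unique_threshold"}

print(has_tests_from_group(["not_null", "unique_where"], uniqueness_tests))
print(has_tests_from_group(["not_null"], uniqueness_tests))
```

The first call passes because `unique_where` belongs to the group; the second does not, since `not_null` alone is outside it.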
  • `check_macro_arguments_have_desc` hook fails to parse arguments

    Describe the bug

    check_macro_arguments_have_desc hook raises the following error even though the content of the files is ok. Other hooks are working correctly, including check_macro_has_description.

    Traceback (most recent call last):
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/bin/check-macro-arguments-have-desc", line 8, in <module>
        sys.exit(main())
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 90, in main
        status_code, _ = check_argument_desc(paths=args.filenames, manifest=manifest)
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 52, in check_argument_desc
        for key, value in item.macro.get("arguments", {}).items()
    AttributeError: 'list' object has no attribute 'items'
    

    To Reproduce

    Steps to reproduce the behavior using getdbt examples:

    1. macros/schema.yml
    version: 2
    
    macros:
      - name: cents_to_dollars
        description: A macro to convert cents to dollars
        arguments:
          - name: column_name
            type: string
            description: The name of the column you want to convert
          - name: precision
            type: integer
            description: Number of decimal places. Defaults to 2.
    
    2. macros/cents_to_dollars.sql
    {% macro cents_to_dollars(column_name, precision=3) %}
        COALESCE (TRUNC(CAST({{ column_name }}/100 AS numeric), {{ precision }}), 0)
    {% endmacro %}
    
    3. Execute the following command after dbt deps, dbt compile and dbt docs generate: pre-commit run check-macro-arguments-have-desc --files macros/cents_to_dollars.sql

    Expected behavior The hook should pass successfully.

    Version: commit 34a2341234675d7a6b61766b2c33bdd5c33d090b (current latest commit)

    Additional context It looks like the problem is here: `for key, value in item.macro.get("arguments", {}).items()`. The hook assumes that the arguments key contains a dictionary, while it is actually a list of dictionaries. See https://github.com/offbi/pre-commit-dbt/blob/main/pre_commit_dbt/check_macro_arguments_have_desc.py#L52

    bug 
    opened by hacherix 0
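
As the report notes, `arguments` in a macro's properties is a list of dictionaries, so a check has to iterate over the list rather than call `.items()` on it. A minimal sketch of that iteration, assuming each macro is available as a plain dictionary shaped like the `schema.yml` entry in the report (`arguments_missing_desc` is a hypothetical name, not the hook's own function):

```python
def arguments_missing_desc(macro: dict) -> list[str]:
    """Return the names of macro arguments that lack a non-empty description."""
    missing = []
    # `arguments` is a list of dicts, e.g. [{"name": ..., "description": ...}].
    for argument in macro.get("arguments", []):
        description = (argument.get("description") or "").strip()
        if not description:
            missing.append(argument.get("name", "<unnamed>"))
    return missing


# Sample macro entry; the empty `precision` description is altered on purpose
# to show a failing argument.
macro = {
    "name": "cents_to_dollars",
    "arguments": [
        {"name": "column_name", "type": "string",
         "description": "The name of the column you want to convert"},
        {"name": "precision", "type": "integer", "description": ""},
    ],
}

print(arguments_missing_desc(macro))
```

A macro without an `arguments` key yields an empty list, so the default must be `[]` rather than `{}`.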
  • check-model-parents-and-childs for zero child check never runs

    When using the check-model-parents-and-childs hook to ensure that our data consumption layer models have no children, the hook does not fail when models DO have children

    Steps to reproduce the behavior:

    1. dbt project with a parent and child models
    2. Add check-model-parents-and-childs with --max-child-cnt of zero
      - id: check-model-parents-and-childs
        name: Check for child models in data consumption layers
        # manifest.json required
        args: ["--manifest","./pipelines/target/manifest.json","--max-child-cnt","0","--"]
        files: models/self_service/
    

    Expected outcome:

    Model that has a child is raised as failure

    Actual outcome:

    Hook passes

    Version:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 34a2341234675d7a6b61766b2c33bdd5c33d090b
    

    Additional info:

    The offending code appears to test --max-child-cnt for its default (None) with a truthiness check, which is also false for a value of zero, i.e.

     if req_cnt and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
    

    may need to explicitly check for None?

     if req_cnt is not None and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
    
    bug 
    opened by PeteCorbettWS 0
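
The suspected cause can be reproduced in isolation: `0` is falsy in Python, so a guard written as `if req_cnt and ...` skips the comparison exactly when the limit is zero. The sketch below reuses the names `req_cnt`, `req_operator`, and `real_value` from the report; the two wrapper functions are hypothetical, and `operator.gt` is an assumed choice of comparison for a maximum.

```python
import operator
from typing import Callable, Optional


def violates_truthy(real_value: int, req_cnt: Optional[int],
                    req_operator: Callable[[int, int], bool]) -> bool:
    """Guard as quoted in the report: a limit of 0 is treated as 'not set'."""
    return bool(req_cnt and req_operator(real_value, req_cnt))


def violates_explicit(real_value: int, req_cnt: Optional[int],
                      req_operator: Callable[[int, int], bool]) -> bool:
    """Proposed guard: only None means 'not set', so 0 is a valid limit."""
    return req_cnt is not None and req_operator(real_value, req_cnt)


# A model with one child, checked against a maximum child count of 0.
print(violates_truthy(1, 0, operator.gt))    # False: the violation is missed
print(violates_explicit(1, 0, operator.gt))  # True: the violation is reported
```

Both guards agree for any non-zero limit and for `None`; they differ only when the limit is zero.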
  • `check-script-has-no-table-name` fails incorrectly due to `EXTRACT` function

    Describe the bug When a model uses an EXTRACT date function, the hook treats the column reference inside it as a table reference.

    Check the script has not table name......................................Failed
    - hook id: check-script-has-no-table-name
    - exit code: 1
    
    models/example.sql: does not use source() or ref() macros for tables:
    - order_date
    

    To Reproduce Based on the jaffle shop dbt example, create a model with the following content:

    SELECT
        *,
        EXTRACT(YEAR FROM ORDER_DATE) as ORDER_YEAR
    FROM source('jaffle_shop', 'orders')
    

    Expected behavior Using an EXTRACT date function should not make the check-script-has-no-table-name fail.

    Version: v1.0.0

    Additional context Link to EXTRACT function documentation for different warehouses:

    bug 
    opened by nasseredine 0
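
The failure is consistent with a scanner that treats every identifier after `FROM` as a table, although the hook's actual parsing may differ. As a sketch under that assumption (both functions are hypothetical and use deliberately simple regular expressions, not a SQL parser), masking `EXTRACT(... FROM ...)` before scanning avoids the false positive. The sample model wraps `source()` in Jinja braces, as dbt expects.

```python
import re

# Naive pattern: any bare identifier that follows FROM or JOIN.
TABLE_AFTER_FROM = re.compile(r"\b(?:from|join)\s+([a-z_][\w.]*)", re.IGNORECASE)
# EXTRACT(<part> FROM <expr>) without nested parentheses.
EXTRACT_CALL = re.compile(r"\bextract\s*\([^()]*\)", re.IGNORECASE)


def naive_table_names(sql: str) -> list[str]:
    """Collect identifiers after FROM/JOIN, including the one inside EXTRACT."""
    return [name.lower() for name in TABLE_AFTER_FROM.findall(sql)]


def table_names_ignoring_extract(sql: str) -> list[str]:
    """Mask EXTRACT(...) calls first so their inner FROM is not scanned."""
    return naive_table_names(EXTRACT_CALL.sub("extract()", sql))


model_sql = """
SELECT
    *,
    EXTRACT(YEAR FROM ORDER_DATE) AS ORDER_YEAR
FROM {{ source('jaffle_shop', 'orders') }}
"""

print(naive_table_names(model_sql))
print(table_names_ignoring_extract(model_sql))
```

The naive scan reports `order_date`, matching the hook output in the report, while the masked scan reports nothing for this model and still flags a hard-coded name such as `raw.orders`.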
  • Feature Request: `check-model-has-column`

    Describe the feature you'd like Add a hook which asserts that a given column with a given type exists in a model.

    Additional context We generally require that every model has an audit timestamp column called _updated_at and it would be nice to be able to enforce that.

    enhancement 
    opened by huptonbirdsall 0
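
A possible shape for such a check, sketched against the `nodes` section of dbt's `manifest.json`, where each model node carries a `columns` mapping keyed by column name. Everything here is hypothetical: the function name, its parameters, and the sample nodes are illustrative, and the manifest lists only the columns documented in properties files.

```python
from typing import Optional


def models_missing_column(manifest: dict, column: str,
                          data_type: Optional[str] = None) -> list[str]:
    """Return unique ids of models that lack `column` (optionally with a type)."""
    missing = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, and so on
        info = node.get("columns", {}).get(column)
        if info is None:
            missing.append(unique_id)
        elif data_type is not None and info.get("data_type") != data_type:
            missing.append(unique_id)
    return missing


# Made-up manifest fragment with one compliant and one non-compliant model.
sample_manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "columns": {"_updated_at": {"name": "_updated_at", "data_type": "timestamp"}},
        },
        "model.shop.customers": {"resource_type": "model", "columns": {}},
        "test.shop.not_null_orders_id": {"resource_type": "test", "columns": {}},
    }
}

print(models_missing_column(sample_manifest, "_updated_at", "timestamp"))
```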
  • `Check model name contract` hook problem with version

    When using the following pre commit config file

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-model-name-contract
        args: [--pattern, "(rep__).*"]
        files: models/reporting
    

    I get the following response:

    [ERROR] check-model-name-contract is not present in repository https://github.com/offbi/pre-commit-dbt. Typo? Perhaps it is introduced in a newer version? Often pre-commit autoupdate fixes this.

    Even if I run the autoupdate command the error message is the same. Am I missing something?

    bug 
    opened by papost 2
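
Independent of the versioning problem, the `--pattern` value can be tried out locally. This sketch assumes the pattern is applied to the model name (the file name without `.sql`) with `re.match`; how the hook applies it is not stated here, and `follows_contract` is a hypothetical helper.

```python
import re
from pathlib import Path


def follows_contract(path: str, pattern: str) -> bool:
    """Check a model file's name (without extension) against a regex pattern."""
    model_name = Path(path).stem
    return re.match(pattern, model_name) is not None


# Pattern from the report; the model paths are invented for illustration.
print(follows_contract("models/reporting/rep__sales.sql", r"(rep__).*"))
print(follows_contract("models/reporting/sales.sql", r"(rep__).*"))
```

Under these assumptions the first model satisfies the contract and the second does not, because `re.match` anchors the pattern at the start of the name.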
Releases(v1.0.0)
Owner
Offbi
Data engineering with ❤️