:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.

Overview

dbt-pre-commit

pre-commit-dbt


BETA NOTICE: This tool is still in beta and may have some bugs, so please be forgiving!

Goal

Quickly ensure the quality of your dbt projects.

dbt is awesome, but as the number of models, sources, and macros grows, it becomes challenging to maintain quality. People often forget to update columns in schema files, add descriptions, or add tests. Besides, as the number of objects grows, dbt slows down, users stop running models and tests (because they want to deploy the feature quickly), and the demands on reviews increase.

If this is the case, pre-commit-dbt is here to help you!

List of pre-commit-dbt hooks

Model checks:

Script checks:

Source checks:

Modifiers:

dbt commands:


If you have an idea for a new hook or you have found a bug, let us know.

Install

For detailed installation and usage instructions, see the pre-commit.com site.

pip install pre-commit

Setup

  1. Create a file named .pre-commit-config.yaml in your dbt root folder.
  2. Add the list of hooks you want to run before every commit, e.g.:
repos:
- repo: https://github.com/offbi/pre-commit-dbt
  rev: v0.1.1
  hooks:
  - id: check-script-semicolon
  - id: check-script-has-no-table-name
  - id: dbt-test
  - id: dbt-docs-generate
  - id: check-model-has-all-columns
    name: Check columns - core
    files: ^models/core
  - id: check-model-has-all-columns
    name: Check columns - mart
    files: ^models/mart
  - id: check-model-columns-have-desc
    files: ^models/mart
  3. Optionally, run pre-commit install to set up the git hook scripts. With this, pre-commit will run automatically on git commit! You can also run pre-commit run manually after staging the files you want to check, or pre-commit run --all-files to run the hooks against all files (not only staged ones).
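The `files` entries in the configuration above are regular expressions that pre-commit matches against each file path. A minimal sketch of that filtering follows, assuming the match behaves like Python's `re.search`; the staged paths and the `select_files` helper are hypothetical and exist only for illustration.

```python
import re

# Hypothetical staged paths; only the two patterns come from the config above.
staged = [
    "models/core/customers.sql",
    "models/mart/orders.sql",
    "macros/cents_to_dollars.sql",
]


def select_files(pattern, paths):
    """Return the paths that match a pre-commit style `files` regex."""
    regex = re.compile(pattern)
    return [path for path in paths if regex.search(path)]


core = select_files(r"^models/core", staged)
mart = select_files(r"^models/mart", staged)
print(core)
print(mart)
```

Because each pattern is anchored with `^`, the same hook id can be declared twice with different `name` and `files` values to cover separate folders.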

Run as GitHub Action

Unfortunately, you cannot natively use pre-commit-dbt if you are using dbt Cloud, but you can run the checks after you push changes to GitHub.

To do that, create the file .github/workflows/pre-commit.yml:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
    - uses: actions/[email protected]
    - uses: pre-commit/[email protected]

To run only on changed files:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
    - uses: actions/[email protected]
    - id: file_changes
      uses: trilom/[email protected]
      with:
        output: ' '
    - uses: pre-commit/[email protected]
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}

To run modifiers, you must use a private repository and change your .github/workflows/pre-commit.yml to:

name: pre-commit

on:
  pull_request:
  push:
    branches: [main]

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/[email protected]
      with:
        fetch-depth: 0
    - uses: actions/[email protected]
    - id: file_changes
      uses: trilom/[email protected]
      with:
        output: ' '
    - uses: pre-commit/[email protected]
      with:
        extra_args: --files ${{ steps.file_changes.outputs.files}}
        token: ${{ secrets.GITHUB_TOKEN }}

For more information about pre-commit/action, visit https://github.com/pre-commit/action.

Comments
  • add argument support to dbt commands (dbt clean & dbt deps)

    This enhancement allows the dbt-clean and dbt-deps hooks to take arguments for global and cmd flags. It addresses issue 39 by allowing the project-dir flag to be passed to the dbt clean and dbt deps commands. Example:

    - id: dbt-clean
      args: ["--cmd-flags", "++project-dir", "./transform/dbt"]
    opened by Aostojic 8
  • Docker image pull down image action fails

    Describe the bug When hooking to the Docker file, the build fails with the error "invalid reference format" at the step that pulls the action image.

    Pull down action image 'offbi:pre-commit-dbt:v1.0.0'
    /usr/bin/docker pull offbi:pre-commit-dbt:v1.0.0
    invalid reference format
    Warning: Docker pull failed with exit code 1, back off 3.275 seconds before retry.

    To Reproduce Steps to reproduce the behavior:

    1. Add a new file .github/workflows/pre-commit-dbt.yml:
    name: pre-commit
    
    on:
      pull_request:
        branches: [master]
    
    jobs:
      pre-commit:
        runs-on: ubuntu-latest
        steps:
        - uses: actions/[email protected]
        - uses: actions/[email protected]
        - id: file_changes
          uses: trilom/[email protected]
          with:
            output: ' '
        - uses: offbi/[email protected]
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    
    2. Add the following code to an existing (or new) .pre-commit-config.yaml:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-script-has-no-table-name
        files: ^dbt
    
    3. Commit the changes and create a pull request to trigger a test build.

    Expected behavior Build test completes successfully.

    Version: v1.0.0 (dbt version 0.21.0)

    Additional context I think it's odd that the yml file clearly shows offbi/[email protected] for the step action, but when the docker pull command is passed the syntax is changed to offbi:pre-commit-dbt:v1.0.0 (notice colons in place of slash & @). Nowhere in the documentation does it list how to avoid this. Also note: The "test" completes successfully if the following block is removed (all other steps run correctly):

        - uses: offbi/[email protected]
          env:
            DB_PASSWORD: ${{ secrets.SuperSecret }}
          with:
            args: run --files ${{ steps.file_changes.outputs.files}}
    

    So it seems that it's this specific docker image causing the issue.

    bug 
    opened by grace-cityblock 6
  • add prj_root argument to dbt commands

    This enhancement lets users benefit from the dbt command hooks when their dbt project is not at the root level of the repo. It works by providing an additional argument to the dbt commands. Example:

    hooks:
    - id: dbt-docs-generate
      files: ^projects/
      args: ["--prj-root","./common/dbt"]
    - id: dbt-deps
      files: ^projects/
      args: ["--prj-root","./common/dbt"]

    In this example, the root of the dbt project is located inside the folder './common/dbt'.

    Addresses issue https://github.com/offbi/pre-commit-dbt/issues/39

    opened by joaobernardopa 5
  • Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json')

    Describe the bug Any hook that tries to read target/manifest.json fails with "Unable to load manifest file ([Errno 2] No such file or directory: 'target/manifest.json')" when the dbt project directory (the one that contains target/) is not the git project root.

    To Reproduce Steps to reproduce the behavior:

    1. Create the following project structure
    my_project/
    ├── .git/
    └── dbt_project/
         ├── dbt_project.yml
         ├── target/
         ├── models/
         └── ...
    
    2. Run any hook that requires the manifest.

    Expected behavior I guess the current behavior is expected but there could be an option to specify the dbt project directory.

    Workaround: cp -r dbt_project/target target
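The option requested in this issue could look roughly like the sketch below, which resolves the manifest relative to a configurable dbt project directory instead of the current working directory. The `find_manifest` helper and the temporary directory layout are hypothetical; this is not the hook's actual implementation.

```python
import json
import tempfile
from pathlib import Path


def find_manifest(project_dir):
    """Return <project_dir>/target/manifest.json, or raise if it is missing."""
    manifest = Path(project_dir) / "target" / "manifest.json"
    if not manifest.is_file():
        raise FileNotFoundError(f"Unable to load manifest file: {manifest}")
    return manifest


# Demo: recreate the layout from the issue inside a temporary directory.
with tempfile.TemporaryDirectory() as repo_root:
    target = Path(repo_root) / "dbt_project" / "target"
    target.mkdir(parents=True)
    (target / "manifest.json").write_text(json.dumps({"nodes": {}}))

    # Looking in the git root fails, as described in the bug report.
    try:
        find_manifest(repo_root)
        found_in_root = True
    except FileNotFoundError:
        found_in_root = False

    # Pointing at the dbt project directory succeeds.
    loaded = json.loads(find_manifest(Path(repo_root) / "dbt_project").read_text())

print(found_in_root, loaded)
```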

    Version: v1.0.0

    bug 
    opened by stumelius 5
  • Cannot use the action in a workflow

    Describe the bug Hi there, first of all thanks for the efforts in developing this library. We are trying to include this step in our workflow, but we are facing an issue related to the docker image. It cannot pull it:

    Screenshot 2021-04-23 at 10 31 04

    Might the issue be related to the actual image defined in action.yaml ?

    To Reproduce Steps to reproduce the behavior:

    1. follow either the readme example or the snippet as reported in the marketplace

    Example configuration:

        - id: dbt_checks
          uses: offbi/[email protected]
          env:
            [ ... ]
          with:
            args: run --files ${{ steps.file_changes.outputs.files }}
    

    Version: v1.0.0

    bug 
    opened by Sh1n 5
  • check-column-desc-are-same fails with a Python error

    Describe the bug When trying to use the check-column-desc-are-same hook, I'm getting the following Python error. Other hooks I've tried so far are working:

    Check column descriptions are same.......................................Failed
    - hook id: check-column-desc-are-same
    - exit code: 1
    
    Traceback (most recent call last):
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/bin/check-column-desc-are-same", line 8, in <module>
        sys.exit(main())
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 80, in main
        return check_column_desc(paths=args.filenames, ignore=args.ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 55, in check_column_desc
        grouped = get_grouped(paths, ignore)
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 48, in get_grouped
        sorted(columns, key=lambda x: x.column_name), lambda x: x.column_name
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/check_column_desc_are_same.py", line 29, in get_all_columns
        for item in schemas:
      File "/Users/Martin/.cache/pre-commit/repoewj3j5ob/py_env-python3.7/lib/python3.7/site-packages/pre_commit_dbt/utils.py", line 134, in get_model_schemas
        model_name = model.get("name")
    AttributeError: 'str' object has no attribute 'get'
    

    Version: v0.1.1 Python 3.7.9

    bug 
    opened by MartinGuindon 5
  • check-script-has-no-table-name is failing when using lateral flatten

    Describe the bug See updated description of the bug.

    ~~The check-script-has-no-table-name pre-commit hook is confusing CTEs with tables, and fails with code like this:~~

    with source as (
    
        select * from {{ source('stripe', 'payments') }}
    
    ),
    
    renamed as (
    
        select
            id as payment_id,
            order_id,
            payment_method,
    
            --`amount` is currently stored in cents, so we convert it to dollars
            amount / 100 as amount
    
        from source
    
    )
    
    select * from renamed
    

    ~~It reports that "source" and "renamed" are tables even though they are not, even though it looks the same from a code perspective.~~

    ~~I think this hook should perhaps fail only at the presence of schema.table or database.schema.table references, unless we can make this hook smarter by being aware of the CTEs defined in the model.~~

    Version: v0.1.1

    bug 
    opened by MartinGuindon 5
  • Fix Github Action docker reference

    According to Github Actions documentation, we should reference the docker image directly from the project: https://docs.github.com/pt/actions/creating-actions/creating-a-docker-container-action#creating-an-action-metadata-file

    This fixes: https://github.com/offbi/pre-commit-dbt/issues/26

    opened by tlfbrito 3
  • Add check-model-name-contract hook

    There's now a check-model-name-contract hook (similar to check-model-name-contract) that is used to ensure that models follow naming convention.

    To-do before merge

    • [x] Write unit tests for the hook
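A naming-convention check of this kind can be sketched as a regular-expression match on the model name. The example assumes the pattern is applied with `re.match`, which may differ from how the hook actually applies it; the `follows_contract` helper and the model names are hypothetical, and the pattern is the one used in an issue further down this page.

```python
import re


def follows_contract(model_name, pattern):
    """Return True when the model name matches the naming pattern."""
    return re.match(pattern, model_name) is not None


# Hypothetical model names checked against the reporting-layer pattern.
names = ["rep__daily_sales", "stg_orders"]
results = {name: follows_contract(name, r"(rep__).*") for name in names}
print(results)
```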
    opened by stumelius 3
  • check-model-has-properties-file fails on macro with a valid properties yml

    Describe the bug When running the test check-model-has-properties-file with a macro, the test fails with the following error.

    Check the model has properties file......................................Failed
    - hook id: check-model-has-properties-file
    - exit code: 1
    
    macros/grant_select_on_schemas.sql: does not have model properties defined in any .yml file.
    

    The .pre-commit-config.yaml includes the rule:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 607cb07a1918442f5963662a9aa19da8984931e6
      hooks:
      - id: check-model-has-properties-file
    

    And the macro has the following .yml file (the filename is the same as the macro name and is stored within the macros folder):

    
    macros:
      - name: grant_select_on_schemas
        description: "Grants privileges to groups after dbt run"
        docs:
          show: false
    

    I hope we can get this fixed soon, as this is a really useful test to include.

    bug 
    opened by andrewlee-trouva 3
  • wip: Added hook: check_model_has_tests_by_group

    This PR adds a hook that checks if a model has a sufficient number of tests pulled out of a group of acceptable tests, e.g. this model has 1 of unique, unique_where, or unique_threshold.
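The idea can be sketched as counting how many of a model's tests belong to a group of acceptable tests. The `has_enough_tests` helper and its `min_count` parameter are hypothetical names chosen for illustration and are not taken from the PR.

```python
def has_enough_tests(model_tests, group, min_count=1):
    """Return True when at least `min_count` of the model's tests are in `group`."""
    matched = [test for test in model_tests if test in group]
    return len(matched) >= min_count


# The group of acceptable uniqueness tests mentioned in the PR description.
uniqueness_group = {"unique", "unique_where", "unique_threshold"}

with_unique = has_enough_tests(["not_null", "unique_where"], uniqueness_group)
without_unique = has_enough_tests(["not_null"], uniqueness_group)
print(with_unique, without_unique)
```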

    opened by jtalmi 3
  • `check_macro_arguments_have_desc` hook fails to parse arguments

    Describe the bug

    check_macro_arguments_have_desc hook raises the following error even though the content of the files is ok. Other hooks are working correctly, including check_macro_has_description.

    Traceback (most recent call last):
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/bin/check-macro-arguments-have-desc", line 8, in <module>
        sys.exit(main())
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 90, in main
        status_code, _ = check_argument_desc(paths=args.filenames, manifest=manifest)
      File "/home/mache/.cache/pre-commit/repo0ldja9vt/py_env-python3/lib/python3.10/site-packages/pre_commit_dbt/check_macro_arguments_have_desc.py", line 52, in check_argument_desc
        for key, value in item.macro.get("arguments", {}).items()
    AttributeError: 'list' object has no attribute 'items'
    

    To Reproduce

    Steps to reproduce the behavior using getdbt examples:

    1. macros/schema.yml
    version: 2
    
    macros:
      - name: cents_to_dollars
        description: A macro to convert cents to dollars
        arguments:
          - name: column_name
            type: string
            description: The name of the column you want to convert
          - name: precision
            type: integer
            description: Number of decimal places. Defaults to 2.
    
    2. macros/cents_to_dollars.sql
    {% macro cents_to_dollars(column_name, precision=3) %}
        COALESCE (TRUNC(CAST({{ column_name }}/100 AS numeric), {{ precision }}), 0)
    {% endmacro %}
    
    3. Execute the following command after dbt deps, dbt compile and dbt docs generate: pre-commit run check-macro-arguments-have-desc --files macros/cents_to_dollars.sql

    Expected behavior The hook should pass successfully.

    Version: commit 34a2341234675d7a6b61766b2c33bdd5c33d090b (current latest commit)

    Additional context It looks like the problem is in this line: for key, value in item.macro.get("arguments", {}).items(). The hook assumes that the arguments key contains a dictionary, but it is a list of dictionaries. See https://github.com/offbi/pre-commit-dbt/blob/main/pre_commit_dbt/check_macro_arguments_have_desc.py#L52
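A minimal sketch of iterating the arguments as a list of dictionaries, as the report suggests, is shown below. The `missing_argument_descriptions` helper is hypothetical and is not the project's fix; the sample macro mirrors the schema.yml above, except that the second description is blanked on purpose to show a failing case.

```python
def missing_argument_descriptions(macro):
    """Return the names of macro arguments that have no description."""
    # `arguments` is a list of dicts, so iterate it directly instead of calling .items().
    arguments = macro.get("arguments") or []
    return [arg.get("name") for arg in arguments if not arg.get("description")]


macro = {
    "name": "cents_to_dollars",
    "arguments": [
        {
            "name": "column_name",
            "type": "string",
            "description": "The name of the column you want to convert",
        },
        # Description removed for illustration; the original example has one.
        {"name": "precision", "type": "integer", "description": ""},
    ],
}

missing = missing_argument_descriptions(macro)
print(missing)
```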

    bug 
    opened by hacherix 0
  • check-model-parents-and-childs for zero child check never runs

    We are trying to use the check-model-parents-and-childs hook to ensure that our data consumption layer models do not have any children, but the hook does not fail when models DO have children.

    Steps to reproduce the behavior:

    1. dbt project with a parent and child models
    2. Add check-model-parents-and-childs with --max-child-cnt of zero
      - id: check-model-parents-and-childs
        name: Check for child models in data consumption layers
        # manifest.json required
        args: ["--manifest","./pipelines/target/manifest.json","--max-child-cnt","0","--"]
        files: models/self_service/
    

    Expected outcome:

    Model that has a child is raised as failure

    Actual outcome:

    Hook passes

    Version:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: 34a2341234675d7a6b61766b2c33bdd5c33d090b
    

    Additional info:

    The offending code appears to rely on truthiness to detect the default (None) of --max-child-cnt, which also evaluates to false for a value of zero, i.e.:

     if req_cnt and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
    

    It may need to check explicitly for None:

     if req_cnt is not None and req_operator(real_value, req_cnt):
         status_code = 1
         print(
    ...
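The difference between the two conditions can be reproduced in isolation. The sketch below assumes the maximum check compares with `operator.gt`, which is a guess about `req_operator`; the two helper functions are hypothetical wrappers around the conditions quoted above.

```python
import operator


def violates_truthy(real_value, req_cnt, req_operator=operator.gt):
    """Condition as reported: a limit of 0 is falsy, so the check is skipped."""
    return bool(req_cnt and req_operator(real_value, req_cnt))


def violates_explicit(real_value, req_cnt, req_operator=operator.gt):
    """Suggested condition: only skip the check when no limit was given."""
    return req_cnt is not None and req_operator(real_value, req_cnt)


# A model with 2 children and a maximum child count of 0.
print(violates_truthy(2, 0))
print(violates_explicit(2, 0))
```

With a limit of `None` (no limit configured), both variants report no violation.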
    
    bug 
    opened by PeteCorbettWS 0
  • `check-script-has-no-table-name` fails incorrectly due to `EXTRACT` function

    Describe the bug When a model uses the EXTRACT date function, the hook treats the column reference inside it as a table reference.

    Check the script has not table name......................................Failed
    - hook id: check-script-has-no-table-name
    - exit code: 1
    
    models/example.sql: does not use source() or ref() macros for tables:
    - order_date
    

    To Reproduce Based on the jaffle shop dbt example, create a model with the following content:

    SELECT
        *,
        EXTRACT(YEAR FROM ORDER_DATE) as ORDER_YEAR
    FROM source('jaffle_shop', 'orders')
    

    Expected behavior Using an EXTRACT date function should not make the check-script-has-no-table-name fail.
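The false positive can be illustrated with a naive detector that treats any identifier after FROM as a table. This regex is an assumption made for illustration and is not the hook's actual parser; the sample SQL wraps the source call in Jinja braces, which is also an assumption. Removing EXTRACT(...) expressions before scanning is one possible way to avoid the problem.

```python
import re

# Naive rule: any identifier that follows FROM is treated as a table name.
FROM_IDENTIFIER = re.compile(r"\bfrom\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
# Matches EXTRACT(<part> FROM <expr>) when it has no nested parentheses.
EXTRACT_CALL = re.compile(r"\bextract\s*\([^()]*\)", re.IGNORECASE)


def naive_tables(sql):
    """Return identifiers that directly follow a FROM keyword."""
    return FROM_IDENTIFIER.findall(sql)


def tables_ignoring_extract(sql):
    """Drop EXTRACT(...) calls first so their inner FROM is not scanned."""
    return naive_tables(EXTRACT_CALL.sub("", sql))


sql = """
SELECT
    *,
    EXTRACT(YEAR FROM ORDER_DATE) as ORDER_YEAR
FROM {{ source('jaffle_shop', 'orders') }}
"""

print(naive_tables(sql))
print(tables_ignoring_extract(sql))
```

A hard-coded reference such as `select * from raw.orders` is still reported by both functions, so the check keeps its purpose.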

    Version: v1.0.0

    Additional context Link to EXTRACT function documentation for different warehouses:

    bug 
    opened by nasseredine 0
  • Feature Request: `check-model-has-column`

    Describe the feature you'd like Add a hook which asserts that a given column with a given type exists in a model.

    Additional context We generally require that every model has an audit timestamp column called _updated_at and it would be nice to be able to enforce that.
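A rough sketch of such a check over manifest-like data follows. It assumes each node carries a `resource_type` and a `columns` mapping keyed by column name with an optional `data_type`; the `models_missing_column` helper and the sample nodes are hypothetical.

```python
def models_missing_column(nodes, column_name, data_type=None):
    """Return names of models that lack the column or have a different type."""
    failing = []
    for node in nodes.values():
        if node.get("resource_type") != "model":
            continue
        column = node.get("columns", {}).get(column_name)
        if column is None:
            failing.append(node["name"])
        elif data_type is not None and column.get("data_type") != data_type:
            failing.append(node["name"])
    return failing


# Hypothetical nodes; only the _updated_at column name comes from the request.
nodes = {
    "model.shop.orders": {
        "name": "orders",
        "resource_type": "model",
        "columns": {"_updated_at": {"name": "_updated_at", "data_type": "timestamp"}},
    },
    "model.shop.customers": {
        "name": "customers",
        "resource_type": "model",
        "columns": {"id": {"name": "id", "data_type": "integer"}},
    },
}

failing = models_missing_column(nodes, "_updated_at", data_type="timestamp")
print(failing)
```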

    enhancement 
    opened by huptonbirdsall 0
  • `Check model name contract` hook problem with version

    When using the following pre-commit config file:

    repos:
    - repo: https://github.com/offbi/pre-commit-dbt
      rev: v1.0.0
      hooks:
      - id: check-model-name-contract
        args: [--pattern, "(rep__).*"]
        files: models/reporting
    

    I get the following response:

    [ERROR] check-model-name-contract is not present in repository https://github.com/offbi/pre-commit-dbt. Typo? Perhaps it is introduced in a newer version? Often pre-commit autoupdate fixes this.

    Even if I run the autoupdate command the error message is the same. Am I missing something?

    bug 
    opened by papost 2
Releases(v1.0.0)
Owner
Offbi
Data engineering with ❤️