Dbt-core - dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

Overview

dbt logo

Unit Tests Badge Integration Tests Badge

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.

architecture

Understanding dbt

Analysts using dbt can transform their data by simply writing select statements, while dbt handles turning these statements into tables and views in a data warehouse.

These select statements, or "models", form a dbt project. Models frequently build on top of one another – dbt makes it easy to manage relationships between models, and visualize these relationships, as well as assure the quality of your transformations through testing.

dbt dag

Getting started

Join the dbt Community

Reporting bugs and contributing code

Code of Conduct

Everyone interacting in the dbt project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the dbt Code of Conduct.

Comments
  • [Feature] dbt should know about metrics

    [Feature] dbt should know about metrics

    Is there an existing feature request for this?

    • [X] I have searched the existing issues

    Describe the Feature

    dbt should know about metrics. A metric is a timeseries aggregation over a table that supports zero or more dimensions. Some examples of metrics include:

    • active users
    • churn rate
    • mrr (monthly recurring revenue)

    dbt should support metric definitions as a new node type. Like exposures, metrics participate in the dbt DAG and can be expressed in yaml files. By defining metrics in dbt projects, analytics engineers can encode crucial business logic in tested, version controlled code. Further, these metrics definitions can be exposed to downstream tooling to drive consistency and precision in metric reporting.

    The ecosystem

    There is some prior art for defining metrics in dbt projects. In particular, see

    While these two implementations differ (measures vs. metrics), more on that below, there exists a need in the community for a first-class way to define these metrics in dbt code. It is really neat to see that some folks have already made these definitions possible with dbt, but it would better if metrics were treated as well-defined nodes with field validation and helpful utilities inside of dbt Core.

    Specification

    A metric is a timeseries aggregation over a table that supports zero or more dimensions. These metrics can be encoded in schema.yml files. In the example below, a new_customers metric is defined as a count of customer records created in a given time grain.

    # models/marts/product/schema.yml
    
    version: 2
    
    models:
     - name: dim_customers
       ...
    
    metrics:
      - name: new_customers
        label: New Customers
        model: dim_customers
        description: "The number of paid customers who are using the product"
    
        type: count
        sql: user_id # superflous here, but shown as an example
    
        timestamp: signup_date
        time_grains: [day, week, month]
    
        dimensions:
          - plan
          - country
        
        filters:
          - field: is_paying
            value: true
    
        meta: {}
    

    Given this information, a downstream process (or a dbt macro!) can generate a sql SELECT statement that correctly calculates this metric with a specified time grain and set of dimensions. Here is a breakdown of supported fields:

    | Field | Description | Example | Required? | |-------------|-------------------------------------------------------------|---------------------------------|-----------| | name | A unique identifier for the metric | new_customers | yes | | model | The dbt model that powers this metric | dim_customers | yes | | label | A short for name / label for the metric | New Customers | no | | description | Long form, human-readable description for the metric | The number of customers who.... | no | | type | The type of calculation to perform when evaluating a metric | count_distinct | yes | | sql | The expression to aggregate/calculate over | user_id | yes | | timestamp | The time-based component of the metric | signup_date | yes | | time_grains | One or more "grains" at which the metric can be evaluated | [day, week, month] | yes | | dimensions | A list of dimensions to group or filter the metric by | [plan, country] | no | | filters | A list of filters to apply before calculating the metric | See below | no | | meta | Arbitrary key/value store | {team: Finance} | no |

    Model reference

    A reference to a dbt model. This model may be any "materialized" model, or a reference to an ephemeral model. Direct table references are not allowed, and alternate node types (seeds, snapshots) are not supported.

    Metric types

    The following metric types should be supported:

    • count
    • count_distinct
    • sum
    • average
    • min
    • max

    In the future, alternative metric types (ratios, deltas, etc) should be supported in this model.

    Filters

    Filters should be defined as a list of dictionaries that define predicates for the metric. Filters are ANDed together. If more complex filtering is required, users can (and should) push that logic down into the underlying model.

    filters:
      - field: is_paying
        value: true
    

    Functional requirements

    • Metrics should participate in the dbt DAG as a distinct node type
    • Metric nodes should be accessible in the dbt Core compilation context via:
      • the graph.metrics variable
      • one or more accessor functions like metrics.find_by_name('...') (exact mechanism TBD)
    • Metric nodes should be emitted into the manifest.json artifact
    • Metrics should work with partial parsing
    • Metric nodes should be supported in node selection and should be selectable with the metric: selector
      • When listing nodes, existing graph operators (+, &, etc) should be supported
    • (in a different issue) Metrics should be surfaced in the dbt Docs website

    dbt Core should not, itself, evaluate or calculate metrics. Instead, dbt Core should expose the definition of metrics to downstream tools or packages for evaluation and analysis. In this way, it is critical that dbt Core provides hooks into metrics that can be leveraged in both macro code, or by processes that consume dbt Core manifest.json files.

    Describe alternatives you've considered

    Don't implement metrics as distinct node types and keep encoding them in meta properties:

    • This information is untyped and semantically unrepresented in dbt, so it would be a net-improvement to instead create a first-class node type in dbt Core for these logical DAG nodes

    Metrics vs. Measures

    Metrics are strongly-typed objects. It is extremely common to see folks perform syntactically correct but semantically meaningless calculations over data. This looks like averaging an average, or adding two distinct counts together. You get a number back... but it's not a useful or meaningful result.

    To that end, I think we should start with metrics instead of measures. The difference here (and maybe a strawperson of my own creation - tell me if you think so) is that measures are untyped aggregations, whereas metrics are rigorously defined summaries over well-defined datasets. The creation of metrics does not preclude us from teaching dbt about more generic types of aggregations in the future, but I'd prefer to start with a narrow set of functionality and expand over time. It is easy to remove constraints, but it is hard to add them 🙂

    Include support for joins

    • Joins make metric calculations really complicated. dbt should absolutely know about foreign key relationships (outside of the existing relationships test) in the future, but this would be a meaningful expansion of scope for our first cut of this feature
    • While these joins would be semantically useful, they are not a blocker to defining metrics today. Join logic can be pushed down into model code (whether materialized or ephemeral). We should experiment with this single table paradigm, see how it feels, and then consider the best approaching for teaching dbt about semantic joins in the future.

    Where metrics are defined

    Should metrics be a property of a model? While that could be functional today, I think this would make it hard to extend metrics to work with joins (see above). Instead, declaring metrics as independent nodes that participate in the DAG is a more future-proof idea, and we'd probably do well to avoid the "patching" flow required to get schema tests (properties of models today) translated into their own independent nodes in the DAG.

    Inheriting configuration from models

    Should metrics be namespaced under a model? This would make it possible to define some "shared" properties for all of the metrics derived from a model (eg. valid dimensions, the time field, supported time grain). This would be good for ergonomics, but not a huge value-add IMO. I'd like to keep this simple for the initial implementation and then make decisions like this with some more information from the community around example use-cases.

    Example:

    metrics:
      - model: dim_customers
        # dimensions are shared for all metrics defined in terms of this model
        dimensions:
          - country
          - plan
    
        definitions:
          - name: new_customers
          - name: churned_customers
    

    SQL calculations in dimensions

    Should dimensions be allowed to provide arbitrary SQL expressions? I don't think so — that SQL is best encoded in model code, and it would be confusing and dissonant to break up dimension definitions across SQL and yaml files.

    Example:

    metrics:
      - name: dim_customers
    
        # This logic should be represented in the underlying model
        dimensions:
          - field: plan_type
            sql: case when plan in ('pro', 'extra pro') then 'paid' else 'free' end
            
    

    Who will this benefit?

    Analytics engineers - As with models, AEs will be able to define metric logic under version control. By colocating model and metric code, new metrics or changes to existing metrics can be made in a tested, versioned, documented, code reviewed environment - Further, dbt Core's built in lineage can surface information about how changes to an upstream model may impact a downstream metric

    BI/Analytics tooling (and therein, data consumers) - Organizations use metrics to understand performance and make decisions. To that end, the correctness and precision of these metrics is really paramount! By defining metrics rigorously under version control, and then exposing their definitions globally, dbt Core can help ensure consistency in reporting.

    The data ecosystem - There are so many tools, both existing and yet to be created, that can benefit from an open source mechanism for defining a semantic model on top of the data warehouse. I believe that this business logic is just too valuable and strategically important for end-users to be locked up in proprietary tooling. To that end, this feature, and future types of semantic logic like this, should be addressable in an open source way

    Are you interested in contributing this feature?

    I sure am :)

    Anything else?

    From around the internet:

    enhancement metrics 
    opened by drewbanin 48
  • Feature/dbt deps tarball

    Feature/dbt deps tarball

    resolves #4205

    add new dbt.deps type: url to internally hosted tarball #420

    Continued from https://github.com/dbt-labs/dbt-core/pull/4220

    Revision 3 added Nov 6 2022

    Proposed solution for feature request 4205

    Description

    Enable direct linking to tarball urls in packages.yml, for example:

    # manufactured test, since you'd want to use hub to install these 
    # public tarball used here as example only! 
    # this would usually be a tarball hosted  on an internal network
    packages:
      - tarball: https://codeload.github.com/dbt-labs/dbt-utils/tar.gz/0.6.5
        name: 'dbt_utils_065'
    
    
    image image

    Rational:

    • dbt projects being self hosted on larger enterprise environments often don't have a connection to the internet (dbt hubs won't work).
    • dbt users on larger enterprise environments like to build internal private packages for non-public use (help out other dbt users in company with specific functionality)
    • git package install is not a good option at scale for larger enterprise environments
    • internal file hosting service (such as internal artifactory service or internal cloud storage buckets) can be easily configured to host packages for install during deployment, so lets give dbt users a way to install from a direct tar file link

    Sketching out doc changes here: https://github.com/timle2/docs.getdbt.com/blob/dbt-docs-tarball-package-updates/website/docs/docs/building-a-dbt-project/package-management.md#tar-files

    Checklist

    • [x] I have signed the CLA
    • [x] I have run this code in development and it appears to resolve the stated issue
    • [x] This PR includes tests, or tests are not required/relevant for this PR
    • [x] I have run changie new to create a changelog entry
    • [x] I have opened an issue to add/update docs~, or docs changes are not required/relevant for this PR~
      • https://github.com/dbt-labs/docs.getdbt.com/issues/2474
    cla:yes Team:Language ready_for_review 
    opened by timle2 43
  • Gets columns to update from config for BQ and Snowflake

    Gets columns to update from config for BQ and Snowflake

    resolves #1862

    Description

    Incremental models currently default to updating all columns. For databases that support merge statements, this PR allows the user to pass in an update_columns config parameter to selectively update only a subset of columns by replacing the call to adapter.get_columns_in_relation in the materialization for BigQuery and Snowflake.

    Checklist

    • [x] I have signed the CLA
    • [x] I have run this code in development and it appears to resolve the stated issue
    • [x] This PR includes tests, or tests are not required/relevant for this PR
    • [x] I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.
    cla:yes 
    opened by prratek 43
  • Adding Full Refresh on schema change option for model config

    Adding Full Refresh on schema change option for model config

    Goal

    This is a work in progress Pull Request for the implementation of the following feature for DBT: https://github.com/fishtown-analytics/dbt/issues/1132

    The specific aspect of the feature is for incremental merge models to have a config option that allows them to more gracefully handle schema changes in either the source or target models. through the on_schema_change optional config option, you can either set fail and dbt will throw an exception before it attempts to merge, or full_refresh and dbt will do a full_refresh mode run of the model if it detects a schema change.

    Logic for schema change detection

    The new config on_schema_change calls the adapter function has_schema_changed(). The function makes the following checks between the temp_relation and the target_relation generated when an incremental merge is attempted.

    1. First compare length of temp_relation schema and target_relation schema, if one is longer, there are more rows and therefore the schemas do not match.

    2. Test to see if all temp_relation column names appear in the target_relation column names, if not then a column name has changed and the schemas are different.

    3. Test to see if the datatypes in temp_relation columns are different from target_relation. Only checks for the named type, not precision (so character varying(5) is only compared as character). Since dbt will handle type resolution only check for a breaking error.

    4. Test to see if there is a target_relation column that does not appear in the temp_relation.

    5. if these tests fail, no schema change detected

    NOTE: Logging has been added to explain what changes were detected and show the schema of the temp and target relations.

    Approach

    The general approach I took for implementing the "on schema change" feature is to:

    1. Add a new configuration key on_schema_change to the appropriate contract - core/dbt/contracts/graph/parsed.py

    2. Create a new SchemaChangeException that will be thrown when a schema change is detected and should be failed - core/dbt/exceptions.py

    3. Add a new abstract method to hold the logic for detecting a schema change for each type of DB - core/dbt/adapters/base/impl.py

    4. Add a concrete implementation for the sql adapter - core/dbt/adapters/sql/impl.py

    5. Add on_schema_change logic to sql adapter incremental materialization macro

    6. Add a concrete implementation for the snowflake adapter -plugins/snowflake/dbt/adapters/snowflake/impl.py

    7. Add on_schema_change logic to snowflake adapter incremental materialization macro

    8. [WIP] Integration tests for multiple configuration setups

    9. [Not started] Working unit tests and style cleaned up

    opened by cfraleig 40
  • Future art for exposures

    Future art for exposures

    Describe the feature

    What other properties should exposures get?

    • [x] tags
    • [x] meta
      • we do want some traditional meta fields (e.g. owner) to be required or top-level
      • still could be nice as a catch-all for structured key-value properties users would want to define, beyond what'd be available via tags or description
    • [ ] new type options:
      • "reverse pipeline", e.g. a census sync
      • users supplying their own string types
    • [ ] new maturity options
      • higher than high? "mission-critical"
      • ...

    Should exposures be ref-able?

    • exposures that depend on other exposures: one exposure for each Mode query / Looker view, one exposure for the dashboard that depends on those queries / views
    • models that depend on exposures: modeled input to data science --> data science as exposure --> modeled output
      • what exactly would ref('my_exposure') return?

    Describe alternatives you've considered

    We're likely to keep these bare-bones for a little while. I'm still curious to hear what community members want!

    Who will this benefit?

    Users of the exposures resource type, which is new in v0.18.1

    exposures 
    opened by jtcohen6 34
  • [CT-346] [Bug] Snapshot never succeeds (but does all the work successfully)

    [CT-346] [Bug] Snapshot never succeeds (but does all the work successfully)

    Is there an existing issue for this?

    • [x] I have searched the existing issues

    Current Behavior

    When i run dbt snapshot my snapshot model is run, but the cli never moves on to the following task. As far as i can tell all the snapshot work is done successfully (the snapshot table is appended to etc).

    This happens both locally and on dbt cloud. Exact same behaviour.

    Expected Behavior

    I expect the cli to either successfully take me back to an empty prompt, or continue with the next command.

    Steps To Reproduce

    1. Setup a snapshot model like this:
    % snapshot snap__transaction_records_state %}
    
    {{
        config(
          description='Snapshot of transaction records state for accounting automation project. Started 2022-02-21.',
          target_database='transfer-galaxy',
          target_schema='dbt_snapshots',
          unique_key='tx_id',
          strategy='check',
          check_cols='all',
          partition_by={'field': 'tx_date_day', 'data_type': 'date'},
        )
    }}
    
    select
        tx_id,
        date(tx_time_cet) as tx_date_day,
        tx_state_enum,
        tx_state,
    from
        {{ ref('src_portal__transaction_records') }}
    
    {% endsnapshot %}
    
    1. Run "dbt snapshot'
    2. Get terminal output like:
    11:12:50  Running with dbt=1.0.3
    11:12:50  Found 205 models, 100 tests, 1 snapshot, 13 analyses, 392 macros, 0 operations, 5 seed files, 80 sources, 1 exposure, 0 metrics
    11:12:50  
    11:12:51  Concurrency: 16 threads (target='default')
    11:12:51  
    11:12:51  1 of 1 START snapshot dbt_snapshots.snap__transaction_records_state............. [RUN]
    
    

    Wait an eternity (until a timeout of some sort, in dbt cloud i think it's 13 hours), until the job is cancelled.

    Relevant log output

    Log output from dbt cloud for "dbt snapshot" here:
    
    2022-03-10 11:12:50.561223 (MainThread): 11:12:50  Running with dbt=1.0.3
    2022-03-10 11:12:50.561843 (MainThread): 11:12:50  running dbt with arguments Namespace(cls=<class 'dbt.task.snapshot.SnapshotTask'>, debug=None, defer=None, event_buffer_size=None, exclude=None, fail_fast=None, log_cache_events=False, log_format=None, partial_parse=None, printer_width=None, profile='user', profiles_dir='/tmp/jobs/47162168/.dbt', project_dir=None, record_timing_info=None, rpc_method='snapshot', select=None, selector_name=None, send_anonymous_usage_stats=None, single_threaded=False, state=None, static_parser=None, target='default', threads=None, use_colors=None, use_experimental_parser=None, vars='{}', version_check=None, warn_error=None, which='snapshot', write_json=None)
    2022-03-10 11:12:50.562060 (MainThread): 11:12:50  Tracking: tracking
    2022-03-10 11:12:50.571622 (MainThread): 11:12:50  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8616a90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8616850>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8616c70>]}
    2022-03-10 11:12:50.701971 (MainThread): 11:12:50  Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
    2022-03-10 11:12:50.702302 (MainThread): 11:12:50  Partial parsing enabled, no changes found, skipping parsing
    2022-03-10 11:12:50.723178 (MainThread): 11:12:50  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '35add917-e38d-4443-a411-d0171182a210', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f4991fe4e50>]}
    2022-03-10 11:12:50.951696 (MainThread): 11:12:50  Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '35add917-e38d-4443-a411-d0171182a210', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f49a8719e50>]}
    2022-03-10 11:12:50.952118 (MainThread): 11:12:50  Found 205 models, 100 tests, 1 snapshot, 13 analyses, 392 macros, 0 operations, 5 seed files, 80 sources, 1 exposure, 0 metrics
    2022-03-10 11:12:50.964288 (MainThread): 11:12:50  
    2022-03-10 11:12:50.964778 (MainThread): 11:12:50  Acquiring new bigquery connection "master"
    2022-03-10 11:12:50.965712 (ThreadPoolExecutor-0_0): 11:12:50  Acquiring new bigquery connection "list_transfer-galaxy"
    2022-03-10 11:12:50.966081 (ThreadPoolExecutor-0_0): 11:12:50  Opening a new connection, currently in state init
    2022-03-10 11:12:51.221515 (ThreadPoolExecutor-1_0): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_risk_scoring"
    2022-03-10 11:12:51.222081 (ThreadPoolExecutor-1_1): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_forecasting"
    2022-03-10 11:12:51.222541 (ThreadPoolExecutor-1_0): 11:12:51  Opening a new connection, currently in state closed
    2022-03-10 11:12:51.222941 (ThreadPoolExecutor-1_2): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_zendesk"
    2022-03-10 11:12:51.223551 (ThreadPoolExecutor-1_3): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_snapshots"
    2022-03-10 11:12:51.223754 (ThreadPoolExecutor-1_1): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.224255 (ThreadPoolExecutor-1_4): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_airtable"
    2022-03-10 11:12:51.224911 (ThreadPoolExecutor-1_5): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_behavioural"
    2022-03-10 11:12:51.225430 (ThreadPoolExecutor-1_6): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_seeds"
    2022-03-10 11:12:51.225943 (ThreadPoolExecutor-1_2): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.226308 (ThreadPoolExecutor-1_7): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_referral"
    2022-03-10 11:12:51.227004 (ThreadPoolExecutor-1_8): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_playground"
    2022-03-10 11:12:51.227548 (ThreadPoolExecutor-1_9): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging"
    2022-03-10 11:12:51.228091 (ThreadPoolExecutor-1_10): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_accounting"
    2022-03-10 11:12:51.228691 (ThreadPoolExecutor-1_11): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_accounting"
    2022-03-10 11:12:51.229038 (ThreadPoolExecutor-1_3): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.229424 (ThreadPoolExecutor-1_12): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_outgoing"
    2022-03-10 11:12:51.230062 (ThreadPoolExecutor-1_13): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_marts"
    2022-03-10 11:12:51.230479 (ThreadPoolExecutor-1_4): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.230918 (ThreadPoolExecutor-1_14): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_events"
    2022-03-10 11:12:51.231544 (ThreadPoolExecutor-1_15): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_intermediary"
    2022-03-10 11:12:51.232549 (ThreadPoolExecutor-1_5): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.232995 (ThreadPoolExecutor-1_6): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.233441 (ThreadPoolExecutor-1_7): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.233739 (ThreadPoolExecutor-1_8): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.234178 (ThreadPoolExecutor-1_9): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.235224 (ThreadPoolExecutor-1_10): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.235596 (ThreadPoolExecutor-1_11): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.236978 (ThreadPoolExecutor-1_12): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.237314 (ThreadPoolExecutor-1_13): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.238763 (ThreadPoolExecutor-1_14): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.240000 (ThreadPoolExecutor-1_15): 11:12:51  Opening a new connection, currently in state init
    2022-03-10 11:12:51.509916 (ThreadPoolExecutor-1_11): 11:12:51  Acquiring new bigquery connection "list_transfer-galaxy_dbt_staging_soluno"
    2022-03-10 11:12:51.510480 (ThreadPoolExecutor-1_11): 11:12:51  Opening a new connection, currently in state closed
    2022-03-10 11:12:51.664519 (MainThread): 11:12:51  Concurrency: 16 threads (target='default')
    2022-03-10 11:12:51.664816 (MainThread): 11:12:51  
    2022-03-10 11:12:51.667539 (Thread-18): 11:12:51  Began running node snapshot.transfergalaxy.snap__transaction_records_state
    2022-03-10 11:12:51.667922 (Thread-18): 11:12:51  1 of 1 START snapshot dbt_snapshots.snap__transaction_records_state............. [RUN]
    2022-03-10 11:12:51.668325 (Thread-18): 11:12:51  Acquiring new bigquery connection "snapshot.transfergalaxy.snap__transaction_records_state"
    2022-03-10 11:12:51.668513 (Thread-18): 11:12:51  Began compiling node snapshot.transfergalaxy.snap__transaction_records_state
    2022-03-10 11:12:51.668694 (Thread-18): 11:12:51  Compiling snapshot.transfergalaxy.snap__transaction_records_state
    2022-03-10 11:12:51.673363 (Thread-18): 11:12:51  finished collecting timing info
    2022-03-10 11:12:51.673586 (Thread-18): 11:12:51  Began executing node snapshot.transfergalaxy.snap__transaction_records_state
    2022-03-10 11:12:51.692717 (Thread-18): 11:12:51  Opening a new connection, currently in state closed
    2022-03-10 11:12:51.923759 (Thread-18): 11:12:51  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
    
        select CURRENT_TIMESTAMP() as snapshot_start
      
    2022-03-10 11:12:52.774374 (Thread-18): 11:12:52  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
    select * from (
                
    
    
    
    select
        tx_id,
        date(tx_time_cet) as tx_date_day,
        tx_state_enum,
        tx_state,
    from
        `transfer-galaxy`.`dbt_staging`.`src_portal__transaction_records`
    
            ) as __dbt_sbq
            where false
            limit 0
        
    2022-03-10 11:12:53.703297 (Thread-18): 11:12:53  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
    select * from (
                select * from `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state`
            ) as __dbt_sbq
            where false
            limit 0
        
    2022-03-10 11:12:55.098683 (Thread-18): 11:12:55  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
    
            
    
      create or replace table `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state__dbt_tmp`
      partition by tx_date_day
      
      OPTIONS(
          expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
        )
      as (
        with snapshot_query as (
    
            
    
    
    
    select
        tx_id,
        date(tx_time_cet) as tx_date_day,
        tx_state_enum,
        tx_state,
    from
        `transfer-galaxy`.`dbt_staging`.`src_portal__transaction_records`
    
    
        ),
    
        snapshotted_data as (
    
            select *,
                tx_id as dbt_unique_key
    
            from `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state`
            where dbt_valid_to is null
    
        ),
    
        insertions_source_data as (
    
            select
                *,
                tx_id as dbt_unique_key,
                TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_updated_at,
                TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_valid_from,
                nullif(TIMESTAMP("2022-03-10 11:12:52.097775+00:00"), TIMESTAMP("2022-03-10 11:12:52.097775+00:00")) as dbt_valid_to,
                to_hex(md5(concat(coalesce(cast(tx_id as string), ''), '|',coalesce(cast(TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as string), '')))) as dbt_scd_id
    
            from snapshot_query
        ),
    
        updates_source_data as (
    
            select
                *,
                tx_id as dbt_unique_key,
                TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_updated_at,
                TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_valid_from,
                TIMESTAMP("2022-03-10 11:12:52.097775+00:00") as dbt_valid_to
    
            from snapshot_query
        ),
    
        insertions as (
    
            select
                'insert' as dbt_change_type,
                source_data.*
    
            from insertions_source_data as source_data
            left outer join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
            where snapshotted_data.dbt_unique_key is null
               or (
                    snapshotted_data.dbt_unique_key is not null
                and (
                    (snapshotted_data.tx_id != source_data.tx_id
            or
            (
                ((snapshotted_data.tx_id is null) and not (source_data.tx_id is null))
                or
                ((not snapshotted_data.tx_id is null) and (source_data.tx_id is null))
            ) or snapshotted_data.tx_date_day != source_data.tx_date_day
            or
            (
                ((snapshotted_data.tx_date_day is null) and not (source_data.tx_date_day is null))
                or
                ((not snapshotted_data.tx_date_day is null) and (source_data.tx_date_day is null))
            ) or snapshotted_data.tx_state_enum != source_data.tx_state_enum
            or
            (
                ((snapshotted_data.tx_state_enum is null) and not (source_data.tx_state_enum is null))
                or
                ((not snapshotted_data.tx_state_enum is null) and (source_data.tx_state_enum is null))
            ) or snapshotted_data.tx_state != source_data.tx_state
            or
            (
                ((snapshotted_data.tx_state is null) and not (source_data.tx_state is null))
                or
                ((not snapshotted_data.tx_state is null) and (source_data.tx_state is null))
            ))
                )
            )
    
        ),
    
        updates as (
    
            select
                'update' as dbt_change_type,
                source_data.*,
                snapshotted_data.dbt_scd_id
    
            from updates_source_data as source_data
            join snapshotted_data on snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
            where (
                (snapshotted_data.tx_id != source_data.tx_id
            or
            (
                ((snapshotted_data.tx_id is null) and not (source_data.tx_id is null))
                or
                ((not snapshotted_data.tx_id is null) and (source_data.tx_id is null))
            ) or snapshotted_data.tx_date_day != source_data.tx_date_day
            or
            (
                ((snapshotted_data.tx_date_day is null) and not (source_data.tx_date_day is null))
                or
                ((not snapshotted_data.tx_date_day is null) and (source_data.tx_date_day is null))
            ) or snapshotted_data.tx_state_enum != source_data.tx_state_enum
            or
            (
                ((snapshotted_data.tx_state_enum is null) and not (source_data.tx_state_enum is null))
                or
                ((not snapshotted_data.tx_state_enum is null) and (source_data.tx_state_enum is null))
            ) or snapshotted_data.tx_state != source_data.tx_state
            or
            (
                ((snapshotted_data.tx_state is null) and not (source_data.tx_state is null))
                or
                ((not snapshotted_data.tx_state is null) and (source_data.tx_state is null))
            ))
            )
        )
    
        select * from insertions
        union all
        select * from updates
    
      );
        
    2022-03-10 11:13:00.608568 (Thread-18): 11:13:00  BigQuery adapter: Adding columns ([]) to table `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state`".
    2022-03-10 11:13:01.249736 (Thread-18): 11:13:01  Writing runtime SQL for node "snapshot.transfergalaxy.snap__transaction_records_state"
    2022-03-10 11:13:01.250481 (Thread-18): 11:13:01  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
    
          merge into `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state` as DBT_INTERNAL_DEST
        using `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state__dbt_tmp` as DBT_INTERNAL_SOURCE
        on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id
    
        when matched
         and DBT_INTERNAL_DEST.dbt_valid_to is null
         and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
            then update
            set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
    
        when not matched
         and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
            then insert (`tx_id`, `tx_date_day`, `tx_state_enum`, `tx_state`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`)
            values (`tx_id`, `tx_date_day`, `tx_state_enum`, `tx_state`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`)
    
    
      
    2022-03-10 11:13:06.113359 (Thread-18): 11:13:06  On snapshot.transfergalaxy.snap__transaction_records_state: /* {"app": "dbt", "dbt_version": "1.0.3", "profile_name": "user", "target_name": "default", "node_id": "snapshot.transfergalaxy.snap__transaction_records_state"} */
    drop table if exists `transfer-galaxy`.`dbt_snapshots`.`snap__transaction_records_state__dbt_tmp`
    2022-03-10 11:13:06.726671 (Thread-18): 11:13:06  finished collecting timing info
    2022-03-10 11:13:06.727242 (Thread-18): 11:13:06  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '35add917-e38d-4443-a411-d0171182a210', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f499065b280>]}
    2022-03-10 14:11:42.039645 (MainThread): 14:11:42  The bigquery adapter does not support query cancellation. Some queries may still be running!
    2022-03-10 14:11:42.040546 (MainThread): 14:11:42  
    2022-03-10 14:11:42.040856 (MainThread): 14:11:42  Exited because of keyboard interrupt.
    2022-03-10 14:11:42.041085 (MainThread): 14:11:42  
    2022-03-10 14:11:42.041310 (MainThread): 14:11:42  Done. PASS=0 WARN=0 ERROR=0 SKIP=0 TOTAL=0
    2022-03-10 14:11:42.041545 (MainThread): 14:11:42  Connection 'master' was properly closed.
    2022-03-10 14:11:42.041723 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_risk_scoring' was properly closed.
    2022-03-10 14:11:42.041904 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_forecasting' was properly closed.
    2022-03-10 14:11:42.042059 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging_zendesk' was properly closed.
    2022-03-10 14:11:42.042208 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_snapshots' was properly closed.
    2022-03-10 14:11:42.042409 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging_airtable' was properly closed.
    2022-03-10 14:11:42.042577 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_behavioural' was properly closed.
    2022-03-10 14:11:42.042729 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_seeds' was properly closed.
    2022-03-10 14:11:42.042878 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_referral' was properly closed.
    2022-03-10 14:11:42.043024 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_playground' was properly closed.
    2022-03-10 14:11:42.043170 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging' was properly closed.
    2022-03-10 14:11:42.043315 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_staging_accounting' was properly closed.
    2022-03-10 14:11:42.043459 (MainThread): 14:11:42  Connection 'snapshot.transfergalaxy.snap__transaction_records_state' was properly closed.
    2022-03-10 14:11:42.043603 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_outgoing' was properly closed.
    2022-03-10 14:11:42.043746 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_marts' was properly closed.
    2022-03-10 14:11:42.043890 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_events' was properly closed.
    2022-03-10 14:11:42.044032 (MainThread): 14:11:42  Connection 'list_transfer-galaxy_dbt_intermediary' was properly closed.
    2022-03-10 14:11:42.044218 (MainThread): 14:11:42  Flushing usage events
    2022-03-10 14:11:42.065240 (MainThread):
    

    Environment

    - OS: macOS Big Sur version 11.1
    - Python: 3.7.9
    - dbt: 1.0
    

    What database are you using dbt with?

    bigquery

    Additional Context

    Snapshot model code:

    `{% snapshot snap__transaction_records_state %}

    {{ config( description='Snapshot of transaction records state for accounting automation project. Started 2022-02-21.', target_database='transfer-galaxy', target_schema='dbt_snapshots', unique_key='tx_id', strategy='check', check_cols='all', partition_by={'field': 'tx_date_day', 'data_type': 'date'}, ) }}

    select tx_id, date(tx_time_cet) as tx_date_day, tx_state_enum, tx_state, from {{ ref('src_portal__transaction_records') }}

    {% endsnapshot %} `

    bug Team:Execution 
    opened by MaxKrog 32
  • Include hard-deletes when making snapshot

    Include hard-deletes when making snapshot

    Continues the work of the closed PR #2355, which resolves #249.

    Description

    The idea is to select ids from the snapshotted table which no longer exist in the source table, and update these to set dbt_valid_to to the current timestamp. Marking them as hard deleted.

    As of now, I didn't write any integration tests yet as I'm having difficulties running them locally. Any pointers on how these work would be nice. Running integration test (postgres) locally currently fails, but I'm unsure why.

    scheduling tests via LoadScheduling
    
    test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleSnapshotFiles::test__postgres_ref_snapshot 
    test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleSnapshotFiles::test__postgres__simple_snapshot 
    test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestCustomSnapshotFiles::test__postgres__simple_custom_snapshot 
    test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleColumnSnapshotFiles::test_postgres_renamed_source 
    [gw1] PASSED test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleSnapshotFiles::test__postgres_ref_snapshot 
    test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestNamespacedCustomSnapshotFiles::test__postgres__simple_custom_snapshot_namespaced 
    [gw2] FAILED test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleColumnSnapshotFiles::test_postgres_renamed_source 
    
    ============================================================================================ FAILURES =============================================================================================
    ___________________________________________________________________ TestSimpleColumnSnapshotFiles.test_postgres_renamed_source ____________________________________________________________________
    [gw2] linux -- Python 3.6.9 /usr/app/.tox/integration-postgres-py36/bin/python
    
    self = <test_simple_snapshot.TestSimpleColumnSnapshotFiles testMethod=test_postgres_renamed_source>
    
        @use_profile('postgres')
        def test_postgres_renamed_source(self):
    >       self._run_snapshot_test()
    
    test/integration/004_simple_snapshot_test/test_simple_snapshot.py:158: 
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    test/integration/004_simple_snapshot_test/test_simple_snapshot.py:135: in _run_snapshot_test
        self.run_dbt(['snapshot', '--vars', '{seed_name: seed_newcol}'])
    test/integration/base.py:578: in run_dbt
        "dbt exit state did not match expected")
    E   AssertionError: False != True : dbt exit state did not match expected
    -------------------------------------------------------------------------------------- Captured logbook call --------------------------------------------------------------------------------------
    [DEBUG] dbt: Acquiring new postgres connection "__test".
    [DEBUG] dbt: Acquiring new postgres connection "__test".
    [DEBUG] dbt: test connection "__test" executing: DROP SCHEMA IF EXISTS "test15998113637441561482_simple_snapshot_004" CASCADE
    [DEBUG] dbt: Opening a new connection, currently in state init
    [DEBUG] dbt: On __test: Close
    [DEBUG] dbt: Acquiring new postgres connection "__test".
    [DEBUG] dbt: Acquiring new postgres connection "__test".
    [DEBUG] dbt: test connection "__test" executing: CREATE SCHEMA "test15998113637441561482_simple_snapshot_004"
    [DEBUG] dbt: Opening a new connection, currently in state closed
    [DEBUG] dbt: On __test: Close
    [INFO] dbt: Invoking dbt with ['--strict', '--test-new-parser', 'seed', '--profiles-dir', '/tmp/dbt-int-test-1dn56bze', '--log-cache-events']
    [INFO] dbt: Invoking dbt with ['--strict', '--test-new-parser', 'snapshot', '--profiles-dir', '/tmp/dbt-int-test-1dn56bze', '--log-cache-events']
    [DEBUG] dbt: Acquiring new postgres connection "__test".
    [DEBUG] dbt: test connection "__test" executing: select * from dbt.test15998113637441561482_simple_snapshot_004.my_snapshot
    [DEBUG] dbt: Opening a new connection, currently in state init
    [DEBUG] dbt: On __test: Close
    [INFO] dbt: Invoking dbt with ['--strict', '--test-new-parser', 'snapshot', '--vars', '{seed_name: seed_newcol}', '--profiles-dir', '/tmp/dbt-int-test-1dn56bze', '--log-cache-events']
    [DEBUG] dbt: Acquiring new postgres connection "__test".
    [DEBUG] dbt: Acquiring new postgres connection "__test".
    [DEBUG] dbt: test connection "__test" executing: DROP SCHEMA IF EXISTS "test15998113637441561482_simple_snapshot_004" CASCADE
    [DEBUG] dbt: Opening a new connection, currently in state closed
    [DEBUG] dbt: On __test: Close
    [DEBUG] dbt: Connection '__test' was properly closed.
    ===================================================================================== slowest test durations ======================================================================================
    9.03s call     test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleColumnSnapshotFiles::test_postgres_renamed_source
    6.44s call     test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleSnapshotFiles::test__postgres_ref_snapshot
    
    (0.00 durations hidden.  Use -vv to show these durations.)
    ===================================================================================== short test summary info =====================================================================================
    FAILED test/integration/004_simple_snapshot_test/test_simple_snapshot.py::TestSimpleColumnSnapshotFiles::test_postgres_renamed_source - AssertionError: False != True : dbt exit state did not m...
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! xdist.dsession.Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    ============================================================================= 1 failed, 1 passed in 60.78s (0:01:00) ==============================================================================
    ERROR: InvocationError for command /bin/bash -c '/usr/app/.tox/integration-postgres-py36/bin/python -m pytest --durations 0 -v -m profile_postgres -s -x -m profile_postgres test/integration/004_simple_snapshot_test/test_simple_snapshot.py -n4 test/integration/*' (exited with code 2)
    _____________________________________________________________________________________________ summary _____________________________________________________________________________________________
    ERROR:   integration-postgres-py36: commands failed
    

    Checklist

    • [x] I have signed the CLA
    • [x] I have run this code in development and it appears to resolve the stated issue
    • [x] This PR includes tests, or tests are not required/relevant for this PR
    • [x] I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.
    cla:yes 
    opened by joelluijmes 27
  • Propagate docs to the database

    Propagate docs to the database

    Feature

    We should propagate docs to BQ table and column descriptions.

    Feature description

    Cool that dbt generates a web page and docs for your project. Unfortunately, it's a bit removed from where our org is used to looking for documentation. We use BQ extensively and rely on the table and column descriptions provided in the BQ UI.

    Who will this benefit?

    Anyone using BQ would benefit.

    enhancement 
    opened by jakebiesinger 26
  • Pass schema in credentials to Postgresql

    Pass schema in credentials to Postgresql

    There is a property called schema in Postgresql credentials but it looks like DBT doesn't actually use it probably because Psycopg2 doesn't provide a native way to pass schema to Postgresql.

    It's a bit tricky but there's actually a way to do that via options of libpq. I believe that we should support this feature since we have the schema parameter in Postgresql configuration.

    opened by buremba 25
  • Wider google-cloud dependencies

    Wider google-cloud dependencies

    resolves #2794

    Description

    As discussed in the issue. I can narrow the range of these as per that issue; figured I'd ensure tests pass before fixing the versions.

    Checklist

    • [ ] I have signed the CLA
    • [ ] I have run this code in development and it appears to resolve the stated issue
    • [x] This PR includes tests, or tests are not required/relevant for this PR
    • [ ] I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.
    cla:yes 
    opened by max-sixty 24
  • Indefinite Snowflake auth loop  when using browser authentication

    Indefinite Snowflake auth loop when using browser authentication

    Describe the bug

    We encounter indefinite auth loop while running dbt with browser auth enabled

    Steps To Reproduce

    Install dbt 0.17.2 and run anything on Snowflake

    Expected behavior

    Expected two auth requests in case of MFA and dbt proceeding to transform. (non-MFA logins fail into the loop too)

    Screenshots and log output

    (dbt-transformations) <wiped out>@<wiped out> dbt-transformations % dbt run --models <wiped out>
    Running with dbt=0.17.2
    Found 129 models, 135 tests, 3 snapshots, 0 analyses, 140 macros, 0 operations, 1 seed file, 53 sources
    
    Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
    Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
    Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
    Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
    Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
    ^C^C^Cctrl-c
    
    

    System information

    Which database are you using dbt with?

    • [ ] postgres
    • [ ] redshift
    • [ ] bigquery
    • [x] snowflake
    • [ ] other (specify: ____________)

    The output of dbt --version:

    installed version: 0.17.2
       latest version: 0.17.2
    
    Up to date!
    
    Plugins:
      - bigquery: 0.17.2
      - snowflake: 0.17.2
      - redshift: 0.17.2
      - postgres: 0.17.2
    

    The operating system you're using: MacOS 10.15.5 (19F101)

    The output of python --version: Python 3.7.7

    Additional context

    reverting dbt to 0.17.1 fixed the issue

    bug 
    opened by eugene-nikolaev 24
  • [CT-1758] [Bug] dbt  process is never reaped,occur a large number of zombie processes.

    [CT-1758] [Bug] dbt process is never reaped,occur a large number of zombie processes.

    Is this a new bug in dbt-core?

    • [X] I believe this is a new bug in dbt-core
    • [X] I have searched the existing issues, and I could not find an existing issue for this bug

    Current Behavior

    Every time I execute the dbt project, a python zombie process will be added.

    image

    I also found that other people used the dbt project to have this problem(https://github.com/python/cpython/issues/88887). It may be a python problem, but I still want to confirm that dbt is also caused by this? Is there any good solution?

    and i try to add threads=1, but It still appears zombie process.

    Expected Behavior

    occur a large number of zombie processes.

    Steps To Reproduce

    run dbt project by golang cmd.Command(). image

    Relevant log output

    No response

    Environment

    - OS: CentOS Linux release 7.9.2009
    - Python: 3.9.13
    - dbt:1.3.0
    

    Which database adapter are you using with dbt?

    postgres

    Additional Context

    No response

    bug triage 
    opened by abeizn 0
  • [DO NOT MERGE] test cl/merge-main-click-cli branch

    [DO NOT MERGE] test cl/merge-main-click-cli branch

    Opening this just for purposes of sanity testing the merge main to click branch.

    Confirming that unit and integration tests (including new ones we're bringing in from main) would pass without major/unpredicted modifications to the feature/click-cli branch using the old CLI. Explanations of any changes necessary to get tests passing in the comments.

    Note: mypy/flake8 errors are a result of some quick and dirty fixes here, nothing concerning.

    cla:yes 
    opened by MichelleArk 1
  • [CT-1753] [Bug] Allow using agate 1.7.1 due to serious vulnerability in future

    [CT-1753] [Bug] Allow using agate 1.7.1 due to serious vulnerability in future

    Is this a new bug in dbt-core?

    • [X] I believe this is a new bug in dbt-core
    • [X] I have searched the existing issues, and I could not find an existing issue for this bug

    Current Behavior

    It has been recently found that the future package has a serious vulnerability and can lead to a denial of service (see details here https://nvd.nist.gov/vuln/detail/CVE-2022-40899). The current version of dbt-core is indirectly depended on it. The current version of dbt-core specifies the dependency on the agate package to be constrained: "agate>=1.6,<1.7.1" but agate <1.7.1 depended on the version parsedatetime that in its turn depended on future. The future repository has been dead since 2019 (latest release) and parsedatetime dropped the dependency on it: https://github.com/bear/parsedatetime/releases/tag/v2.5. agate reacted to that and also updated its version to 1.7.1 to use the latest version of parsedatetime without the future dependency: https://github.com/wireservice/agate/commit/52198daae198389649a44deb0ec2d1d41f6720c1.

    Please, update your dependency on agate to allow version 1.7.1 and by doing so get rid of the dependency on future.

    Expected Behavior

    The dependency on agate is updated to allow version 1.7.1.

    Steps To Reproduce

    .

    Relevant log output

    No response

    Environment

    - OS:
    - Python:
    - dbt:
    

    Which database adapter are you using with dbt?

    No response

    Additional Context

    No response

    bug dependencies security support_rotation 
    opened by piankris 2
  • [CT-1752] [Bug] Small typos related to the interactive profile set up

    [CT-1752] [Bug] Small typos related to the interactive profile set up

    Is this a new bug in dbt-core?

    • [X] I believe this is a new bug in dbt-core
    • [X] I have searched the existing issues, and I could not find an existing issue for this bug

    Current Behavior

    I believe interative is a typo of interactive

    https://github.com/dbt-labs/dbt-core/blob/54538409509e0d677876f1102aa3bc0c67007278/core/dbt/cli/params.py#L272-L274 https://github.com/dbt-labs/dbt-core/blob/60f80056b1b09b35a22ac91c93319827bdac0e0e/core/dbt/main.py#L348-L356 https://github.com/dbt-labs/dbt-core/blob/b3440417ad1e38a40c488c3632a63f5d2e0c88f4/core/dbt/docs/build/html/index.html#L321-L325

    Expected Behavior

    Fix typos:

    • interative to interactive

    Steps To Reproduce

    1. Run dbt init --h

    Relevant log output

    N/A
    

    Environment

    - OS:Ubuntu 22.04.1
    - Python: 3.10.6
    - dbt: 1.1.3
    

    Which database adapter are you using with dbt?

    No response

    Additional Context

    N/A

    bug good_first_issue 
    opened by nshuman1 0
  • [CT-1751] Config to optionally skip population of relation cache

    [CT-1751] Config to optionally skip population of relation cache

    Users should have the ability to turn off relation cache population, if they really need to. It should still be "on" by default.

    $ dbt --no-populate-cache [run|test|...]
    $ DBT_POPULATE_CACHE=0|1 dbt [run|test|...]
    
    # profiles.yml (UserConfig)
    config:
      populate_cache: true|false
    

    This is different from entirely disabling or skipping over the cache — we're just skipping the population of the cache on startup. When dbt needs to run caching queries, I think it should still report "cache miss," and then cache the result of the metadata query, if it needs to be used again in the same invocation.

    Where

    https://github.com/dbt-labs/dbt-core/blob/54538409509e0d677876f1102aa3bc0c67007278/core/dbt/task/runnable.py#L407-L421

    Who is it for?

    • Users in large projects, or of data warehouses that are quite slow at running metadata queries.
    • Users running into inexplicable issues with specific relations not showing up in the relation cache (https://github.com/dbt-labs/dbt-core/issues/6050)
    • Interactive compile & preview (#6358, #6359), which need to be blazing-fast

    YMMV

    End users will need to experiment with the approach that's most efficient for them, between:

    • full cache enabled
    • "cache selected only" (docs)
    • --no-populate-cache

    I expect mileage may vary between dev, CI, and prod environments.

    Questions

    Would this break behavior around --defer, which expects to use the relation cache to determine if model X already exists in the dev schema, or should have its reference rewritten to use the schema defined in the other manifest?

    https://github.com/dbt-labs/dbt-core/blob/54538409509e0d677876f1102aa3bc0c67007278/core/dbt/contracts/graph/manifest.py#L1018

    Imagining a future where interactive compile/preview want to be both very fast, and able to correctly leverage --defer: We should also think more about making the adapter cache pluggable, as something that can live & persist outside of a single dbt-core invocation. It would be the responsibility of that other application wrapping dbt-core to handle cache invalidation (fun!).

    enhancement Team:Execution Team:Adapters adapter_caching 
    opened by jtcohen6 0
  • Update agate requirement from <1.7.1,>=1.6 to >=1.6,<1.7.2 in /core

    Update agate requirement from <1.7.1,>=1.6 to >=1.6,<1.7.2 in /core

    Updates the requirements on agate to permit the latest version.

    Changelog

    Sourced from agate's changelog.

    1.7.1 - Jan 4, 2023

    • Allow parsedatetime 2.6.

    1.7.0 - Jan 3, 2023

    • Add Python 3.11 support.
    • Add Python 3.10 support.
    • Drop Python 3.6 support (end-of-life was December 23, 2021).
    • Drop Python 2.7 support (end-of-life was January 1, 2020).

    1.6.3 - July 15, 2021

    • feat: :meth:.Table.from_csv accepts a row_limit keyword argument. (#740)
    • feat: :meth:.Table.from_json accepts an encoding keyword argument. (#734)
    • feat: :meth:.Table.print_html accepts a max_precision keyword argument, like :meth:.Table.print_table. (#753)
    • feat: :class:.TypeTester accepts a null_values keyword argument, like individual data types. (#745)
    • feat: :class:.Min, :class:.Max and :class:.Sum (#735) work with :class:.TimeDelta.
    • feat: :class:.FieldSizeLimitError includes the line number in the error message. (#681)
    • feat: :class:.csv.Sniffer warns on error while sniffing CSV dialect.
    • fix: :meth:.Table.normalize works with basic processing methods. (#691)
    • fix: :meth:.Table.homogenize works with basic processing methods. (#756)
    • fix: :meth:.Table.homogenize casts compare_values and default_row. (#700)
    • fix: :meth:.Table.homogenize accepts tuples. (#710)
    • fix: :meth:.TableSet.group_by accepts input with no rows. (#703)
    • fix: :class:.TypeTester warns if a column specified by the force argument is not in the table, instead of raising an error. (#747)
    • fix: Aggregations return None if all values are None, instead of raising an error. Note that Sum, MaxLength and MaxPrecision continue to return 0 if all values are None. (#706)
    • fix: Ensure files are closed when errors occur. (#734)
    • build: Make PyICU an optional dependency.
    • Drop Python 3.5 support (end-of-life was September 13, 2020).
    • Drop Python 3.4 support (end-of-life was March 18, 2019).

    1.6.2 - March 10, 2021

    • feat: :meth:.Date.__init__ and :meth:.DateTime.__init__ accepts a locale keyword argument (e.g. :code:en_US) for parsing formatted dates. (#730)
    • feat: :meth:.Number.cast casts True to 1 and False to 0. (#733)
    • fix: :meth:.utils.max_precision ignores infinity when calculating precision. (#726)
    • fix: :meth:.Date.cast catches OverflowError when type testing. (#720)
    • Included examples in Python package. (#716)

    1.6.1 - March 11, 2018

    • feat: :meth:.Table.to_json can use Decimal as keys. (#696)
    • fix: :meth:.Date.cast and :meth:.DateTime.cast no longer parse non-date strings that contain date sub-strings as dates. (#705)
    • docs: Link to tutorial now uses version through Sphinx to avoid bad links on future releases. (#682)

    ... (truncated)

    Commits
    • 52198da build: Allow parsedatetime 2.6
    • cd63792 docs: Update changelog
    • 32cce43 docs: Update badges
    • a1e2aaf docs: Update changelog
    • 4d5267e ci: No duplicate builds
    • ec184b7 Merge pull request #772 from wireservice/python311
    • a82b43c chore: Fix misconversion from c83d726d535dcec08b0c5b1301b8671068c25b12
    • 7945f1a chore: Simplify super() calls and elif/else after returns
    • 9abb426 chore: Remove use of u''
    • c83d726 chore: Remove Python 2 code
    • Additional commits viewable in compare view

    You can trigger a rebase of this PR by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies cla:yes python 
    opened by dependabot[bot] 3
Releases(v1.2.4)
Owner
dbt Labs
dbt helps data teams work like software engineers—to ship trusted data, faster.
dbt Labs
MoRecon - A tool for reconstructing missing frames in motion capture data.

MoRecon - A tool for reconstructing missing frames in motion capture data.

Yuki Nishidate 38 Dec 03, 2022
PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster (by @firmai)

PandaPy "I came across PandaPy last week and have already used it in my current project. It is a fascinating Python library with a lot of potential to

Derek Snow 527 Jan 02, 2023
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]

MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020] by Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wa

112 Dec 28, 2022
PySpark Structured Streaming ROS Kafka ApacheSpark Cassandra

PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra The purpose of this project is to demonstrate a structured streaming pipeline with Apache

Zekeriyya Demirci 5 Nov 13, 2022
Business Intelligence (BI) in Python, OLAP

Open Mining Business Intelligence (BI) Application Server written in Python Requirements Python 2.7 (Backend) Lua 5.2 or LuaJIT 5.1 (OML backend) Mong

Open Mining 1.2k Dec 27, 2022
Hue Editor: Open source SQL Query Assistant for Databases/Warehouses

Hue Editor: Open source SQL Query Assistant for Databases/Warehouses

Cloudera 759 Jan 07, 2023
Data analysis and visualisation projects from a range of individual projects and applications

Python-Data-Analysis-and-Visualisation-Projects Data analysis and visualisation projects from a range of individual projects and applications. Python

Tom Ritman-Meer 1 Jan 25, 2022
Projeto para realizar o RPA Challenge . Utilizando Python e as bibliotecas Selenium e Pandas.

RPA Challenge in Python Projeto para realizar o RPA Challenge (www.rpachallenge.com), utilizando Python. O objetivo deste desafio é criar um fluxo de

Henrique A. Lourenço 1 Apr 12, 2022
A set of tools to analyse the output from TraDIS analyses

QuaTradis (Quadram TraDis) A set of tools to analyse the output from TraDIS analyses Contents Introduction Installation Required dependencies Bioconda

Quadram Institute Bioscience 2 Feb 16, 2022
Python script for transferring data between three drives in two separate stages

Waterlock Waterlock is a Python script meant for incrementally transferring data between three folder locations in two separate stages. It performs ha

David Swanlund 13 Nov 10, 2021
Ejercicios Panda usando Pandas

Readme Below we add configuration details to locally test your application To co

1 Jan 22, 2022
DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis.

DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis. The main goal of the package is to accelerate the process of computing estimates of forward reachable sets for nonlinear dy

2 Nov 08, 2021
simple way to build the declarative and destributed data pipelines with python

unipipeline simple way to build the declarative and distributed data pipelines. Why you should use it Declarative strict config Scaffolding Fully type

aliaksandr-master 0 Jan 26, 2022
Pipetools enables function composition similar to using Unix pipes.

Pipetools Complete documentation pipetools enables function composition similar to using Unix pipes. It allows forward-composition and piping of arbit

186 Dec 29, 2022
:truck: Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

To launch a live notebook server to test optimus using binder or Colab, click on one of the following badges: Optimus is the missing framework to prof

Iron 1.3k Dec 30, 2022
Pandas and Spark DataFrame comparison for humans

DataComPy DataComPy is a package to compare two Pandas DataFrames. Originally started to be something of a replacement for SAS's PROC COMPARE for Pand

Capital One 259 Dec 24, 2022
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code

Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather

Tuplex 791 Jan 04, 2023
Functional tensors for probabilistic programming

Funsor Funsor is a tensor-like library for functions and distributions. See Functional tensors for probabilistic programming for a system description.

208 Dec 29, 2022
Exploratory Data Analysis for Employee Retention Dataset

Exploratory Data Analysis for Employee Retention Dataset Employee turn-over is a very costly problem for companies. The cost of replacing an employee

kana sudheer reddy 2 Oct 01, 2021
Vectorizers for a range of different data types

Vectorizers for a range of different data types

Tutte Institute for Mathematics and Computing 69 Dec 29, 2022