Dagster & dbt (Component)
The dagster-dbt library provides a DbtProjectComponent, which you can use to represent dbt models as assets in Dagster. Dagster assets understand dbt at the level of individual dbt models. This means that you can:
- Use Dagster's UI or APIs to run subsets of your dbt models, seeds, and snapshots.
- Track failures, logs, and run history for individual dbt models, seeds, and snapshots.
- Define dependencies between individual dbt models and other data assets. For example, put dbt models after the Fivetran-ingested table that they read from, or put a machine learning model after the dbt models that it's trained on.
DbtProjectComponent is a state-backed component, which compiles and caches your dbt project's manifest. For information on managing component state, see Configuring state-backed components.
Dagster supports dbt Fusion as of the 1.11.5 release, and automatically detects which engine you have installed. To migrate from dbt Core, uninstall dbt-core and install dbt Fusion. For more information, see the dbt docs.
This feature is still in preview pending dbt Fusion GA.
1. Prepare a Dagster project
To begin, you'll need a Dagster project. You can use an existing components-ready project or create a new one:
create-dagster project my-project && cd my-project
Activate the project virtual environment:
source .venv/bin/activate
Then, add the dagster-dbt library to the project, along with the DuckDB adapter for dbt:
- uv: uv add dagster-dbt dbt-duckdb
- pip: pip install dagster-dbt dbt-duckdb
2. Set up a dbt project
For this tutorial, we'll use the jaffle shop dbt project as an example. Clone it into your project:
git clone --depth=1 https://github.com/dbt-labs/jaffle_shop.git dbt && rm -rf dbt/.git
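As a quick sanity check after cloning, the sketch below confirms that the target directory looks like a standalone dbt project: it contains a dbt_project.yml at its root, and the nested .git directory has been removed. The check_dbt_project function is a hypothetical helper written for this tutorial, not part of dbt or Dagster.

```shell
# Hypothetical helper (not part of dbt or Dagster): sanity-check a cloned dbt project.
check_dbt_project() {
  dir="$1"
  # A dbt project is identified by a dbt_project.yml file at its root.
  if [ ! -f "$dir/dbt_project.yml" ]; then
    echo "missing $dir/dbt_project.yml" >&2
    return 1
  fi
  # The clone command above removes the nested .git directory, so it should be gone.
  if [ -e "$dir/.git" ]; then
    echo "$dir/.git still exists" >&2
    return 1
  fi
  echo "ok: $dir looks like a standalone dbt project"
}
```

From the Dagster project root, you would run it as check_dbt_project dbt. A non-zero exit status means one of the two checks failed.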
We will create a profiles.yml file in the dbt directory to configure the project to use DuckDB:
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: tutorial.duckdb
      threads: 24
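If you prefer to create the file from the command line, one way is a shell heredoc. This is a minimal sketch that assumes your current directory is the Dagster project root and that the dbt project lives in the dbt directory, as in the clone step. YAML is indentation-sensitive, so the nesting under outputs and dev must be preserved.

```shell
# Assumes the current directory is the Dagster project root.
mkdir -p dbt  # no-op if the jaffle shop clone already created the directory

# The quoted 'EOF' delimiter stops the shell from expanding anything in the body.
cat > dbt/profiles.yml <<'EOF'
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: duckdb
      path: tutorial.duckdb
      threads: 24
EOF
```

The top-level key (jaffle_shop) is the profile name, and target: dev selects the dev entry under outputs.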
3. Scaffold a dbt component definition
Now that you have a Dagster project with a dbt project, you can scaffold a dbt component definition. You'll need to provide the path to your dbt project:
dg scaffold defs dagster_dbt.DbtProjectComponent dbt_ingest \
--project-path "dbt"
Creating defs at /.../my-project/src/my_project/defs/dbt_ingest.
The dg scaffold defs call will generate a defs.yaml file in your project structure:
tree src/my_project
src/my_project
├── __init__.py
├── definitions.py
└── defs
    ├── __init__.py
    └── dbt_ingest
        └── defs.yaml

3 directories, 4 files
In its scaffolded form, the defs.yaml file contains the configuration for your dbt project:
type: dagster_dbt.DbtProjectComponent
attributes:
  project: '{{ project_root }}/dbt'
This is sufficient to load your dbt models as assets. You can use dg list defs to see the asset representation:
dg list defs
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Section ┃ Definitions ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Assets │ ┏━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ │
│ │ ┃ Key ┃ Group ┃ Deps ┃ Kinds ┃ Description ┃ │
│ │ ┡━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━ ━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │
│ │ │ customers │ default │ stg_customers │ dbt │ This table has basic information about a │ │
│ │ │ │ │ stg_orders │ duckdb │ customer, as well as some derived facts based │ │
│ │ │ │ │ stg_payments │ │ on a custome… │ │
│ │ ├───────────────┼─────────┼───────────────┼────────┼────────────────────────────────────────────────┤ │
│ │ │ orders │ default │ stg_orders │ dbt │ This table has basic information about orders, │ │
│ │ │ │ │ stg_payments │ duckdb │ as well as some derived facts based on │ │
│ │ │ │ │ │ │ payments │ │
│ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ #### … │ │
│ │ ├───────────────┼─────────┼───────────────┼────────┼────────────────────────────────────────────────┤ │
│ │ │ raw_customers │ default │ │ dbt │ dbt seed raw_customers │ │
│ │ │ │ │ │ duckdb │ │ │
│ │ │ │ │ │ │ #### Raw SQL: │ │
│ │ │ │ │ │ │ ```sql │ │
│ │ │ │ │ │ │ │ │