https://dagster.io/ logo
#integration-dbt
Title
# integration-dbt
p

Patricia Ayuso

06/13/2023, 10:16 AM
Hello all. Happy Tuesday! We’re new to Dagster, and working with dbt from a few weeks back. First, we simply loaded our assets. Then, we realised we needed them as partitions, so we have:
Copy code
my_partitions_def = DailyPartitionsDefinition(start_date="2022-02-02")

def partition_key_to_dbt_vars(partition_key):
  return {"p_load_datetime": partition_key}

# Load dbt models as assets
dbt_uk_assets = load_assets_from_dbt_project(
  project_dir = DBT_PROJECT_PATH,
  profiles_dir = DBT_PROFILES,
  use_build_command = True,
  select = 'tag:uk',
  exclude = 'tag:wip',
  partitions_def = my_partitions_def,
  partition_key_to_vars_fn = partition_key_to_dbt_vars,
  key_prefix = "uk"
)

dbt_uk_companies_assets = load_assets_from_dbt_project(
  project_dir = DBT_PROJECT_PATH,
  profiles_dir = DBT_PROFILES,
  use_build_command = True,
  partitions_def = my_partitions_def,
  partition_key_to_vars_fn = partition_key_to_dbt_vars,
  select = 'uk_d_companies',
  key_prefix = "uk_companies"
)
Both groups materialize daily though,
uk_d_companies
should materialize after
uk
(they have a dbt dependency) and it runs at the same time. So, I’ve been researching how to do it, and it seems that there are multiple ways we can do it, so I focussed on: Add a freshness policy to my dbt configuration
uk_d_companies.sql
Copy code
{{
  config(
    materialized = 'incremental',
    transient = false,
    incremental_strategy = 'delete+insert',
    on_schema_change = 'sync_all_columns',
    unique_key = ['COMPANY_SK', 'DBT_VALID_FROM'],
    dagster_freshness_policy = { "maximum_lag_minutes": 15 }
  )
}}
+ a reconciliation sensor in dagster
Copy code
uk_companies_sensor = build_asset_reconciliation_sensor(
  AssetSelection.key_prefixes("uk_companies"),
  name = "uk_companies_sensor"
)
OR + a auto materialize policy in dagster when loading the uk_companies asset
Copy code
dbt_uk_companies_assets = load_assets_from_dbt_project(
  project_dir = DBT_PROJECT_PATH,
  profiles_dir = DBT_PROFILES,
  use_build_command = True,
  partitions_def = my_partitions_def,
  partition_key_to_vars_fn = partition_key_to_dbt_vars,
  select = 'uk_d_companies',
  key_prefix = "uk_companies", 
  node_info_to_auto_materialize_policy_fn = lambda _: AutoMaterializePolicy.lazy()
)
But none of the strategies seems to do what I need. They materialize
uk_companies
when I first activate them, but not - 15 minutes after - the daily partition (
uk
) materialization finishes. Is this the way to go and I’m missing something? Or should I take another route?
b

Brendan Jackson

06/13/2023, 1:12 PM
When you say they have a dbt dependency, do they also have a dagster dependency?
p

Patricia Ayuso

06/13/2023, 1:15 PM
yes, we can see the dependancy in the Global Asset Lineage
b

Brendan Jackson

06/13/2023, 1:54 PM
You could create a dbt job that materialises those two assets, and a schedule that runs that job daily.
But I don't understand why they materialise at the same time, in this case.
p

Patricia Ayuso

06/13/2023, 1:55 PM
maybe a dumb question but, do they’ll run in order?
b

Brendan Jackson

06/13/2023, 1:57 PM
I guess I'm confused. You said:
Both groups materialize daily though,
uk_d_companies
should materialize after
uk
(they have a dbt dependency) and it runs at the same time.
What makes them materialise at all, right now? (Before the policies and sensors were added)
p

Patricia Ayuso

06/13/2023, 2:00 PM
This schedule for
uk
Copy code
uk_models = build_schedule_from_partitioned_job(
    name = "uk_models"
    , job = uk
    , hour_of_day = 2
    , minute_of_hour = 00
)
b

Brendan Jackson

06/13/2023, 2:00 PM
Got it. And what's the
uk
job?
p

Patricia Ayuso

06/13/2023, 2:00 PM
yep
b

Brendan Jackson

06/13/2023, 2:00 PM
If the job is to materialise both of those assets, it should materialise them in order of their dependencies.
p

Patricia Ayuso

06/13/2023, 2:01 PM
makes sense, I’ll give it a try. Thanks a lot 🙂
b

Brendan Jackson

06/13/2023, 2:01 PM
Good luck!
fistbump 1
p

Patricia Ayuso

06/13/2023, 4:27 PM
Quick update. Sadly the several assets job didn’t do the trick, it materialize both pipelines at the same time. Though, because both are daily partitions, I loaded them together and it materialize them in sequence. Here is the code: ASSETS
Copy code
dbt_uk_assets = load_assets_from_dbt_project(
    project_dir = DBT_PROJECT_PATH,
    profiles_dir = DBT_PROFILES,
    use_build_command = True,
    select = 'tag:uk uk_d_companies',
    exclude = 'tag:wip',
    partitions_def = daily_partition_def,
    partition_key_to_vars_fn = partition_key_to_dbt_vars,
    key_prefix = "uk_daily"
)
JOBS
Copy code
uk_daily = define_asset_job( 
    "uk_daily", 
    selection = AssetSelection.key_prefixes( "uk_daily" ) 
)
SCHEDULES
Copy code
uk_daily = build_schedule_from_partitioned_job(
    name = "uk_daily"
    , job = uk_daily
    , hour_of_day = 4
    , minute_of_hour = 00
)
Let’s see how it goes with the weekly and monthly partitions… 🫣
👍 1
b

Brendan Jackson

06/13/2023, 4:29 PM
Yes, you got it - one job for both assets.
thank you box 1
4 Views