# integration-dbt
j
Is the `dagster_dbt` partition feature available if I load my assets using `load_assets_from_dbt_project`? ++ Is there any documentation for how to use the `@dbt_assets` decorator? My goal is to start playing around with how partitioning works for dbt assets with my incremental models.
🤖 1
p
Hey Jacob -- there's an example here that might help: https://github.com/dagster-io/hooli-data-eng-pipelines/blob/bc05e2d2bd5ec62b2aec25dc78728361c2cfb1a9/hooli_data_eng/assets/dbt_assets.py#L107 I'm working on more documentation on it this week and hope to have something out by Friday, so feel free to ask any specific questions and I'll try my best to help.
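For reference, the linked example boils down to something like the sketch below: a `@dbt_assets` definition with a time-window partition whose bounds are passed to dbt as vars. This is a minimal sketch, not Hooli's exact code; the manifest path, start date, and var names (`min_date`/`max_date`) are assumptions, and your incremental models would read them via `{{ var("min_date") }}`:

```python
import json
from pathlib import Path

from dagster import DailyPartitionsDefinition, OpExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

MANIFEST_PATH = Path("path/to/target/manifest.json")  # hypothetical

@dbt_assets(
    manifest=MANIFEST_PATH,
    partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"),  # hypothetical
)
def partitioned_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    # Translate the partition's time window into dbt vars for incremental models.
    window = context.partition_time_window
    dbt_vars = {"min_date": window.start.isoformat(), "max_date": window.end.isoformat()}
    yield from dbt.cli(["build", "--vars", json.dumps(dbt_vars)], context=context).stream()
```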
r
We added examples on how to use the `@dbt_assets` decorator in our most recent edition of the API docs: https://docs.dagster.io/master/_apidocs/libraries/dagster-dbt#dagster_dbt.dbt_assets

The partition feature is available with `load_assets_from_dbt_project`, but its partition support will soon be deprecated. We recommend that you use `@dbt_assets`.
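The basic (non-partitioned) usage from those API docs looks roughly like this sketch; the manifest path is hypothetical, and a `DbtCliResource` has to be supplied in your `Definitions`:

```python
from pathlib import Path

from dagster import OpExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

MANIFEST_PATH = Path("path/to/target/manifest.json")  # hypothetical

@dbt_assets(manifest=MANIFEST_PATH)
def my_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    # Invoke `dbt build` and stream events back to Dagster as materializations.
    yield from dbt.cli(["build"], context=context).stream()
```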
❤️ 1
j
Thank you for your responses. I have a couple of questions, if I may. From what I understand there are 2 ways of loading a dbt project in Dagster:
1. Using `load_assets_from_dbt_project`
2. Using `load_assets_from_dbt_manifest`

Then if we want to do something more with the assets, like configure partitions on top of them, we need to use the `@dbt_assets` decorator. Since the decorator only takes a `manifest` as its parameter, `load_assets_from_dbt_project` becomes obsolete, right?

Right now, my dbt code is stored in a different repo from my Dagster code, and we plan on keeping it this way. We are using the method described in this discussion thread in order to load the dbt repo inside Dagster whenever we have a new version. That method copies the repo into another folder located next to Dagster, and I reference that folder in my configuration in order to load the assets with `load_assets_from_dbt_project`. If `@dbt_assets` requires a `manifest` though, I can't use that method and I need to generate the `manifest` myself in CI/CD, or does Dagster generate that file somewhere when it loads the assets with `load_assets_from_dbt_project`, so I can reuse that file somehow?

TL;DR: I don't have an updated version of the `manifest` file deployed to my repo. How can I use the `@dbt_assets` decorator?
r
`@dbt_assets` is a replacement for both `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest`. If you want to generate the manifest, you can either:
1. Create it in CI/CD as you mentioned,
2. Or, create it at runtime: https://dagster.slack.com/archives/C04CW71AGBW/p1690395671541379?thread_ts=1690388053.337689&cid=C04CW71AGBW

We recommend that you do (1) because creating the manifest at runtime incurs a lot of latency. But (2) is still an option, and essentially replicates your workflow from `load_assets_from_dbt_project`.
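For option (1), the CI/CD step can be as simple as invoking dbt before deployment. A minimal sketch, assuming a recent dbt version where `dbt parse` writes `target/manifest.json` (older versions may need `dbt compile` instead); the project path is hypothetical:

```python
import subprocess
from pathlib import Path

DBT_PROJECT_DIR = Path("path/to/dbt_project")  # hypothetical

# Have dbt parse the project; recent dbt versions write target/manifest.json here.
subprocess.run(["dbt", "parse"], cwd=DBT_PROJECT_DIR, check=True)

MANIFEST_PATH = DBT_PROJECT_DIR / "target" / "manifest.json"
```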
👀 1
j
Hey @Pedram Navid, since you give me the opportunity to ask questions, I will 🙂 I've configured `dagster_dbt` with the `@dbt_assets` decorator, my models are showing up and I can run them, so that part is fine. Now one of the problems I have is with dbt `source`s. I have a source table defined in my `sources.yml` file that looks like this:
```yaml
version: 2

sources:
  - name: company_x
    database: dagster_db
    schema: company_x
    tables:
      - name: company_x_hourly_token_price
        meta:
          dagster:
            asset_key: ["company_x_hourly_token_price"]
```
`company_x_hourly_token_price` is a table in Snowflake and it's created by Dagster. The problem I have is that Dagster tries to read the table from the source `dagster_db` defined in that `sources.yml` file. This would make sense in production, but doesn't in any other environment. My understanding was that Dagster would ignore the `sources.yml` information and find the asset with the same `asset_key`, and if it could, it would use the asset data instead of reaching out to the table itself. Is there a step I'm missing? I've read the documentation a couple of times, and it seems like we need to change the `seeds` to Dagster assets: https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster/upstream-assets Is there something similar that needs to be done for a `source`, like changing the `source` to a `ref` or something? At the same time, if I need to do that, what is the need behind adding the metadata:
```yaml
meta:
  dagster:
    asset_key: ["company_x_hourly_token_price"]
```
If I look at the Global Asset Lineage graph, I can see that the `source` is not linked to the Dagster asset, so maybe there's something wrong here and that's why it's reaching out to the table instead.
r
Hi Jacob, when you specify the `asset_key` in the dbt metadata, it has to correspond to the Dagster asset key of the Dagster asset producing your table. What is your `dar_hourly_token_price`'s `asset_key`? Does it have a prefix? If so, that should also be included in the dbt metadata.
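In other words, if the upstream asset is defined with a `key_prefix`, the dbt metadata must spell out the full key. A hypothetical sketch (the `snowflake`/`company_x` prefix is invented for illustration):

```python
from dagster import asset

# With this prefix, the full asset key becomes
# ["snowflake", "company_x", "company_x_hourly_token_price"], so the dbt
# source metadata must list the full key, not just the table name.
@asset(key_prefix=["snowflake", "company_x"])
def company_x_hourly_token_price():
    ...  # load data and write the Snowflake table here
```

The matching `sources.yml` entry would then set `asset_key: ["snowflake", "company_x", "company_x_hourly_token_price"]`.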
j
Having the right `asset_key` fixed the lineage part. I didn't think of the prefix, thank you very much for pointing that out. But I still have the issue where Dagster tries to read from the table defined in the source instead of where the parent asset got materialized.
r
> My understanding was that Dagster would ignore the `sources.yml` information and find the asset with the same `asset_key`, and if it could, it would use the asset data instead of reaching out to the table itself.

This is incorrect. Dagster uses the `sources.yml` information to establish the lineage relationship between the source (defined in Dagster) and the dbt models (defined in dbt). And when materializing the dbt models, the `sources.yml` is still used as it is defined in dbt: the table's schema and name are inferred from the `sources.yml` and used in any dependent models that reference it using `{{ source(…, …) }}`.

> Which would make sense in production, but doesn't in any other environment.

If this is the case, then you should make your source dynamic, based on your dbt target?
j
So let's say I have 2 different environments, `sandbox` and `prod`. Let's say:
```python
from dagster_dbt import DbtCliClientResource

RESOURCES_PROD = {
    "dbt": DbtCliClientResource(
        profiles_dir=DBT_PROFILES_DIR,
        project_dir=DBT_PROJECT_DIR,
        target="prod",
    ),
}

RESOURCES_SANDBOX = {
    "dbt": DbtCliClientResource(
        profiles_dir=DBT_PROFILES_DIR,
        project_dir=DBT_PROJECT_DIR,
        target="dev",
    ),
}
```
as described in the fully featured project: https://github.com/dagster-io/dagster/blob/master/examples/project_fully_featured/project_fully_featured/resources/__init__.py#L89 I won't be able to use those environments, because the dbt part of my project will always need to read from the `prod` destination defined in the `sources.yml` file? It seems like this goes against the concept of having multiple environments and providing different Dagster resources per env.
In the picture below, technically I could create the `dar_hourly_token_price` as a file on the local system, a file on S3, or a table in DuckDB. But then when `stg_dar__token_prices_usd` gets materialized, it will look at the content of `sources.yml` and won't be able to find the data?
r
This is why you should make your source dynamic, based on your dbt target: https://stackoverflow.com/questions/73609118/dbt-source-yml-based-on-target-name. Does that make sense?
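A minimal sketch of what that could look like in `sources.yml`, assuming the prod database is `dagster_db` and non-prod databases follow a hypothetical `dagster_db_<target>` naming scheme:

```yaml
version: 2

sources:
  - name: company_x
    # Choose the database from the dbt target so non-prod runs read non-prod tables.
    database: "{{ 'dagster_db' if target.name == 'prod' else 'dagster_db_' ~ target.name }}"
    schema: company_x
    tables:
      - name: company_x_hourly_token_price
        meta:
          dagster:
            # Must match the upstream Dagster asset's full key, including any prefix.
            asset_key: ["company_x_hourly_token_price"]
```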
🌈 1
j
> If this is the case, then you should make your source dynamic, based on your dbt target?

Oh ok ok, so making the sources dynamic makes sense. Thank you for answering so many questions from me.
r
Of course!