# integration-dbt
j
Is the `dagster_dbt` partition feature available if I load my assets using `load_assets_from_dbt_project`? ++ Is there any documentation for how to use the `@dbt_assets` decorator? My goal is to start playing around with how partitioning works for dbt assets with my incremental models.
🤖 1
p
Hey Jacob -- there's an example here that might help: https://github.com/dagster-io/hooli-data-eng-pipelines/blob/bc05e2d2bd5ec62b2aec25dc78728361c2cfb1a9/hooli_data_eng/assets/dbt_assets.py#L107 I'm working on more documentation on it this week and hope to have something out by Friday, so feel free to ask any specific questions and I'll try my best to help.
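For reference, the linked example boils down to something like the sketch below: a `@dbt_assets` definition with a time-window partition whose bounds are passed to dbt as vars. This is a minimal sketch, not Hooli's exact code; the manifest path, start date, and var names (`min_date`/`max_date`) are assumptions, and your incremental models would read them via `{{ var("min_date") }}`:

```python
import json
from pathlib import Path

from dagster import DailyPartitionsDefinition, OpExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

MANIFEST_PATH = Path("path/to/target/manifest.json")  # hypothetical

@dbt_assets(
    manifest=MANIFEST_PATH,
    partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"),  # hypothetical
)
def partitioned_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    # Translate the partition's time window into dbt vars for incremental models.
    window = context.partition_time_window
    dbt_vars = {"min_date": window.start.isoformat(), "max_date": window.end.isoformat()}
    yield from dbt.cli(["build", "--vars", json.dumps(dbt_vars)], context=context).stream()
```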
r
We added examples on how to use the `@dbt_assets` decorator in our most recent edition of the API docs: https://docs.dagster.io/master/_apidocs/libraries/dagster-dbt#dagster_dbt.dbt_assets

The partition feature is available with `load_assets_from_dbt_project`, but its partition support will soon be deprecated. We recommend that you use `@dbt_assets`.
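The basic (non-partitioned) usage from those API docs looks roughly like this sketch; the manifest path is hypothetical, and a `DbtCliResource` has to be supplied in your `Definitions`:

```python
from pathlib import Path

from dagster import OpExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

MANIFEST_PATH = Path("path/to/target/manifest.json")  # hypothetical

@dbt_assets(manifest=MANIFEST_PATH)
def my_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    # Invoke `dbt build` and stream events back to Dagster as materializations.
    yield from dbt.cli(["build"], context=context).stream()
```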
❤️ 1
j
Thank you for your responses. I have a couple of questions, if I may. From what I understand there are 2 ways of loading a dbt project in Dagster:
1. Using `load_assets_from_dbt_project`
2. Using `load_assets_from_dbt_manifest`

Then if we want to do something more with the assets, like configure partitions on top of them, we need to use the `@dbt_assets` decorator. Since the decorator only takes a `manifest` as its parameter, `load_assets_from_dbt_project` becomes obsolete, right?

Right now, my dbt code is stored in a different repo from my Dagster code, and we plan on keeping it this way. We are using the method described in this discussion thread in order to load the dbt repo inside Dagster whenever we have a new version. That method copies the repo into another folder located next to Dagster, and I reference that folder in my configuration in order to load the assets with `load_assets_from_dbt_project`. If `@dbt_assets` requires a `manifest` though, I can't use that method and I need to generate the `manifest` myself in CI/CD, or does Dagster generate that file somewhere when it loads the assets with `load_assets_from_dbt_project`, so I can reuse that file somehow?

TL;DR: I don't have an updated version of the `manifest` file deployed to my repo. How can I use the `@dbt_assets` decorator?
r
`@dbt_assets` is a replacement for both `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest`. If you want to generate the manifest, you can either:
1. Create it in CI/CD as you mentioned,
2. Or, create it at runtime: https://dagster.slack.com/archives/C04CW71AGBW/p1690395671541379?thread_ts=1690388053.337689&cid=C04CW71AGBW

We recommend that you do (1) because creating the manifest at runtime incurs a lot of latency. But (2) is still an option, and essentially replicates your workflow from `load_assets_from_dbt_project`.
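For option (1), the CI/CD step can be as simple as invoking dbt before deployment. A minimal sketch, assuming a recent dbt version where `dbt parse` writes `target/manifest.json` (older versions may need `dbt compile` instead); the project path is hypothetical:

```python
import subprocess
from pathlib import Path

DBT_PROJECT_DIR = Path("path/to/dbt_project")  # hypothetical

# Have dbt parse the project; recent dbt versions write target/manifest.json here.
subprocess.run(["dbt", "parse"], cwd=DBT_PROJECT_DIR, check=True)

MANIFEST_PATH = DBT_PROJECT_DIR / "target" / "manifest.json"
```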
👀 1
j
Hey @Pedram Navid, since you give me the opportunity to ask questions, I will 🙂 I've configured `dagster_dbt` with the `@dbt_assets` decorator, my models are showing up and I can run them, so that part is fine. Now one of the problems I have is with dbt `source`s. I have a source table defined in my `sources.yml` file that looks like this:
```yaml
version: 2

sources:
  - name: company_x
    database: dagster_db
    schema: company_x
    tables:
      - name: company_x_hourly_token_price
        meta:
          dagster:
            asset_key: ["company_x_hourly_token_price"]
```
`company_x_hourly_token_price` is a table in Snowflake and it's created by Dagster. The problem I have is that Dagster tries to read the table from the source `dagster_db` defined in that `sources.yml` file. This would make sense in production, but doesn't in any other environment. My understanding was that Dagster would ignore the `sources.yml` information and find the asset with the same `asset_key`, and if it could, it would use the asset data instead of reaching out to the table itself. Is there a step I'm missing? I've read the documentation a couple of times, and it seems like we need to change the `seeds` to Dagster assets: https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster/upstream-assets Is there something similar that needs to be done for a `source`, like changing the `source` to a `ref` or something? At the same time, if I need to do that, what is the need behind adding the metadata:
```yaml
meta:
  dagster:
    asset_key: ["company_x_hourly_token_price"]
```
If I look at the Global Asset Lineage graph, I can see that the `source` is not linked to the Dagster asset, so maybe there's something wrong here and that's why it's reaching out to the table instead.
r
Hi Jacob, when you specify the `asset_key` in the dbt metadata, it has to correspond to the Dagster asset key of the Dagster asset producing your table. What is your `dar_hourly_token_price`'s `asset_key`? Does it have a prefix? If so, that should also be included in the dbt metadata.
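In other words, if the upstream asset is defined with a `key_prefix`, the dbt metadata must spell out the full key. A hypothetical sketch (the `snowflake`/`company_x` prefix is invented for illustration):

```python
from dagster import asset

# With this prefix, the full asset key becomes
# ["snowflake", "company_x", "company_x_hourly_token_price"], so the dbt
# source metadata must list the full key, not just the table name.
@asset(key_prefix=["snowflake", "company_x"])
def company_x_hourly_token_price():
    ...  # load data and write the Snowflake table here
```

The matching `sources.yml` entry would then set `asset_key: ["snowflake", "company_x", "company_x_hourly_token_price"]`.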
j
Having the right `asset_key` fixed the lineage part. I didn't think of the prefix, thank you very much for pointing that out. But I still have the issue where Dagster tries to read from the table defined in the source instead of where the parent asset got materialized.
r
> My understanding was that Dagster would ignore the `sources.yml` information and find the asset with the same `asset_key`, and if it could, it would use the asset data instead of reaching out to the table itself.

This is incorrect. Dagster uses the `sources.yml` information to establish the lineage relationship between the source (defined in Dagster) and the dbt models (defined in dbt). And when materializing the dbt models, the `sources.yml` is still used as it is defined in dbt: the table's schema and name are inferred from the `sources.yml` and used in any dependent models that reference it using `{{ source(…, …) }}`.

> Which would make sense in production, but doesn't in any other environment.

If this is the case, then you should make your source dynamic, based on your dbt target?
j
So let's say I have 2 different environments, `sandbox` and `prod`. Let's say:
```python
from dagster_dbt import DbtCliClientResource

RESOURCES_PROD = {
    "dbt": DbtCliClientResource(
        profiles_dir=DBT_PROFILES_DIR,
        project_dir=DBT_PROJECT_DIR,
        target="prod",
    ),
}

RESOURCES_SANDBOX = {
    "dbt": DbtCliClientResource(
        profiles_dir=DBT_PROFILES_DIR,
        project_dir=DBT_PROJECT_DIR,
        target="dev",
    ),
}
```
as described in the fully featured project: https://github.com/dagster-io/dagster/blob/master/examples/project_fully_featured/project_fully_featured/resources/__init__.py#L89 I won't be able to use those environments, because the dbt part of my project will always need to read from the `prod` destination defined in the `sources.yml` file? It seems like this goes against the concept of having multiple environments and providing different Dagster resources per env.
In the picture below, technically I could create the `dar_hourly_token_price` as a file on the local system, a file on S3, or a table in DuckDB. But then when `stg_dar__token_prices_usd` gets materialized, it will look at the content of `sources.yml` and won't be able to find the data?
r
This is why you should make your source dynamic, based on your dbt target: https://stackoverflow.com/questions/73609118/dbt-source-yml-based-on-target-name. Does that make sense?
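A minimal sketch of what that could look like in `sources.yml`, assuming the prod database is `dagster_db` and non-prod databases follow a hypothetical `dagster_db_<target>` naming scheme:

```yaml
version: 2

sources:
  - name: company_x
    # Choose the database from the dbt target so non-prod runs read non-prod tables.
    database: "{{ 'dagster_db' if target.name == 'prod' else 'dagster_db_' ~ target.name }}"
    schema: company_x
    tables:
      - name: company_x_hourly_token_price
        meta:
          dagster:
            # Must match the upstream Dagster asset's full key, including any prefix.
            asset_key: ["company_x_hourly_token_price"]
```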
🌈 1
j
> If this is the case, then you should make your source dynamic, based on your dbt target?

Oh ok ok, so making the sources dynamic makes sense. Thank you for answering so many questions from me.
r
Of course!