Jacob Marcil
08/16/2023, 8:34 PM
`load_assets_from_dbt_project`?
++ Is there any documentation for how to use the `@dbt_assets` decorator?
My goal is to start playing around with how partitions work for dbt assets with my incremental models.
Pedram Navid
08/17/2023, 3:07 AM
rex
08/17/2023, 1:01 PM
`@dbt_assets` decorator in our most recent edition of the API docs: https://docs.dagster.io/master/_apidocs/libraries/dagster-dbt#dagster_dbt.dbt_assets
The partition feature is available with `load_assets_from_dbt_project`, but its partition support will soon be deprecated. We recommend that you use `@dbt_assets`.
Jacob Marcil
08/17/2023, 1:15 PM
1. Using `load_assets_from_dbt_project`
2. Using `load_assets_from_dbt_manifest`
Then if we want to do something more with the assets, like configure partitions on top of them, we need to use the `@dbt_assets` decorator.
Since the decorator only takes a `manifest` as its parameter, `load_assets_from_dbt_project` becomes obsolete, right?
Right now, my dbt code is stored in a different repo than my Dagster code, and we plan on keeping it this way. We are using the method described in this discussion thread to load the dbt repo inside Dagster whenever we have a new version.
That method copies the repo into another folder located next to Dagster, and I reference that folder in my configuration to load the assets with `load_assets_from_dbt_project`. If `@dbt_assets` requires a `manifest`, though, I can’t use that method. Do I need to generate the `manifest` myself in CI/CD, or does Dagster generate that file somewhere when it loads the assets with `load_assets_from_dbt_project`, so that I can reuse it?
TL;DR: I don’t have an up-to-date version of the `manifest` file deployed to my repo. How can I use the `@dbt_assets` decorator?
rex
08/17/2023, 1:23 PM
`@dbt_assets` is a replacement for both `load_assets_from_dbt_project` and `load_assets_from_dbt_manifest`.
If you want to generate the manifest, you can either:
1. Create it in CI/CD as you mentioned,
2. Or, create it at runtime: https://dagster.slack.com/archives/C04CW71AGBW/p1690395671541379?thread_ts=1690388053.337689&cid=C04CW71AGBW
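For option (1), a CI step could look roughly like the sketch below. It assumes dbt 1.5 or later, where `dbt parse` writes `target/manifest.json` under the project directory, and that `target-path` is not overridden. The helper names, directory names, and target name are placeholders for illustration.

```python
import subprocess
from pathlib import Path
from typing import List


def build_parse_command(project_dir: Path, profiles_dir: Path, target: str) -> List[str]:
    """Build the dbt CLI invocation that parses the project for one target."""
    return [
        "dbt", "parse",
        "--project-dir", str(project_dir),
        "--profiles-dir", str(profiles_dir),
        "--target", target,
    ]


def manifest_path(project_dir: Path) -> Path:
    """Default manifest location (assumes target-path is not overridden)."""
    return project_dir / "target" / "manifest.json"


def generate_manifest(project_dir: Path, profiles_dir: Path, target: str) -> Path:
    """Run dbt in CI and return the manifest path; raises if dbt fails."""
    subprocess.run(build_parse_command(project_dir, profiles_dir, target), check=True)
    path = manifest_path(project_dir)
    if not path.is_file():
        raise FileNotFoundError(f"dbt did not produce {path}")
    return path
```

The file returned by `generate_manifest` is what would then be passed as the `manifest` argument of `@dbt_assets`.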
We recommend that you do (1) because creating the manifest at runtime incurs a lot of latency. But (2) is still an option, and essentially replicates your workflow from `load_assets_from_dbt_project`.
Jacob Marcil
08/18/2023, 1:33 PM
`dagster_dbt` with the `@dbt_assets` decorator, my models are showing and I can run them, so that part is fine.
Now one of the problems I have is with the dbt `source`.
I have a source table defined in my `sources.yml` file that looks like this:
version: 2
sources:
  - name: company_x
    database: dagster_db
    schema: company_x
    tables:
      - name: company_x_hourly_token_price
        meta:
          dagster:
            asset_key: ["company_x_hourly_token_price"]
`company_x_hourly_token_price` is a table in Snowflake, and it’s created by `dagster`.
The problem I have is that `dagster` tries to read the table from the source `dagster_db` defined in that `sources.yml` file. Which would make sense in production, but doesn’t in any other environment.
My understanding was that `dagster` would ignore the `sources.yml` information and find the asset with the same `asset_key`. And if it could, it would use the asset data instead of reaching out to the table itself.
Is there a step I’m missing?
I’ve read the documentation a couple of times, and it seems like we need to change the `seeds` to Dagster assets.
https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster/upstream-assets
Is there something similar that needs to be done for `source`, like changing the `source` to a `ref` or something?
At the same time, if I need to do that, what is the point of adding the metadata
meta:
  dagster:
    asset_key: ["company_x_hourly_token_price"]
Jacob Marcil
08/18/2023, 2:09 PM
Global Asset Lineage graph I can see that the `source` is not linked to the Dagster asset, so maybe there’s something wrong here and that’s why it’s reaching for the table instead.
rex
08/18/2023, 2:14 PM
`asset_key` in the dbt metadata, it has to correspond to the Dagster asset key of the Dagster asset producing your table.
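As a plain-Python illustration of how that key is put together (no Dagster import; the helper name and the example prefix are made up for this sketch): a Dagster asset's key is its prefix components followed by its name, and that whole list is what goes under `meta.dagster.asset_key`.

```python
from typing import List, Sequence


def dbt_meta_asset_key(asset_name: str, key_prefix: Sequence[str] = ()) -> List[str]:
    """Return the list to put under meta.dagster.asset_key in sources.yml."""
    return [*key_prefix, asset_name]


# Asset defined without a prefix: the key is just the asset name.
print(dbt_meta_asset_key("company_x_hourly_token_price"))

# Asset defined with a (hypothetical) prefix: every component must be included.
print(dbt_meta_asset_key("company_x_hourly_token_price", key_prefix=["snowflake", "company_x"]))
```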
What is your `dar_hourly_token_price`’s `asset_key`? Does it have a prefix? If so, that should also be included in the dbt metadata.
Jacob Marcil
08/18/2023, 2:22 PM
`asset_key` fixed the lineage part. Didn’t think of the prefix, thank you very much for pointing that out.
But I still have the issue where Dagster tries to read from the table defined in the source instead of where the parent asset got materialized.
rex
08/18/2023, 2:26 PM
“My understanding was that `dagster` would ignore the `sources.yml` information and find the asset with the same `asset_key`. And if it could, it would use the asset data instead of reaching out to the table itself.”
This is incorrect. Dagster uses the `sources.yml` information to establish the lineage relationship between the source (defined in Dagster) and the dbt models (defined in dbt). And when materializing the dbt models, the `sources.yml` is still used as it is defined in dbt: the table’s schema and name are inferred from the `sources.yml` and used in any dependent models that call it using `{{ source(…, …) }}`.
rex
08/18/2023, 2:29 PM
“Which would make sense in production, but doesn’t in any other environment.”
If this is the case, then you should make your source dynamic, based on your dbt target?
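On the Dagster side, the part that makes this possible is passing a different dbt `target` per environment, so that `target.name` inside dbt can drive which database or schema the source points at. A minimal sketch, assuming an environment variable named `DAGSTER_DEPLOYMENT` and a `sandbox` to `dev` mapping (both are assumptions for illustration, as is the helper name):

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from deployment name to a target defined in profiles.yml.
TARGET_BY_DEPLOYMENT = {"prod": "prod", "sandbox": "dev"}


def resolve_dbt_target(env: Optional[Mapping[str, str]] = None, default: str = "sandbox") -> str:
    """Pick the dbt target for the current deployment; fail loudly on unknown names."""
    env = os.environ if env is None else env
    deployment = env.get("DAGSTER_DEPLOYMENT", default)
    try:
        return TARGET_BY_DEPLOYMENT[deployment]
    except KeyError:
        raise ValueError(f"unknown deployment {deployment!r}") from None
```

The returned value would be passed as `target=` on the dbt resource, and the `database` or `schema` in `sources.yml` can then be written as a Jinja expression on `target.name`.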
Jacob Marcil
08/18/2023, 2:30 PM
`sandbox` and `prod`
Let’s say
RESOURCES_PROD = {
    "dbt": DbtCliClientResource(
        profiles_dir=DBT_PROFILES_DIR,
        project_dir=DBT_PROJECT_DIR,
        target="dev",
    ),
}
RESOURCES_SANDBOX = {
    "dbt": DbtCliClientResource(
        profiles_dir=DBT_PROFILES_DIR,
        project_dir=DBT_PROJECT_DIR,
        target="dev",
    ),
}
as described in the fully featured project:
https://github.com/dagster-io/dagster/blob/master/examples/project_fully_featured/project_fully_featured/resources/__init__.py#L89
I won’t be able to use those environments because the dbt part of my project always needs to read from the `prod` destination defined in the `sources.yml` file?
Seems like this goes against the concept of having multiple environments and providing different Dagster `resources` per env.
Jacob Marcil
08/18/2023, 2:32 PM
`dar_hourly_token_price` as a file on the local system, or a file on S3, or a table in DuckDB.
But then when `stg_dar__token_prices_usd` gets materialized, it will look at the content of `sources.yml` and won’t be able to find the data?
rex
08/18/2023, 2:34 PM
Jacob Marcil
08/18/2023, 2:36 PM
“If this is the case, then you should make your source dynamic, based on your dbt target?”
Oh okok, so making the sources dynamic makes sense.
Jacob Marcil
08/18/2023, 2:36 PM
rex
08/18/2023, 2:39 PM