#integration-dbt

D Zavadskykh

08/28/2023, 8:19 AM
Hi everyone, I've just started working with Dagster and I have a question about dbt+dagster: What would be the best way to define an upstream asset for a dbt_asset? The upstream asset is basically a data migration script and the dbt_asset must depend on it.

Todd de Quincey

08/28/2023, 9:00 AM
You can do this in the `dbt` sources file. Personally, I don't take this approach (i.e. defining Dagster dependencies inside my dbt project), as I like to keep Dagster logic inside Dagster and dbt as a standalone element with no knowledge of Dagster. But this is the documented way 🙂 https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster/upstream-assets#step-3-in-the-dbt-project-replace-a-seed-with-a-source

D Zavadskykh

08/28/2023, 9:16 AM
Thanks @Todd de Quincey. I checked it, as well as this, but I don't understand conceptually how to define an upstream asset for the whole dbt_asset (in my case, 500 models). Is that possible at all? I also thought about creating an op with my script followed by materialization of the dbt assets, but it seems like ops and assets aren't supposed to work together like this, are they?

Todd de Quincey

08/28/2023, 9:19 AM
I could be wrong, but you'll probably want to look at implementing a custom `DagsterDbtTranslator` then and overriding the `get_metadata` method. So if, in a simplified example, all 500 models depended on Asset A, you could update the metadata at the translator level. https://docs.dagster.io/_apidocs/libraries/dagster-dbt#dagster_dbt.DagsterDbtTranslator.get_metadata

D Zavadskykh

08/28/2023, 9:33 AM
Thanks, will check it out!
Hmm, it seems like `get_metadata` is only for display purposes in the UI.

rex

08/28/2023, 1:23 PM
`get_metadata` is only for display purposes. You could just run your migration script before materializing your dbt assets:
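```python
from dagster import OpExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(...)  # arguments (e.g. the dbt manifest) elided in the original
def my_dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
    # Run the migration script here, before dbt executes.

    # stream() yields Dagster events as each dbt model finishes.
    yield from dbt.cli(["run"], context=context).stream()
```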
If your data migration only affects some of your dbt sources, then you could define a multi asset instead that materializes your dbt sources: https://docs.dagster.io/integrations/dbt/reference#defining-an-asset-as-an-upstream-dependency-of-a-dbt-model

D Zavadskykh

08/28/2023, 1:32 PM
Thanks @rex! The migration affects all dbt sources. Is there a way to assign a dependency rather than just executing those two steps one after the other? I think your solution will work, but I suspect the whole orchestration would take less time if each model were materialized as soon as its source is migrated.

rex

08/28/2023, 1:36 PM
When you're working with software-defined assets, you're assigning dependencies at the asset level. In this case, you need to:
• Gather your dbt sources using your dbt manifest.
• Iterate through your dbt sources and create a multi_asset that produces outputs corresponding to them.
• In the body of your multi asset, run the migration.
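The first two steps can be sketched in plain Python, assuming the manifest has the usual `sources` mapping whose entries carry `source_name`, `name` and an optional `meta.dagster.asset_key`. The helper names (`dbt_source_keys`, `load_manifest`) are hypothetical, and the key derivation mirrors the default dagster-dbt naming as we understand it, so double-check it against your dagster-dbt version:

```python
import json
from pathlib import Path


def load_manifest(path: Path) -> dict:
    # manifest.json is written to target/ by commands such as `dbt parse` or `dbt compile`.
    return json.loads(path.read_text())


def dbt_source_keys(manifest: dict) -> dict[str, list[str]]:
    """Map an output name to the asset-key path assumed for each dbt source.

    Uses meta.dagster.asset_key when a source sets it, otherwise
    [source_name, table_name].
    """
    keys: dict[str, list[str]] = {}
    for node in manifest["sources"].values():
        override = node.get("meta", {}).get("dagster", {}).get("asset_key")
        path = list(override) if override else [node["source_name"], node["name"]]
        # Join with "__" so the output name stays a valid identifier.
        keys["__".join(path)] = path
    return keys
```

Each `(name, path)` pair can then become `AssetOut(key=AssetKey(path))` under `name` in the `outs` of a `multi_asset`, whose body runs the migration once and yields one output per source.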