# integration-dbt
j
Hey there! We're currently working with the Dagster dbt integration and have encountered some unexpected behavior related to asset materialization. When we manually trigger the materialization of our assets (dbt models), `context.dagster_run.asset_selection` correctly returns the corresponding asset keys. However, when the same assets are materialized by a scheduled job, `context.dagster_run.asset_selection` unexpectedly returns `None` instead of the asset keys. Our understanding is that `context.dagster_run.asset_selection` should behave consistently regardless of whether the assets are materialized manually or by a scheduled job. We have verified that there are no differences in code or environment configuration between the manual and scheduled runs that could account for this disparity. Is this difference in behavior expected? If not, could someone guide us towards potential solutions?
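For reference, this is roughly how we read it (a simplified sketch, assuming the `@dbt_assets`/`DbtCliResource` API; the manifest path is a placeholder):

```python
from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets


@dbt_assets(manifest="path/to/target/manifest.json")  # placeholder manifest path
def our_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    # Manual materializations log the selected asset keys here;
    # runs launched by the schedule log None instead.
    context.log.info(f"asset_selection: {context.dagster_run.asset_selection}")
    yield from dbt.cli(["build"], context=context).stream()
```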
r
@owen this looks like an underlying Dagster framework issue — is there a workaround here?
o
Hi @Joseph Marcus -- The `asset_selection` is not a public property of the `DagsterRun` object; it's more of an internal implementation detail. There's no hard-set reason for this difference in behavior, but the basic idea is that `asset_selection` generally represents a subselection of the total set of assets in a job (so if you're materializing all assets in a job, there is no subselection; it's just all the assets). What's your use case for accessing this? There's likely another way of doing it that would be more consistent.
j
Hi @owen, thanks for your response. Our use case for accessing `asset_selection` is a bit unique. We have set up an alerting mechanism to notify our team on an internal Slack channel whenever a Dagster job fails. To provide more granular context around each failure, we aim to include the specific assets that failed to materialize in our alerts. We have been leveraging `asset_selection` to fetch this information. Given that `asset_selection` is not a public property and its behavior may not be consistent, could you suggest an alternate way to get this information? We still need a method to identify the specific assets involved in a failure during a scheduled job run. Thanks!
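Roughly, the alerting side looks like the sketch below (simplified; `post_to_slack` stands in for our internal Slack client, and the sensor-based setup is an approximation of our actual mechanism):

```python
from dagster import RunFailureSensorContext, run_failure_sensor


def post_to_slack(message: str) -> None:
    """Placeholder for our internal Slack client."""
    ...


@run_failure_sensor
def alert_on_job_failure(context: RunFailureSensorContext):
    # Manual materializations populate asset_selection, but runs launched by
    # the schedule leave it as None, so the alert loses the asset context.
    selection = context.dagster_run.asset_selection
    assets = sorted(key.to_user_string() for key in selection) if selection else None
    post_to_slack(
        f"Run {context.dagster_run.run_id} failed. "
        f"Assets: {assets if assets else 'unknown (asset_selection was None)'}"
    )
```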
d
Hi @owen! Thanks for the response. I work with @Joseph Marcus, here are some examples of what we’re doing. It’s pretty cool!!!
r
Could you do the same thing by accessing the dbt `run_results.json`? You should be able to retrieve dbt artifacts in the op after invoking the dbt command.
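Something along these lines, roughly (a sketch; the statuses follow the dbt `run_results.json` artifact schema, and the usage in the comments assumes the `DbtCliInvocation` API):

```python
import json
from pathlib import Path


def failed_dbt_nodes(target_dir: Path) -> list[str]:
    """Return the unique_ids of dbt nodes that errored or failed, per run_results.json."""
    run_results = json.loads((target_dir / "run_results.json").read_text())
    return [
        result["unique_id"]
        for result in run_results["results"]
        if result["status"] in ("error", "fail")
    ]


# Inside the @dbt_assets-decorated function, after invoking dbt, something like:
#     dbt_invocation = dbt.cli(["build"], context=context, raise_on_error=False)
#     yield from dbt_invocation.stream()
#     failed = failed_dbt_nodes(dbt_invocation.target_path)
#     if failed:
#         context.log.error(f"Failed dbt nodes: {failed}")
```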
d
thanks @rex! we’ll look into it 🙂
j
Thanks @rex!
d
Hi @rex! Update from our side: we are able to use the `run_results.json`, perfect suggestion! We have it implemented locally. In our cluster, though, our user code deployment spins up a new ephemeral pod for every run. How can we capture the `run_results.json` from this pod before it gets spun down?