All dbt assets are running in a single step dagster #integration-dbt

Todd de Quincey

08/07/2023, 1:50 PM

All dbt assets are running in a single step

Todd de Quincey

08/07/2023, 1:51 PM

Hi,

Perhaps this isn’t how the dbt assets are intended to work, or perhaps I am just missing something. Either way, I suspect this will have a simple answer.

I have loaded all of my dbt assets into Dagster (code snippet below). I am essentially creating asset groups (and subsequent jobs) based on the dbt `tags`. The attached screenshot is an example of one of the asset groups / jobs that will run.

However, when we run the job (either via a scheduled run or a manual run), I see that Dagster is running all of the assets in a single run, which is not what I expected. I was expecting each asset to be materialized in its own run, allowing us to re-run only individual steps in the event of failures.

I presume this is user error. Any guidance would be much appreciated.

Running on Dagster 1.4.4.

from dagster import with_resources
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project

assets = with_resources(
    load_assets_from_dbt_project(
        # project_dir / profiles_dir now match the resource config below
        project_dir=DBT_PROJECT_PATH,
        profiles_dir=DBT_PROFILES,
        # Use the first dbt tag as the asset group
        node_info_to_group_fn=lambda node: node["config"]["tags"][0]
        if node["config"]["tags"] != []
        else None,
        display_raw_sql=True,
        exclude="tag: integration_tests unit-tests",
    ),
    {
        "dbt": dbt_cli_resource.configured(
            {
                "project_dir": DBT_PROJECT_PATH,
                "profiles_dir": DBT_PROFILES,
            },
        )
    },
)

We then create the job as so:

define_asset_job(name="dbt_tag_here", selection=AssetSelection.groups("dbt_tag_here"))
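
For readers following along, the group-selection lambda from the snippet above can also be written as a small named function, which is easier to test in isolation. This is only a sketch: `group_from_first_tag` is a hypothetical name, and it assumes each node is a dict with a `config.tags` entry, as the original lambda does.

```python
from typing import Any, Dict, Optional


def group_from_first_tag(node: Dict[str, Any]) -> Optional[str]:
    """Return the first dbt tag of a node, or None if the node has no tags."""
    tags = node.get("config", {}).get("tags") or []
    # Defensive: tolerate a single tag given as a plain string.
    if isinstance(tags, str):
        return tags
    return tags[0] if tags else None


# Hypothetical nodes, trimmed to the one field the function reads.
print(group_from_first_tag({"config": {"tags": ["finance", "daily"]}}))
print(group_from_first_tag({"config": {"tags": []}}))
```

A function like this could be passed as `node_info_to_group_fn=group_from_first_tag` in place of the lambda.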

rex

08/07/2023, 1:56 PM

This is intended. We don’t separate out the execution of dbt models into separate steps (e.g. `dbt run --select asset1`, `dbt run --select asset2`, …). Like you pointed out, we will run `dbt run --select asset1 asset2`.
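
To make the difference concrete, here is a minimal sketch of the two shapes of invocation described above. It is not Dagster's actual implementation; the helper names `per_model_commands` and `coalesced_command` are hypothetical and only build the command lines.

```python
from typing import List


def per_model_commands(models: List[str]) -> List[List[str]]:
    """One dbt invocation per model: what separate steps would look like."""
    return [["dbt", "run", "--select", model] for model in models]


def coalesced_command(models: List[str]) -> List[str]:
    """A single dbt invocation covering every selected model."""
    return ["dbt", "run", "--select", *models]


selected = ["asset1", "asset2"]
for command in per_model_commands(selected):
    print(" ".join(command))
print(" ".join(coalesced_command(selected)))
```

The coalesced form starts dbt once for the whole selection, whereas the per-model form starts it once per model.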

Todd de Quincey

08/07/2023, 1:58 PM

Oh ok. Thanks, @rex. That’s a bummer, as I thought that this wasn’t the case (especially based off of the UI graph). Are there any plans to make these individual executable steps when the entire job is run?

Todd de Quincey

08/07/2023, 1:58 PM

IMO, this would be a big leap forward

Todd de Quincey

08/07/2023, 2:00 PM

I guess the idea, though, is that if e.g. 3 of 20 models fail, this shows up in the UI and we can manually select just those assets for re-materialization?

rex

08/07/2023, 2:03 PM

I don’t believe we’re planning to make these individual executable steps. Our framework coalesces the work that needs to be done, so that it can all be executed in one step. This is actually more efficient since it saves on process initialization cost, step startup cost, etc. You should be able to just rematerialize failed models from the UI — you don’t need to manually select them. cc @owen if this functionality is hidden somewhere?

Todd de Quincey

08/07/2023, 2:05 PM

This is actually more efficient since it saves on process initialization cost, step startup cost, etc.

That makes sense from that POV.

You should be able to just rematerialize failed models from the UI — you don’t need to manually select them.

Ah ok. I naively presumed that doing this would re-run the same command (i.e. selecting all 20 assets), not just the 3 failed ones. Great to know. Thanks for taking the time to clarify this for me.

rex

08/07/2023, 2:09 PM

Just to clarify: you have to click the dropdown menu next to `Materialize all` in your job view to see the screenshot that I posted. In the runs view (your screenshot) I believe you are correct: we re-run the same command from failure. I think your suggestion makes sense here. We could probably add the same button to Re-execute failed assets from the runs page, similar to the jobs page. Will double check with the team.

Todd de Quincey

08/07/2023, 2:10 PM

Awesome. Thanks again, Rex 🙂

Qwame

08/07/2023, 5:15 PM

I added this feature myself in the event of a failure. The command to execute failed and skipped models is logged and I can copy and execute this myself. However, I agree that it'd be a great addition if we could just retry from failure like with all other asset runs.
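
One way such a retry command could be assembled is from dbt's `run_results.json` artifact, which records a `unique_id` and `status` per node. The sketch below is an assumption about the approach, not the code described in the message above: `retry_command` and `RETRY_STATUSES` are hypothetical names, and it assumes unversioned models whose `unique_id` looks like `model.<project>.<name>`.

```python
from typing import Any, Dict, List

# Statuses treated as "needs a retry" (an assumption; adjust as needed).
RETRY_STATUSES = {"error", "fail", "skipped"}


def retry_command(run_results: Dict[str, Any]) -> List[str]:
    """Build a `dbt run --select ...` command for models that did not succeed."""
    to_retry = [
        result["unique_id"].split(".")[-1]
        for result in run_results.get("results", [])
        if result.get("status") in RETRY_STATUSES
        and result["unique_id"].startswith("model.")
    ]
    if not to_retry:
        return []  # nothing failed or was skipped
    return ["dbt", "run", "--select", *to_retry]


# Hypothetical run results, trimmed to the two fields used here.
sample = {
    "results": [
        {"unique_id": "model.my_project.orders", "status": "success"},
        {"unique_id": "model.my_project.customers", "status": "error"},
        {"unique_id": "model.my_project.revenue", "status": "skipped"},
    ]
}
print(" ".join(retry_command(sample)))
```

In practice the dict would come from loading `target/run_results.json` with `json.load` after the failed run.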

rex

08/07/2023, 5:31 PM

Mind filing a feature request for this?

rex

08/07/2023, 6:01 PM

Looks like there’s an existing issue: https://github.com/dagster-io/dagster/issues/12423

Todd de Quincey

08/07/2023, 6:02 PM

Cheers. Gave it a thumbs up

👍🏽 1
