# integration-dbt
m
So we ran into an issue that happens when we materialize multiple partitions of a partitioned asset in parallel for the first time. Essentially, there is a race condition where several runs will each do a
CREATE OR REPLACE TABLE
since DBT sees that the table doesn't exist in BigQuery yet, and what ends up happening is that those runs overwrite partitions that the other parallel runs have already written. Anybody know of a good workaround for this? Maybe a way to create the tables (with no data) before materializing, based on either the DBT schema/query?
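To make the race concrete, here's a toy reproduction using sqlite as a stand-in for BigQuery (table and partition names are made up for illustration): two "parallel" runs both check for the table before either has created it, so both take the create-or-replace path, and the second one silently clobbers the first one's partition.

```python
import sqlite3

def table_exists(conn, name):
    # Same existence check dbt effectively performs before choosing
    # between "create" and "insert into existing table".
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")

# Interleaving: both backfill runs observe "table missing" before
# either has written anything.
run_a_saw_table = table_exists(conn, "events")
run_b_saw_table = table_exists(conn, "events")

# Run A: table missing -> CREATE OR REPLACE semantics (drop + create).
conn.execute("DROP TABLE IF EXISTS events")
conn.execute("CREATE TABLE events (partition_day TEXT, n INT)")
conn.execute("INSERT INTO events VALUES ('2023-01-01', 10)")

# Run B: it also saw the table as missing, so it does the same thing,
# wiping out run A's freshly written partition.
conn.execute("DROP TABLE IF EXISTS events")
conn.execute("CREATE TABLE events (partition_day TEXT, n INT)")
conn.execute("INSERT INTO events VALUES ('2023-01-02', 20)")

partitions = [r[0] for r in conn.execute("SELECT partition_day FROM events")]
print(partitions)  # run A's 2023-01-01 partition is gone
```

Once the table exists, every later run takes the incremental (insert/merge) path, which is why only the very first backfill is affected.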
q
Why not build your tables incrementally when you're using partitions? With table or view materializations, each partition will overwrite the existing ones.
z
We do build them incrementally - the problem is that when you launch a backfill in Dagster it will run multiple partitions in parallel
b
This is pretty unfortunate.
You can run dbt incremental models over multiple partitions in a single run, provided the partitions form a contiguous time range.
t
Hmm, if I'm reading this correctly, would you be able to materialize a single partition (i.e. the latest, for convenience) and then backfill the rest of them? Ack'ing it's not the cleanest solution. Another option would be to run a modified version of your dbt project separately first, modded with something like a
limit 0
at every partitioned model. This should do what you're asking for, which is to create all the tables but leave them empty.
m
yeah, we're considering the 2nd option of using some sort of
limit 0
within DBT (using this approach): https://github.com/dbt-labs/dbt-core/issues/4201
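A rough sketch of what that could look like inside a model (the `empty_run` var name is made up here, not taken from the linked issue; `config`, `source`, `is_incremental`, and `var` are standard dbt Jinja):

```sql
-- Hypothetical partitioned incremental model, for illustration only.
{{ config(materialized='incremental') }}

select *
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  where event_date > (select max(event_date) from {{ this }})
{% endif %}
-- A first pass with `dbt run --vars '{empty_run: true}'` would create
-- every table with zero rows, so later parallel runs all take the
-- incremental path instead of racing on CREATE OR REPLACE.
{% if var('empty_run', false) %}
limit 0
{% endif %}
```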
z
As an update here: I think in the short term we will just limit backfill run concurrency to 1 and leverage the new backfill policies feature in Dagster to speed things up.
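The backfill-policy side of that plan could look roughly like this, assuming a recent Dagster version (asset and partition names are illustrative): `BackfillPolicy.single_run()` collapses a backfill over a contiguous time-partition range into one run, so there is only ever a single writer per asset.

```python
from dagster import BackfillPolicy, DailyPartitionsDefinition, asset

daily = DailyPartitionsDefinition(start_date="2023-01-01")

@asset(
    partitions_def=daily,
    backfill_policy=BackfillPolicy.single_run(),
)
def events(context):
    # One run receives the entire backfilled range, so a dbt incremental
    # model can process the contiguous window in a single statement
    # instead of many parallel runs racing on the first CREATE.
    start, end = context.partition_time_window
    ...
```

`BackfillPolicy.multi_run(max_partitions_per_run=...)` is the alternative when single huge runs are undesirable, combined with an instance-level run concurrency limit of 1 for backfill runs.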