# integration-dbt
m
So we ran into an issue that happens when we materialize multiple partitions of a partitioned asset in parallel for the first time. Essentially, there is a race condition where several runs will each do a
CREATE OR REPLACE TABLE
since DBT sees that the table doesn't exist in BigQuery yet, and what ends up happening is that those runs overwrite partitions that the other parallel runs have already written. Anybody know of a good workaround for this? Maybe a way to create the tables (with no data) before materializing, based on either the DBT schema/query?
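To make the race concrete, here's a toy reproduction using sqlite as a stand-in for BigQuery (table and partition names are made up for illustration): two "parallel" runs both check for the table before either has created it, so both take the create-or-replace path, and the second one silently clobbers the first one's partition.

```python
import sqlite3

def table_exists(conn, name):
    # Same existence check dbt effectively performs before choosing
    # between "create" and "insert into existing table".
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")

# Interleaving: both backfill runs observe "table missing" before
# either has written anything.
run_a_saw_table = table_exists(conn, "events")
run_b_saw_table = table_exists(conn, "events")

# Run A: table missing -> CREATE OR REPLACE semantics (drop + create).
conn.execute("DROP TABLE IF EXISTS events")
conn.execute("CREATE TABLE events (partition_day TEXT, n INT)")
conn.execute("INSERT INTO events VALUES ('2023-01-01', 10)")

# Run B: it also saw the table as missing, so it does the same thing,
# wiping out run A's freshly written partition.
conn.execute("DROP TABLE IF EXISTS events")
conn.execute("CREATE TABLE events (partition_day TEXT, n INT)")
conn.execute("INSERT INTO events VALUES ('2023-01-02', 20)")

partitions = [r[0] for r in conn.execute("SELECT partition_day FROM events")]
print(partitions)  # run A's 2023-01-01 partition is gone
```

Once the table exists, every later run takes the incremental (insert/merge) path, which is why only the very first backfill is affected.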
q
Why not build your tables incrementally when you're using partitions? With table or view materializations, each partition will overwrite the existing ones.
z
We do build them incrementally - the problem is that when you launch a backfill in Dagster it will run multiple partitions in parallel
b
This is pretty unfortunate.
You can run dbt incremental models over multiple partitions in a single run, provided the partitions form a contiguous time range.
t
Hmm, if I'm reading this correctly, would you be able to materialize a single partition (i.e. the latest, for convenience) and then backfill the rest of them? Ack'ing it's not the cleanest solution. Another option would be to run a modified version of your dbt project separately first, modded with something like a
limit 0
at every partitioned model. This should do what you're asking for, which is to create all the tables but leave them empty.
m
yeah, we're considering the 2nd option of using some sort of
limit 0
within DBT (using this approach): https://github.com/dbt-labs/dbt-core/issues/4201
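A rough sketch of what that could look like inside a model (the `empty_run` var name is made up here, not taken from the linked issue; `config`, `source`, `is_incremental`, and `var` are standard dbt Jinja):

```sql
-- Hypothetical partitioned incremental model, for illustration only.
{{ config(materialized='incremental') }}

select *
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  where event_date > (select max(event_date) from {{ this }})
{% endif %}
-- A first pass with `dbt run --vars '{empty_run: true}'` would create
-- every table with zero rows, so later parallel runs all take the
-- incremental path instead of racing on CREATE OR REPLACE.
{% if var('empty_run', false) %}
limit 0
{% endif %}
```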
z
As an update here: I think in the short term we will just limit backfill run concurrency to 1 and leverage the new backfill policies feature in Dagster to speed things up.
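The backfill-policy side of that plan could look roughly like this, assuming a recent Dagster version (asset and partition names are illustrative): `BackfillPolicy.single_run()` collapses a backfill over a contiguous time-partition range into one run, so there is only ever a single writer per asset.

```python
from dagster import BackfillPolicy, DailyPartitionsDefinition, asset

daily = DailyPartitionsDefinition(start_date="2023-01-01")

@asset(
    partitions_def=daily,
    backfill_policy=BackfillPolicy.single_run(),
)
def events(context):
    # One run receives the entire backfilled range, so a dbt incremental
    # model can process the contiguous window in a single statement
    # instead of many parallel runs racing on the first CREATE.
    start, end = context.partition_time_window
    ...
```

`BackfillPolicy.multi_run(max_partitions_per_run=...)` is the alternative when single huge runs are undesirable, combined with an instance-level run concurrency limit of 1 for backfill runs.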