Zachary Bluhm

11/07/2022, 2:20 PM
Hey all, quick question: what happens when you start a backfill that takes longer than the scheduled interval? Will the backfill be smart enough to pick up the newest partition as well? For example, I have partitions 1, 2, and 3 available when I start the backfill. The backfill takes so long that partition 4 becomes available (new hour, day, etc.). Will the backfill automatically include that new partition? If not, any suggestions for getting around that?

Adam Bloom

11/07/2022, 2:41 PM
It won’t - the partitions for the backfill are locked in when you create it. Why would you want that though? Wouldn’t you want a separate schedule or sensor handling new partitions as they’re made available that would automatically queue a job for partition 4?
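
To illustrate the "locked in" behaviour, here is a toy model in plain Python. It is not Dagster's API: `Backfill` and `create_backfill` are hypothetical names used only to show that the partition set is a snapshot taken at creation time, so anything that appears later has to be picked up by something else (a schedule or sensor).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Backfill:
    """Hypothetical stand-in for a backfill: an immutable set of partition keys."""
    partitions: Tuple[str, ...]


def create_backfill(available: List[str]) -> Backfill:
    # Copy the keys into a tuple so later changes to `available` are not seen.
    return Backfill(partitions=tuple(available))


available = ["1", "2", "3"]
backfill = create_backfill(available)

# A new partition appears while the backfill is still running.
available.append("4")

# The backfill still targets only the partitions it was created with.
not_covered = [p for p in available if p not in backfill.partitions]
print(backfill.partitions)
print(not_covered)
```

In this sketch `not_covered` is what a separate schedule or sensor would be responsible for.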

Zachary Bluhm

11/07/2022, 3:05 PM
My concern is that partition 4 runs before the backfill completes. If partition 4 contains tables that look back at previous partitions, then partition 4 could end up with wrong data. Correctness here would require the backfill to rerun all partitions (including those created while the backfill is running).
Although if we set concurrency to 1 and partition 4 was queued up, would it actually run after the backfill completes anyway?

Adam Bloom

11/07/2022, 3:07 PM
yup, that would do it pretty easily
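
A rough way to see why a concurrency limit of 1 helps is the small simulation below. It is a simplified model in plain Python, not Dagster code. It assumes a strict first-in, first-out queue with equal priorities, and that all backfill runs are already queued before the run for partition 4 is enqueued. The `simulate` function, the run names, and the durations are made up for illustration. In Dagster itself the limit is typically set through the `QueuedRunCoordinator`'s `max_concurrent_runs` option in `dagster.yaml`.

```python
import heapq
from collections import deque


def simulate(queue, max_concurrent_runs):
    """Return {run_name: (start, end)} for a FIFO queue of (name, duration) pairs."""
    pending = deque(queue)
    running = []  # min-heap of (end_time, name) for runs occupying a slot
    now = 0
    times = {}
    while pending:
        if len(running) >= max_concurrent_runs:
            # All slots are busy: advance the clock to the next completion.
            now, _ = heapq.heappop(running)
        name, duration = pending.popleft()
        times[name] = (now, now + duration)
        heapq.heappush(running, (now + duration, name))
    return times


# Assumed ordering: three backfill runs are queued before the scheduled run.
queue = [("backfill-1", 3), ("backfill-2", 3), ("backfill-3", 3), ("scheduled-4", 1)]

serial = simulate(queue, max_concurrent_runs=1)
parallel = simulate(queue, max_concurrent_runs=2)

backfill_end_serial = max(serial[f"backfill-{i}"][1] for i in (1, 2, 3))
backfill_end_parallel = max(parallel[f"backfill-{i}"][1] for i in (1, 2, 3))

print(serial["scheduled-4"][0] >= backfill_end_serial)      # True
print(parallel["scheduled-4"][0] >= backfill_end_parallel)  # False
```

With one slot, the run for partition 4 starts only after the last backfill run ends. With two slots it starts while `backfill-3` is still running, which is the lookback problem described above. If the backfill's runs are enqueued gradually rather than all at once, the ordering assumption may not hold, so it is worth checking how runs actually land in the queue.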