Duncan

09/20/2022, 8:59 PM
Is there documentation somewhere on how to upgrade the dagster daemon? How are currently-running jobs handled, etc?

daniel

09/20/2022, 9:02 PM
Hi Duncan, not sure there's a doc on this specifically. If you're using any run launcher other than the default one, the runs don't actually happen as part of the daemon process, so you should be able to shut down the daemon without interrupting any ongoing runs. Schedules and sensors (and the run queue) should also pick up where they left off once the daemon is back up and running.

Duncan

09/20/2022, 9:06 PM
I see. Do the runs not need to communicate with the daemon in order to signal that they have completed and record that fact?

daniel

09/20/2022, 9:07 PM
The way they communicate that back to the daemon is by writing updates to the database, which the daemon then reads to figure out what to do next. There's no direct communication.

Duncan

09/20/2022, 9:08 PM
So that’s done in-process with the user code, on the target that is doing the running?

daniel

09/20/2022, 9:08 PM
Each run writes events and updates to the database within the run process, yeah
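
A minimal sketch of the pattern described above, where the run process and the daemon never talk directly and only share a database. This is not Dagster's actual storage code: the `run_events` table, the event names, and the helper functions are all hypothetical, and SQLite stands in for whatever database the instance uses.

```python
import sqlite3


def init_db(conn):
    # Hypothetical event table; Dagster's real schema is different.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS run_events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "run_id TEXT NOT NULL, "
        "event TEXT NOT NULL)"
    )
    conn.commit()


def record_event(conn, run_id, event):
    """Called inside the run process: append one event row."""
    conn.execute(
        "INSERT INTO run_events (run_id, event) VALUES (?, ?)",
        (run_id, event),
    )
    conn.commit()


def poll_finished_runs(conn, after_id=0):
    """Called by a daemon-like process: read events newer than the cursor.

    Returns the run ids that reached a terminal event, plus the new cursor.
    """
    rows = conn.execute(
        "SELECT id, run_id, event FROM run_events WHERE id > ? ORDER BY id",
        (after_id,),
    ).fetchall()
    finished = [
        run_id
        for _, run_id, event in rows
        if event in ("RUN_SUCCESS", "RUN_FAILURE")
    ]
    cursor = rows[-1][0] if rows else after_id
    return finished, cursor


conn = sqlite3.connect(":memory:")
init_db(conn)

# The run writes its own progress; no daemon needs to be alive for this.
record_event(conn, "run-1", "RUN_START")
record_event(conn, "run-1", "RUN_SUCCESS")

# Later, the daemon comes back up and reads what happened in the meantime.
finished, cursor = poll_finished_runs(conn)
```

Because the cursor is just the last row id seen, a daemon that was down for a while can resume from its stored cursor and still see every event written during the outage.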

Duncan

09/20/2022, 9:09 PM
I see. That’s smart.
What happens to jobs that would have executed while the daemon was down, say, for an upgrade?
Is there any kind of auto-detection going on? Some kind of checkpointing of the timestamp and then backfilling?

daniel

09/20/2022, 9:11 PM
Schedules for partitioned jobs will catch up on any partitions that were missed (a mini-backfill, like you said). Schedules for jobs that are not partitioned will just wait for the next schedule time.
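
The two catch-up behaviors described above can be sketched in plain Python. This is a simplified model under stated assumptions, not Dagster's scheduler logic: the function names are hypothetical, ticks are assumed to fall on a fixed interval, and each tick is treated as one partition.

```python
from datetime import datetime, timedelta


def missed_ticks(last_tick, now, interval=timedelta(days=1)):
    """Return every scheduled time in (last_tick, now] that got no run."""
    missed = []
    t = last_tick + interval
    while t <= now:
        missed.append(t)
        t += interval
    return missed


def ticks_to_run(last_tick, now, partitioned, interval=timedelta(days=1)):
    """Decide what to launch once the daemon is back up.

    Partitioned: launch a run for each tick missed during the downtime.
    Not partitioned: skip the missed ticks and wait for the next future one.
    """
    if partitioned:
        return missed_ticks(last_tick, now, interval)
    t = last_tick + interval
    while t <= now:
        t += interval
    return [t]


# Daemon last ticked at midnight on the 18th and came back late on the 20th.
last_tick = datetime(2022, 9, 18)
now = datetime(2022, 9, 20, 21, 0)

catch_up = ticks_to_run(last_tick, now, partitioned=True)
wait_for = ticks_to_run(last_tick, now, partitioned=False)
```

Here `catch_up` holds the ticks for the 19th and 20th, while `wait_for` holds only the upcoming tick on the 21st.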

Duncan

09/20/2022, 9:12 PM
Cool. Tangentially: what happens if the jobs can’t reach the database (perhaps you need downtime for an upgrade)?
Will they pause and retry?

daniel

09/20/2022, 9:13 PM
In that case I believe the jobs would fail. Pause and retry would be nice, but it's not a feature we have yet.
(You can set up job-level retries in general, which cover all types of job failure.)
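
To illustrate the idea of a job-level retry that covers any kind of failure, here is a plain-Python sketch. It is not Dagster's retry API: `run_with_retries` and `make_flaky_run` are hypothetical helpers, and a real setup would use the retry configuration Dagster provides rather than a loop like this.

```python
def run_with_retries(launch_run, max_retries=3):
    """Re-launch the whole run until it succeeds or retries are exhausted.

    Any exception counts as a failure, whatever its cause.
    Returns (succeeded, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            launch_run()
            return True, attempts
        except Exception:
            # The first attempt is not a retry, so allow max_retries + 1 tries.
            if attempts > max_retries:
                return False, attempts


def make_flaky_run(failures):
    """Build a stand-in run that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def launch_run():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            # Simulates the database being unreachable during an upgrade.
            raise ConnectionError("database unreachable")

    return launch_run


# Database is down for the first two attempts, back for the third.
recovered = run_with_retries(make_flaky_run(failures=2), max_retries=3)

# Database never comes back within the retry budget.
gave_up = run_with_retries(make_flaky_run(failures=10), max_retries=3)
```

Note that this retries immediately; it does not pause between attempts, so a long database outage would still exhaust the retries.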

Duncan

09/20/2022, 9:18 PM
Right
I think it’s a narrow case, just good to know the edges of the space