In addition, is it OK to replicate the Dagster daemon if we are not using sensors/schedules? Can the queueing and run worker error handling be replicated? Those are features I'd like to leverage, but I'm concerned that running only one instance of them will affect live workflows (during deployments/availability blips).
01/19/2023, 10:37 PM
Hey Jimmy - we don't currently have support for replicating the dagster daemon, even if you're not using schedules and sensors
I'd be curious to know more about what your latency requirements are
01/19/2023, 10:51 PM
Regarding the run worker retries, will these run-retries settings be ignored if the Dagster daemon is disabled (or are these retries handled by a different process from the run monitoring Dagster daemon)? https://docs.dagster.io/deployment/run-retries
Regarding latencies, I believe the majority of our workflows currently take only a few seconds to complete. Some additional overhead latency from adding finer-grained orchestration and visibility is expected, but ideally the p100 (worst-case) overhead latency of workflows would stay under 10 seconds, to avoid too noticeable a latency change.
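To make the latency target concrete, here is a rough sketch of how that p100 overhead check could be expressed. It assumes overhead is measured as end-to-end wall-clock time minus the workflow's own compute time; `RunTiming`, `p100_overhead`, and `within_budget` are hypothetical names for illustration, not part of Dagster.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunTiming:
    """Hypothetical timing record for one workflow run (not a Dagster type)."""
    total_seconds: float    # wall-clock time from submission to completion
    compute_seconds: float  # time spent in the workflow's own logic


def overhead_seconds(timing: RunTiming) -> float:
    # Overhead is whatever the orchestration layer adds on top of the work itself.
    return timing.total_seconds - timing.compute_seconds


def p100_overhead(timings: Iterable[RunTiming]) -> float:
    # p100 is the maximum: the single worst overhead observed across all runs.
    return max(overhead_seconds(t) for t in timings)


def within_budget(timings: Iterable[RunTiming], budget_seconds: float = 10.0) -> bool:
    # The target above: worst-case overhead stays strictly under the budget.
    return p100_overhead(timings) < budget_seconds
```

Because p100 is a maximum, a single slow run (for example, one queued during a daemon restart) is enough to miss the target.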
01/19/2023, 10:52 PM
You'll need the daemon to be running in order for run retries to work
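For context, a minimal sketch of the per-job retry settings that depend on the daemon. The tag names `dagster/max_retries` and `dagster/retry_strategy` are taken from the run-retries docs page linked above; the variable name and the commented-out job are hypothetical, and the docs also describe enabling `run_retries` in `dagster.yaml` at the instance level.

```python
# Tags that request run-level retries for a job, per the Dagster run-retries docs.
# They only take effect while the daemon is running, since the daemon is what
# launches the retried runs.
RETRY_TAGS = {
    "dagster/max_retries": 3,                # retry a failed run up to 3 times
    "dagster/retry_strategy": "ALL_STEPS",   # re-execute every step on retry
}

# Hypothetical usage (requires the dagster package to be installed):
#
#   from dagster import job
#
#   @job(tags=RETRY_TAGS)
#   def my_workflow():
#       ...
```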
01/20/2023, 6:09 PM
Is support for replicating daemons (specifically for the queueing/run worker retries) currently on the roadmap?
01/20/2023, 6:10 PM
It's on the medium-term roadmap, but it may still be several months out