Apologies if I missed it in the docs is there a notion of a dagster #announcements

Rich Schiavi

12/02/2019, 8:33 PM

Apologies if I missed it in the docs, is there a notion of a timeout/heartbeat for a Solid? i.e., if we have long-running tasks that need to keep sending a heartbeat/alive signal, and that signal stops, mark the task as failed and reschedule it

abhi

12/02/2019, 8:59 PM

This is a great question. We do not have this capability yet, but working on dagster executors is a big priority for us at the moment, so I would love to learn more about your use case. There are two workarounds I can suggest. The first would be building a heartbeat resource which gets plugged into solids and can be used for "retrying". The second option could be to use the Dask executor, because timeouts/heartbeats are first-class citizens there. However, I haven't really played around with heartbeats on dask/dagster before, so there be dragons. https://dagster.readthedocs.io/en/latest/sections/api/apidocs/dagster_dask.html

Rich Schiavi

12/02/2019, 10:35 PM

For the use case, we have tasks that take an indeterminate amount of time. A heartbeat allows us to know they are still processing. If we just scheduled, say, a very long timeout, and the task died a few seconds in, we'd be very inefficient on retries, versus, say, a 60-second heartbeat timeout that would have stopped and let us know to reschedule that task. This is similar to AWS Step Functions: "TimeoutSeconds": 300, "HeartbeatSeconds": 60. For our use case, we could schedule a one-day "Timeout" (excessive) but 60-second heartbeats
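
A minimal sketch of the timeout-plus-heartbeat semantics described above, in plain Python. This is not a dagster or Dask API: `HeartbeatMonitor`, `beat`, and `status` are hypothetical names made up for illustration, and the injectable `clock` parameter is an assumption added so the behavior can be checked without waiting.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical helper: tracks liveness with an overall timeout
    and a shorter heartbeat interval (not part of dagster or Dask)."""

    def __init__(
        self,
        timeout_seconds: float,
        heartbeat_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_seconds = timeout_seconds
        self.heartbeat_seconds = heartbeat_seconds
        self._clock = clock
        self._started_at = clock()
        self._last_beat = self._started_at

    def beat(self) -> None:
        """Called by the long-running task to signal it is still working."""
        self._last_beat = self._clock()

    def status(self) -> str:
        """Return 'timed_out', 'heartbeat_missed', or 'alive'."""
        now = self._clock()
        # The overall budget is checked first: it applies even if beats keep arriving.
        if now - self._started_at > self.timeout_seconds:
            return "timed_out"
        # A silent task is flagged long before the overall timeout expires.
        if now - self._last_beat > self.heartbeat_seconds:
            return "heartbeat_missed"
        return "alive"


if __name__ == "__main__":
    # One-day timeout with 60-second heartbeats, as in the use case above.
    monitor = HeartbeatMonitor(timeout_seconds=24 * 60 * 60, heartbeat_seconds=60)
    monitor.beat()
    print(monitor.status())
```

A supervising process would poll `status()` and reschedule the task on `heartbeat_missed`, so a task that dies a few seconds in is detected after about one heartbeat interval rather than after the full timeout.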

Rich Schiavi

12/02/2019, 10:36 PM

I saw the heartbeat option in dagster_dask, but was unclear on how it's used. Are there any examples that show that option?

Rich Schiavi

12/02/2019, 10:52 PM

this describes what we are looking for: https://cadenceworkflow.io/docs/03_concepts/02_activities#long-running-activities

alex

12/03/2019, 6:27 PM

So in dagster the execution substrate is pluggable. In the default case we execute in process. This is useful for testing but obviously not what you want for lots of long-running jobs. This is where dagster_dask comes in. It provides Dask as an alternative executor, which has all of its own configuration for its cluster-based execution model. So Dask is what you will be configuring to manage heartbeats etc. Some more details are here: https://dagster.readthedocs.io/en/0.6.5/sections/deploying/dask.html
