Hi I have successfully been running schedules using dagster dagster #announcements

Nick

02/23/2021, 11:39 AM

Hi, I have successfully been running schedules using "dagster-daemon run" for about 4 weeks now. About 20 schedules were being run reliably on a daily basis. Today I added a further 20 and now get the following error in the terminal:

exception: stopping dagster-daemon process since the following threads are no longer sending heartbeats: ['SCHEDULER']

When observing the terminal, it looks like each schedule takes 3 seconds to assess, and the daemon seems to drop out around the same point each time. I have also noticed that in Dagit itself, the Daemon Status page shows the attached picture (basically 'no recent heartbeat detected'). Is there a limit to the number of schedules I can have running? Can I extend the timeout somehow? Or maybe split the schedules into 2 groups so it copes better?

daniel

02/23/2021, 1:49 PM

Hi Nick - 40 schedules should be no problem. I think I know what’s going on here. Are dagit and the daemon and the runs all happening on the same machine?

Nick

02/23/2021, 1:50 PM

Hi, Yes same machine. Basically 2 terminals in pycharm...

daniel

02/23/2021, 2:15 PM

Got it - we should be able to fix the problem here before the next release this week, and likely bring that 3 second per schedule number down at the same time - that should be faster unless your schedule function is doing a ton of work for some reason. A short term workaround while we get that sorted out would be to run the new schedules 5 minutes later.

Nick

02/23/2021, 3:30 PM

Perfect, that would be great. The schedules don't run concurrently. They all run at different times / days. But it seems that the process of assessing whether they should run still takes a while. I have attached an output to show you what I mean.

scheduler_output.txt

daniel

02/23/2021, 3:33 PM

Ah right - that makes sense. There's some caching that we should be doing to make this much faster (and even when it's not fast, it should still be heartbeating between each schedule check). Thanks for the report!
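To make the arithmetic behind this concrete, here is a minimal sketch (not Dagster's actual scheduler code) that simulates one pass over the schedules and reports the longest stretch without a heartbeat. The function name and the idea of a single heartbeat per pass are assumptions for illustration; the 3 seconds per check and the 20 and 40 schedule counts come from the thread.

```python
def longest_heartbeat_gap(num_schedules, seconds_per_check, heartbeat_between_checks):
    """Simulate one scheduler pass and return the longest gap between heartbeats.

    Hypothetical model: a heartbeat is always sent at the end of the pass,
    and optionally after every individual schedule check.
    """
    clock = 0.0
    last_heartbeat = 0.0
    longest_gap = 0.0
    for _ in range(num_schedules):
        clock += seconds_per_check  # time spent assessing one schedule
        if heartbeat_between_checks:
            longest_gap = max(longest_gap, clock - last_heartbeat)
            last_heartbeat = clock
    # Heartbeat sent once the whole pass is finished.
    longest_gap = max(longest_gap, clock - last_heartbeat)
    return longest_gap


# 20 schedules at 3 s each, heartbeat only per pass
print(longest_heartbeat_gap(20, 3.0, heartbeat_between_checks=False))
# 40 schedules at 3 s each, heartbeat only per pass
print(longest_heartbeat_gap(40, 3.0, heartbeat_between_checks=False))
# 40 schedules at 3 s each, heartbeat after every check
print(longest_heartbeat_gap(40, 3.0, heartbeat_between_checks=True))
```

Under this model, doubling the schedule count doubles the silent period unless a heartbeat is sent between checks, which would be consistent with the failure appearing only after the extra 20 schedules were added.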

👍 1

daniel

02/23/2021, 3:46 PM

OK, second attempt at a temporary workaround (this one is a bit more annoying): if you run your own gRPC server as described here: https://docs.dagster.io/overview/repositories-workspaces/workspaces#running-your-own-grpc-server I'd expect things to get much snappier. You'd want to turn off all your schedules before changing your workspace (since the way to load the schedule code would change).
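As a rough sketch of what that setup might look like, the Python helper below builds the `dagster api grpc` command and the matching `workspace.yaml` entry. The file name `repo.py`, the port 4266, and the location name `my_grpc_server` are placeholder assumptions, not values from this thread; check the linked docs page for the options that apply to your version.

```python
from pathlib import Path

# Workspace entry pointing Dagit and the daemon at an already-running
# gRPC server. Host, port and location name are placeholder assumptions.
WORKSPACE_YAML = """\
load_from:
  - grpc_server:
      host: localhost
      port: 4266
      location_name: "my_grpc_server"
"""


def grpc_server_command(python_file, port=4266):
    """Return the command that serves the repository in python_file over gRPC."""
    return [
        "dagster", "api", "grpc",
        "--python-file", python_file,
        "--host", "0.0.0.0",
        "--port", str(port),
    ]


def write_workspace(path="workspace.yaml"):
    """Write the gRPC-based workspace file and return its path."""
    target = Path(path)
    target.write_text(WORKSPACE_YAML)
    return target


# Example (hypothetical file name): print the command to run in its own terminal.
print(" ".join(grpc_server_command("repo.py")))
```

The server would run in its own terminal, with Dagit and `dagster-daemon run` started afterwards against the new `workspace.yaml`.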

daniel

02/23/2021, 3:47 PM

(but the problem should go away on its own on Thursday if you upgrade, assuming I'm correct about the source of the problem)

Nick

02/23/2021, 3:51 PM

This is really helpful. I will take a look tomorrow, then upgrade on Thursday and see if it has gone away. For now, I have written a never-ending loop which uses popen to run "dagster-daemon run" and restart it when it errors out. Crude, but it will work for the next 2 days.
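A minimal sketch of that kind of restart loop, assuming the daemon exits with a non-zero code when it errors out. The function name, the `max_restarts` cap and the delay are illustrative additions, not Nick's actual script.

```python
import subprocess
import sys
import time


def run_with_restarts(cmd, max_restarts=None, delay_seconds=5.0):
    """Run cmd and restart it whenever it exits with a non-zero code.

    Returns the number of restarts performed. max_restarts=None means
    keep restarting forever (the "never-ending loop" case).
    """
    restarts = 0
    while True:
        process = subprocess.Popen(cmd)
        exit_code = process.wait()  # block until the process exits
        if exit_code == 0:
            return restarts  # clean shutdown, so do not restart
        if max_restarts is not None and restarts >= max_restarts:
            return restarts  # give up once the cap is reached
        restarts += 1
        print(f"{cmd[0]} exited with code {exit_code}; restart #{restarts}",
              file=sys.stderr)
        time.sleep(delay_seconds)  # brief pause to avoid a tight crash loop


# Usage (runs until interrupted with Ctrl+C):
# run_with_restarts(["dagster-daemon", "run"])
```

The short delay between restarts keeps a persistent failure from spinning the CPU, and a clean exit stops the loop rather than relaunching the daemon.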

Nick

02/26/2021, 9:02 AM

Hi, just to update on this... I upgraded to 0.10.7 and this issue has resolved. It is now checking 4-5 schedules a second and cycles round to the heartbeat much quicker. Thanks!

:dagstir: 1
