Hi all, I am not sure what went wrong, but suddenly all of my scheduled Dagster jobs have started running twice at their scheduled time. I haven’t made any changes to the job code, the schedule, or the queue coordinator. Has anyone seen this happen before? It is very strange, so I'm hoping to get some support here. Thanks!

Hi Pragna, if you can share your daemon logs from a time period when two runs were launched, we could take a look. Are you possibly running two daemons pointed at the same database?

Hi Daniel,
> 2023-02-07 20:33:00 +0000 - dagster.daemon.SchedulerDaemon - ERROR - Another SCHEDULER daemon is still sending heartbeats. You likely have multiple daemon processes running at once, which is not supported. Last heartbeat daemon id: 835ff957-f01e-41a8-bdd9-21de44c07f13, Current daemon_id: 047fb0cb-3b48-4de7-b542-5297202e72c5
I see this like you said. However, when I check there is only 1 task available and running. Can you help me with how to track down that extra running container?
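
For anyone trying to confirm how many daemons are involved, here is a minimal sketch that pulls the distinct daemon ids out of heartbeat-conflict errors in a log dump. The regular expression is an assumption based only on the message quoted above (the wording may differ between Dagster versions), and `conflicting_daemon_ids` is a hypothetical helper, not part of Dagster.

```python
import re

# Assumption: the error text matches the message quoted in this thread.
HEARTBEAT_RE = re.compile(
    r"Another (?P<daemon_type>\w+) daemon is still sending heartbeats\."
    r".*?Last heartbeat daemon id: (?P<last_id>[0-9a-f-]+), "
    r"Current daemon_id: (?P<current_id>[0-9a-f-]+)"
)


def conflicting_daemon_ids(log_text):
    """Return the set of distinct daemon ids named in heartbeat-conflict errors."""
    ids = set()
    for match in HEARTBEAT_RE.finditer(log_text):
        ids.add(match.group("last_id"))
        ids.add(match.group("current_id"))
    return ids


sample = (
    "2023-02-07 20:33:00 +0000 - dagster.daemon.SchedulerDaemon - ERROR - "
    "Another SCHEDULER daemon is still sending heartbeats. You likely have "
    "multiple daemon processes running at once, which is not supported. "
    "Last heartbeat daemon id: 835ff957-f01e-41a8-bdd9-21de44c07f13, "
    "Current daemon_id: 047fb0cb-3b48-4de7-b542-5297202e72c5"
)
print(sorted(conflicting_daemon_ids(sample)))
```

If the set holds more than one id across a stretch of logs, more than one daemon process is writing heartbeats to the same database.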

How are you checking that there's only 1 task running?

Hi Daniel, I just checked: at the scheduled time, two task instances are spinning up.
Do you have any idea why that would happen and how I can stop it?

I don’t have quite enough information about your setup to know where the other daemon task would be coming from, but Dagster only supports running a single daemon task at a time.

I'm gonna add that I am also having this issue.

I am running on Dagster 1.1.15.

Started getting the same error message Pragna posted above. This began once I added a default configured executor to our repository.

Let me know if you need additional information and I'll be glad to provide it.

Adam, how do you have your daemon deployed?

I have it deployed via the terminal.

I've ensured that I've closed all terminal instances and that Dagit shows the daemon is down before relaunching a single instance of the daemon. I still get the error.

Does the error appear more than once in the logs?

I could imagine it popping up once on startup if you stop a daemon and then start a new one right away

Yes, it seems to appear every 30 seconds for each daemon type (sensor, queued_run_coordinator, backfill, and scheduler).

Try running this in your terminal: "ps aux | grep dagster-daemon" - that would show if there was a background process still running
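
The same check can be sketched in Python, which also shows each process's start time so a stale daemon stands out. This assumes a `ps` that supports `-eo pid,lstart,args` (typical on Linux and macOS) and that `lstart` prints five whitespace-separated fields; `parse_daemon_processes` and `list_daemon_processes` are hypothetical helper names.

```python
import subprocess


def parse_daemon_processes(ps_output, needle="dagster-daemon"):
    """Return (pid, started, command) tuples for commands containing `needle`."""
    found = []
    for line in ps_output.splitlines():
        # Expected fields: pid, five lstart tokens, then the full command line.
        parts = line.split(None, 6)
        if len(parts) < 7 or not parts[0].isdigit():
            continue  # header row or a line in an unexpected format
        pid = int(parts[0])
        started = " ".join(parts[1:6])
        command = parts[6]
        if needle in command:
            found.append((pid, started, command))
    return found


def list_daemon_processes():
    """Run ps and return the processes that look like dagster-daemon."""
    result = subprocess.run(
        ["ps", "-eo", "pid,lstart,args"],
        capture_output=True, text=True, check=True,
    )
    return parse_daemon_processes(result.stdout)


if __name__ == "__main__":
    for pid, started, command in list_daemon_processes():
        print(pid, started, command)
```

More than one row in the output means more than one daemon process is running on that machine. Unlike the `grep` pipeline, the search itself does not show up in the results.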

I do show two daemon PIDs. One of them shows it was started on Feb 10th despite killing all terminal instances.

I can, of course, kill that oldest one. Any idea why I only began getting these errors once I implemented the configured executor though?

I can't think of a connection there - very likely not related to the executor specifically

Really strange. I'll kill it and will report back here if a "ghost" daemon appears again.

Thanks, Daniel