Hello all! When running the `dagster-daemon` for a...
# announcements
a
Hello all! When running the `dagster-daemon` for a long period of time we occasionally get a traceback as detailed in the screenshot. Sensor stops running, and current tasks hang. Is there any precaution we can take to avoid this problem?
d
Hi Alex - two questions:
• Is there anything earlier in the logs that might provide a clue to why the thread stopped? Any chance you could send over the full logs?
• What version of dagster is this? Some tweaks to this flow went out in 0.11.4 that might provide more clues about the exact reason it died
a
Hey Daniel! With regard to your first question, that was more or less the full traceback. This screenshot was provided by a colleague; however, I was able to reproduce the error by suspending my laptop, which gave the following output:
Traceback (most recent call last):
  File "/home/alex/miniconda3/envs/eip/bin/dagster-daemon", line 8, in <module>
    sys.exit(main())
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/dagster/daemon/cli/__init__.py", line 141, in main
    cli(obj={})  # pylint:disable=E1123
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/dagster/daemon/cli/__init__.py", line 53, in run_command
    controller.check_daemons()
  File "/home/alex/miniconda3/envs/eip/lib/python3.8/site-packages/dagster/daemon/controller.py", line 89, in check_daemons
    raise Exception(
Exception: Stopping dagster-daemon process since the following threads are no longer sending heartbeats: ['SENSOR']
Regarding the second question, we use versions 0.10.7 and 0.10.9.
Also, I can confirm that my colleague's traceback is almost identical to the one I sent you: it is preceded by normal dagster-daemon behavior, checking for sensor / schedule runs.
Update: I'm now getting a more involved traceback. The sensor is now dying after five minutes:
d
Ah, I think it's likely that if you update to 0.11.0, this problem will go away. We were seeing this problem when evaluating an individual sensor took a long time (more than 2 minutes), but we landed a fix for it as part of that release.
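To illustrate why a slow sensor evaluation can trigger the "no longer sending heartbeats" exception, here is a minimal sketch of a heartbeat watchdog. It is not Dagster's actual implementation: `HeartbeatMonitor`, `simulate`, the 30-second check interval, and the use of a fixed 120-second tolerance are all assumptions made for illustration (the 2-minute figure is the only number taken from the message above). Time is simulated rather than slept, so the example runs instantly.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed tolerance for this sketch; only the "2 minutes" figure comes from the thread.
HEARTBEAT_TOLERANCE_SECONDS = 120.0


@dataclass
class HeartbeatMonitor:
    """Hypothetical watchdog: tracks the last heartbeat time of each worker thread."""

    tolerance: float = HEARTBEAT_TOLERANCE_SECONDS
    last_seen: Dict[str, float] = field(default_factory=dict)

    def beat(self, name: str, now: float) -> None:
        # A worker calls this each time it completes a loop iteration.
        self.last_seen[name] = now

    def stale(self, now: float) -> List[str]:
        # Workers whose last heartbeat is older than the tolerance.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.tolerance
        )

    def check(self, now: float) -> None:
        # Mirrors the shape of the error in the traceback, not its exact source.
        dead = self.stale(now)
        if dead:
            raise RuntimeError(
                f"threads are no longer sending heartbeats: {dead}"
            )


def simulate(evaluation_seconds: float, check_interval: float = 30.0) -> List[str]:
    """Simulate one worker that beats at t=0 and is then blocked in a single
    evaluation lasting `evaluation_seconds`. The controller checks every
    `check_interval` seconds. Returns the stale workers seen at the first
    failing check, or [] if the evaluation finishes before any check fails."""
    monitor = HeartbeatMonitor()
    monitor.beat("SENSOR", now=0.0)
    now = check_interval
    while now < evaluation_seconds:
        stale = monitor.stale(now)
        if stale:
            return stale
        now += check_interval
    return []


if __name__ == "__main__":
    print(simulate(60.0))   # evaluation finishes within the tolerance: []
    print(simulate(300.0))  # evaluation blocks for five minutes: ['SENSOR']
```

In this sketch the worker cannot send a heartbeat while it is blocked inside one evaluation, so any evaluation longer than the tolerance looks the same to the watchdog as a dead thread. Suspending a laptop would plausibly have the same effect, since wall-clock time advances while no heartbeats are written.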
a
Will try to push that, thank you!!!