# ask-community

jasono

05/09/2022, 4:45 AM
Hi, my dagster-daemon run keeps crashing with an error message that states
ERROR - Thread for SENSOR did not shut down gracefully
and
Exception: Stopping dagster-daemon process since the following threads are no longer sending heartbeats: ['SENSOR', 'SCHEDULER']
This occasionally happened in the past and somehow fixed itself, but this time the issue isn't going away. Here is the full stack trace.
$ dagster-daemon run
2022-05-07 23:34:35 -0700 - dagster.daemon - INFO - instance is configured with the following daemons: ['BackfillDaemon', 'SchedulerDaemon', 'SensorDaemon']
warnings.warn(warning_message, DeprecationWarning)
2022-05-07 23:36:05 -0700 - dagster.daemon - ERROR - Thread for SENSOR did not shut down gracefully
2022-05-07 23:36:35 -0700 - dagster.daemon - ERROR - Thread for SCHEDULER did not shut down gracefully
Traceback (most recent call last):
  File "C:\Program Files\Python\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "E:\Data\dagster\venv\Scripts\dagster-daemon.exe\__main__.py", line 7, in <module>
  File "Y:\dagster\venv\lib\site-packages\dagster\daemon\cli\__init__.py", line 142, in main
    cli(obj={})  # pylint:disable=E1123
  File "c:\Users\u12345\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "c:\Users\u12345\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "c:\Users\u12345\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\Users\u12345\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\Users\u12345\AppData\Roaming\Python\Python39\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "Y:\dagster\venv\lib\site-packages\dagster\daemon\cli\__init__.py", line 43, in run_command
    _daemon_run_command(instance, kwargs)
  File "Y:\dagster\venv\lib\site-packages\dagster\core\telemetry.py", line 110, in wrap
    result = f(*args, **kwargs)
  File "Y:\dagster\venv\lib\site-packages\dagster\daemon\cli\__init__.py", line 55, in _daemon_run_command
    controller.check_daemon_loop()
  File "Y:\dagster\venv\lib\site-packages\dagster\daemon\controller.py", line 263, in check_daemon_loop
    self.check_daemon_heartbeats()
  File "Y:\dagster\venv\lib\site-packages\dagster\daemon\controller.py", line 236, in check_daemon_heartbeats
    raise Exception(
Exception: Stopping dagster-daemon process since the following threads are no longer sending heartbeats: ['SENSOR', 'SCHEDULER']
I'm on Windows Server 2012 R2 and Dagster 0.14.13.

daniel

05/09/2022, 11:37 PM
Hi @jasono - it seems like the threads that the daemon spins up aren't even really able to start up. Is the machine where this is running close to resource limits? Did anything change in the environment between the last time it was working and when the problem started happening?

jasono

05/09/2022, 11:43 PM
Hi @daniel Thanks for your response. There are enough resources as far as I can tell; it's running at 60% CPU and 40% memory, but it's still failing.
Also, I can't think of any unusual environment changes recently.

daniel

05/09/2022, 11:45 PM
Did you upgrade dagster? When was the last time it was working?

jasono

05/09/2022, 11:45 PM
It was working on 0.14.13, and the version was the same when the issue started.
I then tried 0.14.14 hoping it might fix the issue, but it didn't.
I also tried dagster-daemon wipe, which completed the wipe but didn't help with the issue.
I also tried the heartbeat write and read checks, which ran okay.

daniel

05/09/2022, 11:48 PM
You could try running the daemon with the --empty-workspace arg to see if it's something related to your workspace.yaml that is causing the problem.

jasono

05/09/2022, 11:50 PM
trying right now
Wow, it’s not failing.
$ dagster-daemon run --empty-workspace
Y:\dagster\venv\lib\site-packages\dagster\core\definitions\job_definition.py:93: ExperimentalWarning: "VersionStrategy" is an experimental class. It may break in future versions, even between dot releases. To mute warnings for experimental functionality, invoke warnings.filterwarnings("ignore", category=dagster.ExperimentalWarning) or use one of the other methods described at https://docs.python.org/3/library/warnings.html#describing-warning-filters.
  super(JobDefinition, self).__init__(
2022-05-09 16:50:17 -0700 - dagster.daemon - INFO - instance is configured with the following daemons: ['BackfillDaemon', 'SchedulerDaemon', 'SensorDaemon']
2022-05-09 16:50:21 -0700 - dagster.daemon.SensorDaemon - INFO - Not checking for any runs since no sensors have been started.
2022-05-09 16:50:23 -0700 - dagster.daemon.SchedulerDaemon - WARNING - Schedule recon_atlas_gl_job_schedule was started from a location recon_atlas_gl.py that can no longer be found in the workspace. You can turn off this schedule in the Dagit UI from the Status tab.
2022-05-09 16:50:23 -0700 - dagster.daemon.SchedulerDaemon - WARNING - Schedule recon_me_scheduled_reports_job_schedule was started from a location recon_me_scheduled_reports.py that can no longer be found in the workspace. You can turn off this schedule in the Dagit UI from the Status tab.

daniel

05/09/2022, 11:53 PM
Could you post your workspace.yaml?

jasono

05/09/2022, 11:54 PM
It issues the above warnings, but it doesn't stop.
load_from:
  - python_file:
      relative_path: repo/file_watcher.py
      working_directory: y:/datapipeline/file_load
  - python_file:
      relative_path: repo/recon_8510r.py
      working_directory: y:/datapipeline/me_recon_supports/recon_8510r
  - python_file:
      relative_path: repo/recon_atlas_gl.py
      working_directory: y:/datapipeline/me_recon_supports/recon_atlas_gl
  - python_file:
      relative_path: repo/recon_etracdw_atlas.py
      working_directory: y:/datapipeline/me_recon_supports
  - python_file:
      relative_path: repo/trend_prem_alltrans.py
      working_directory: y:/datapipeline/me_prem_trend
  - python_file:
      relative_path: repo/trend_prem_byclient.py
      working_directory: y:/datapipeline/me_prem_trend
  - python_file:
      relative_path: repo/trend_PL.py
      working_directory: y:/datapipeline/file_load
  - python_file:
      relative_path: repo/recon_me_scheduled_reports.py
      working_directory: y:/datapipeline/me_recon_supports/recon_me_scheduled_reports
  - python_file:
      relative_path: repo/check_epic_gov.py
      working_directory: y:/datapipeline/me_ng/check_epic_gov
  - python_file:
      relative_path: repo/je_dac_co61_reclass.py
      working_directory: y:/datapipeline/me_ng/
  - python_file:
      relative_path: repo/audit_data_files.py
      working_directory: y:/datapipeline/me_ng/
  - python_file:
      relative_path: repo/me_report_5332.py
      working_directory: y:/datapipeline/me_ng/
  - python_file:
      relative_path: repo/memoization_test.py
      working_directory: y:/datapipeline/me_ng/

daniel

05/09/2022, 11:57 PM
When you run dagit with that workspace.yaml, how long does it take to start up?

jasono

05/10/2022, 12:01 AM
It's been over 2 minutes and it's still not accessible from the browser.
Now it responds and has opened up in the browser.
Interestingly, I just restarted the daemon with no args and it’s not stopping anymore.
Perhaps running it with that arg somehow fixed the problem.

daniel

05/10/2022, 12:04 AM
My guess from what you've described so far is that one of your repository locations is taking a really long time to load
And we could be handling it better in the daemon (especially the fact that there are no useful logs before it fails), but you could also try to figure out which module is very slow and see if there's any way to make it faster
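One minimal way to check that (a sketch, not a Dagster feature: -X importtime is a standard CPython flag, and the path shown is just one of the entries from the workspace.yaml above): run a workspace file directly and let the interpreter write a per-import timing breakdown to stderr, so a slow dependency stands out.

$ cd y:/datapipeline/me_recon_supports/recon_atlas_gl
$ python -X importtime repo/recon_atlas_gl.py 2> import_times.log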

jasono

05/10/2022, 12:07 AM
so Dagit executes the repo when it starts?

daniel

05/10/2022, 12:07 AM
It loads the module so that it can display your jobs
And the daemon loads your code to check for schedules and sensors
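For context, a minimal sketch of the kind of repository file these workspace locations point at (hypothetical names, 0.14-era API). Everything at the top level of the file runs when Dagit or the daemon loads it, so heavy imports or network calls at module scope slow both of them down.

from dagster import ScheduleDefinition, job, op, repository

@op
def say_hello():
    return "hello"

@job
def hello_job():
    say_hello()

# The daemon evaluates this schedule; Dagit lists the job in the UI.
hello_schedule = ScheduleDefinition(job=hello_job, cron_schedule="0 6 * * *")

@repository
def example_repo():
    # Only returned here, but the whole file has already been executed
    # (imports and all) just to discover these definitions.
    return [hello_job, hello_schedule]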

jasono

05/10/2022, 12:08 AM
Okay, I will try to load each repo module and see which one is slow.
They are relatively small files with few dependencies, so I'm surprised loading them would take that long, but I'll try nonetheless.
Thanks for looking into this!!!
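A minimal sketch of that per-file check (paths copied from the workspace.yaml above; the loader details are simplified relative to what Dagster actually does, it just times how long each file takes to execute):

import importlib.util
import sys
import time

# (working_directory, relative_path) pairs copied from workspace.yaml
LOCATIONS = [
    ("y:/datapipeline/file_load", "repo/file_watcher.py"),
    ("y:/datapipeline/me_recon_supports/recon_8510r", "repo/recon_8510r.py"),
    ("y:/datapipeline/me_recon_supports/recon_atlas_gl", "repo/recon_atlas_gl.py"),
    # ... remaining entries from workspace.yaml
]

for workdir, relpath in LOCATIONS:
    sys.path.insert(0, workdir)  # mimic the working_directory setting
    path = f"{workdir}/{relpath}"
    start = time.monotonic()
    spec = importlib.util.spec_from_file_location("probe_repo", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file, like a repository load does
    print(f"{path}: {time.monotonic() - start:.1f}s")
    sys.path.pop(0)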