#ask-community

Philip Strnad

07/10/2023, 9:56 PM
I recently ran into an issue with the daemon after deploying some new jobs, assets and schedules - the daemon wouldn't run and continually printed the message below. The error would not reproduce locally, so I figured it might be related to DB state (stored in Postgres). After debugging, it turned out there were some records in the instigators table that caused the error. I removed those records, restarted the daemon, and everything was fine again. I figured maybe there had been some inconsistent state or bad records in the DB, but since the daemon was running fine I moved on. A few days ago, after another deployment with various changes, I got the same error again, except that this time removing those records results in other errors.
TypeError: unhashable type: 'list'

Stack Trace:
  File "/Users/philipstrnad/python-virtual-environments/dagster/lib/python3.9/site-packages/dagster/_daemon/daemon.py", line 82, in run_daemon_loop
    result = check.opt_inst(next(daemon_generator), SerializableErrorInfo)
  File "/Users/philipstrnad/python-virtual-environments/dagster/lib/python3.9/site-packages/dagster/_daemon/daemon.py", line 234, in core_loop
    yield from execute_scheduler_iteration_loop(
  File "/Users/philipstrnad/python-virtual-environments/dagster/lib/python3.9/site-packages/dagster/_scheduler/scheduler.py", line 140, in execute_scheduler_iteration_loop
    yield from launch_scheduled_runs(
  File "/Users/philipstrnad/python-virtual-environments/dagster/lib/python3.9/site-packages/dagster/_scheduler/scheduler.py", line 226, in launch_scheduled_runs
    states_to_delete = {
  File "/Users/philipstrnad/python-virtual-environments/dagster/lib/python3.9/site-packages/dagster/_scheduler/scheduler.py", line 227, in <setcomp>
    schedule_state
More details and questions in thread...
Of course I'd rather not be poking around in the DB since I don't know the model, but at the time I didn't have a choice. What I'm curious about is whether something I'm doing in terms of deployment is not supported or recommended. For the most part, deploying a new Docker image containing our module has always worked. What is the process by which Dagster loads objects into the DB? Are there nuances I should be aware of? Will it delete objects that were in the DB previously but are no longer present in the latest version of the module? What about renaming objects, is that supported?
I'm mainly trying to understand how I got into this state, and how I can now recover from it. Thanks!

Mat Pataki

07/11/2023, 5:58 PM
We've hit this issue twice now and haven't been using dagster that long yet. Surely this is something others are hitting -- a crash loop from the daemon leading to schedules not running. Dagster has been great otherwise but it's a big deal if we need to keep doing DB surgery to keep it alive. Really hoping someone has insight here

Vasco Villas-Boas

08/07/2023, 3:05 PM
Hi, we encountered this issue this weekend. Were you able to get a good sense of the causes / solution?
We are attempting to avoid this problem by requiring our schedules to take in a tuple of cron strings rather than a list.

Philip Strnad

08/17/2023, 10:23 PM
It was a little complicated - we ended up having to delete rows from both the jobs and instigators tables. If you're still having issues I could dig up the details; I made notes somewhere.
Sorry, didn't see this message until now

Tim Weelinck

09/05/2023, 3:55 PM
Hello Philip, we are running into the same problem with 5 error messages. Can you explain how you identified which records to remove from the db?

Pieter Custers

09/05/2023, 4:34 PM
@Philip Strnad or Dagster crew, your help would be very much appreciated here, as our schedules are currently not working and we can’t figure out exactly why. It seems it has to do with Dagster trying to:
# Remove any schedule states that were previously created with AUTOMATICALLY_RUNNING
# and can no longer be found in the workspace (so that if they are later added
# back again, their timestamps will start at the correct place)
I got this from dagster/_scheduler/scheduler.py, where the error message originates.

daniel

09/05/2023, 4:46 PM
Hi all - any chance you'd be able to post or DM the contents of one of the rows for which deleting resolved the problem?

Pieter Custers

09/05/2023, 4:48 PM
No rows deleted so far here

daniel

09/05/2023, 4:49 PM
OK got it - would passing over a dump of the instigators table be on the table possibly?

Pieter Custers

09/05/2023, 5:24 PM
I’ll send it to you in a DM @daniel

daniel

09/05/2023, 7:15 PM
This looks like a problem that comes from setting the cron_schedule to a list rather than a string. We'll get a fix out on our side; a workaround in the meantime would be to set the cron_schedule to your schedule as just a string rather than a list.
There's a fix for this here: https://github.com/dagster-io/dagster/pull/16323, which I suspect we'll be able to get into the next release on Thursday.
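For readers wondering why a list-valued cron_schedule would surface as `TypeError: unhashable type: 'list'` inside a set comprehension, here is a minimal sketch in plain Python. The `ScheduleState` class below is a hypothetical stand-in invented for illustration, not Dagster's actual class, and the real internals may differ. It only shows the general Python behavior: putting a record into a set hashes the record, and a tuple-like record that holds a list cannot be hashed.

```python
from typing import NamedTuple, Sequence, Union


class ScheduleState(NamedTuple):
    """Hypothetical stand-in for a stored schedule state (not Dagster's real class)."""

    selector_id: str
    cron_schedule: Union[str, Sequence[str]]


def collect_states(states):
    # Building a set hashes every element; a NamedTuple hashes all of its fields.
    return {state for state in states}


string_state = ScheduleState("a", "0 * * * *")
tuple_state = ScheduleState("b", ("0 * * * *", "30 6 * * *"))
list_state = ScheduleState("c", ["0 * * * *", "30 6 * * *"])

# Strings and tuples of strings are hashable, so these records fit in a set.
ok = collect_states([string_state, tuple_state])

# A list field makes the whole record unhashable, so the set comprehension fails.
try:
    collect_states([list_state])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(len(ok))
print(error_message)
```

This is consistent with the two workarounds mentioned in this thread (a single cron string, or a tuple instead of a list), though whether a tuple is the right choice for a given Dagster version is an assumption to verify against the release you run.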

Philip Strnad

09/05/2023, 9:48 PM
Interesting - when we encountered this error there was definitely at least one cron schedule that was a list instead of string. We still have list cron schedules but haven't hit this error again (yet).
Sorry for the delay Pieter, I had to dig up some old notes. I ended up identifying rows to delete by inserting some debug statements on this line in scheduler.py.
# Debug output: schedule states still marked as running that are no longer in the workspace
for selector_id, schedule_state in all_schedule_states.items():
    if selector_id not in schedules and schedule_state.status == InstigatorStatus.AUTOMATICALLY_RUNNING:
        print(schedule_state.selector_id)
I then used those selector_ids to delete from the instigators table and the jobs table (although there you have to map selector_id to job name first). Anyway, it sounds like you can probably avoid doing this, since there seems to be a fix now?

daniel

09/05/2023, 9:52 PM
I'd expect the repro steps to be:
• have a schedule like that (with a list cron schedule)
• remove it from the codebase
• scheduler daemon starts running into trouble

Pieter Custers

09/06/2023, 6:38 AM
That sounds correct Daniel, it is exactly what happened (we refactored a repo to make use of Definitions instead of the @repo decorator)
No worries Philip, thanks anyway! Our problem was that we could not reproduce it locally, so we did not have a clue which rows to remove. Thanks to Daniel we figured out it was about the rows with cron schedules stored as a list. For reference, we used this query to identify the rows; we didn’t delete them but just changed the list to a string:
select *
from public.instigators
where (instigator_body::json -> 'job_specific_data' -> 'cron_schedule')::varchar like '%[%'
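For anyone who prefers to inspect an exported dump rather than query the database directly, the same check can be sketched in Python. This is only an illustration under stated assumptions: the row dictionaries and the `rows_with_list_cron` helper are hypothetical, and the only structure taken from the query above is the `instigator_body` column and its `job_specific_data` -> `cron_schedule` path. Unlike the `like '%[%'` text match, this version parses the JSON and checks the value's type.

```python
import json


def rows_with_list_cron(rows):
    """Return rows whose job_specific_data.cron_schedule is a JSON array."""
    matches = []
    for row in rows:
        body = json.loads(row["instigator_body"])
        # job_specific_data may be missing or null, so fall back to an empty dict.
        job_data = body.get("job_specific_data") or {}
        if isinstance(job_data.get("cron_schedule"), list):
            matches.append(row)
    return matches


# Hypothetical sample rows; a real instigator_body contains more fields than shown here.
sample_rows = [
    {
        "selector_id": "s1",
        "instigator_body": json.dumps(
            {"job_specific_data": {"cron_schedule": "0 * * * *"}}
        ),
    },
    {
        "selector_id": "s2",
        "instigator_body": json.dumps(
            {"job_specific_data": {"cron_schedule": ["0 * * * *", "30 6 * * *"]}}
        ),
    },
    {
        "selector_id": "s3",
        "instigator_body": json.dumps({"job_specific_data": None}),
    },
]

flagged = [row["selector_id"] for row in rows_with_list_cron(sample_rows)]
print(flagged)
```

Working on a copy of the data first is the safer choice here, since the thread describes manual edits to these tables as a last resort.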