Rubén Lopez Lozoya

08/09/2021, 3:20 PM
Hey team, I am trying to run a scheduled pipeline over a partition set of some 500 items and I keep getting this error:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1628474481.015524097","description":"Error received from peer ipv4:10.88.0.193:3030","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>

  File "/usr/local/lib/python3.7/site-packages/dagster/scheduler/scheduler.py", line 212, in launch_scheduled_runs_for_schedule
    debug_crash_flags,
  File "/usr/local/lib/python3.7/site-packages/dagster/scheduler/scheduler.py", line 258, in _schedule_runs_at_time
    scheduled_execution_time=schedule_time,
  File "/usr/local/lib/python3.7/site-packages/dagster/core/host_representation/repository_location.py", line 687, in get_external_schedule_execution_data
    scheduled_execution_time,
  File "/usr/local/lib/python3.7/site-packages/dagster/api/snapshot_schedule.py", line 55, in sync_get_external_schedule_execution_data_grpc
    else None,
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 272, in external_schedule_execution
    external_schedule_execution_args
  File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 97, in _streaming_query
    yield from response_stream
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
    raise self
I can manually run the backfill over this partition set though 😞

daniel

08/09/2021, 3:22 PM
Hey ruben - this error typically happens when the schedule function takes more than 60 seconds to execute for some reason. Does that seem plausible / would you expect the schedule function to sometimes take a long time to run?

Rubén Lopez Lozoya

08/09/2021, 3:23 PM
Do you mean that loading the partition set takes longer than 60 seconds? Or could other things be counted toward that time?

daniel

08/09/2021, 3:25 PM
Typically it's just the time spent running whatever code is in your schedule function (to generate the run config). That doesn't normally load the partition set unless your schedule function accesses the partition set directly. Is it possible to share the code in your schedule?
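One way to check whether the run-config code is the slow part is to time it outside of Dagster. The following is only a minimal sketch: `build_run_config` is a hypothetical stand-in for the body of the schedule function, `time_call` is an illustrative helper, and the 60-second constant is just the figure mentioned above, not a value read from Dagster.

```python
import time

# The 60-second limit mentioned in the thread; hard-coded here as an assumption.
GRPC_TIMEOUT_SECONDS = 60


def time_call(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def build_run_config(partition_name):
    # Hypothetical stand-in: replace with the code your schedule function runs.
    return {"solids": {"load": {"config": {"partition": partition_name}}}}


if __name__ == "__main__":
    config, elapsed = time_call(build_run_config, "2021-08-09")
    print(f"run config built in {elapsed:.3f}s")
    if elapsed > GRPC_TIMEOUT_SECONDS:
        print("slower than the 60 second limit, so a DEADLINE_EXCEEDED would be plausible")
```

If the measured time is nowhere near 60 seconds, the run-config code itself is probably not the cause and the delay is coming from somewhere else.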

Rubén Lopez Lozoya

08/09/2021, 3:49 PM
So the thing is, I tried running the backfill manually and the Dagit pod suddenly crashed while the pipelines were being enqueued. However, it came back after ~1 min, the pipelines were properly enqueued, and now they are running. I assume that this crash is what prevents the schedule from working, but when I open the backfill dialog it loads the partition set instantly, so I don't know what could be happening. There are about 250 pipelines to be run.

daniel

08/09/2021, 3:51 PM
cc @prha re: dagit crashing during a backfill. That's surprising to me (Rubén is on 0.11.13) since all the enqueueing should be happening on the daemon rather than dagit now. The only thing I can think of is that enqueueing on the daemon used up some shared resource that dagit was also accessing

prha

08/09/2021, 3:58 PM
Yeah, the only thing I can think of is the grpc server becoming unresponsive while calculating the config for the runs to enqueue. When running via the backfill daemon, we try to calculate the run config for a batch of 25 runs at a time.
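To make the batching concrete, here is a rough sketch of splitting a list of partitions into groups of 25. This is not the daemon's actual code: `batched` and the partition names are made up for illustration, and only the batch size of 25 comes from the message above.

```python
def batched(items, batch_size=25):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical partition names; roughly the 250 runs mentioned earlier in the thread.
partitions = [f"partition_{i}" for i in range(250)]
batches = list(batched(partitions))

print(len(batches))     # number of batches the run configs would be computed in
print(len(batches[0]))  # size of the first batch
```

With 250 partitions this gives 10 batches, so the run config would be calculated in 10 separate rounds rather than in one large request.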