#announcements

AXue

09/21/2020, 7:17 PM
Good morning team, I got this error on the dagster UI, and was told I should probably report this error. Do you have any clues how this happened?
Message: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1600715637.535944837","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3948,"referenced_errors":[{"created":"@1600715637.535942275","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":394,"grpc_status":14}]}"
>

Path: ["pipelineRunsOrError","results",0,"canTerminate"]

Locations: [{"line":26,"column":3}]

Stack Trace:
  File "{******}/env/lib/python3.6/site-packages/promise/promise.py", line 844, in handle_future_result
    resolve(future.result())
  File "{******}/env/lib/python3.6/site-packages/grpc/_channel.py", line 333, in result
    raise self
  File "{******}/env/lib/python3.6/site-packages/graphql/execution/executor.py", line 452, in resolve_or_error
    return executor.execute(resolve_fn, source, info, **args)
  File "{******}/env/lib/python3.6/site-packages/graphql/execution/executors/sync.py", line 16, in execute
    return fn(*args, **kwargs)
  File "{******}/env/lib/python3.6/site-packages/dagster_graphql/schema/runs.py", line 206, in resolve_canTerminate
    return graphene_info.context.instance.run_launcher.can_terminate(self.run_id)
  File "{******}/env/lib/python3.6/site-packages/dagster/core/launcher/default_run_launcher.py", line 50, in can_terminate
    ) or self._grpc_run_launcher.can_terminate(run_id)
  File "{******}/env/lib/python3.6/site-packages/dagster/core/launcher/grpc_run_launcher.py", line 139, in can_terminate
    res = client.can_cancel_execution(CanCancelExecutionRequest(run_id=run_id))
  File "{******}/env/lib/python3.6/site-packages/dagster/grpc/client.py", line 357, in can_cancel_execution
    can_cancel_execution_request
  File "{******}/env/lib/python3.6/site-packages/dagster/grpc/client.py", line 72, in _query
    response = getattr(stub, method)(request_type(**kwargs))
  File "{******}/env/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "{******}/env/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)

alex

09/21/2020, 7:23 PM
What version are you on? I believe this bug was fixed in 0.9.8 or 0.9.9.
cc @daniel

daniel

09/21/2020, 7:23 PM
Hi, could I ask for some information about your setup / gRPC server? Are you deploying your own server? It looks like Dagit is attempting to connect to it but is unable to.
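A quick way to check whether anything is listening at the address Dagit is trying to reach is a plain TCP probe. The sketch below is only an illustration: `can_connect` is a hypothetical helper, and the host and port are whatever your gRPC server is configured with (not stated in this thread). A successful TCP connection does not prove the gRPC server is healthy, but a refused or timed-out connection is consistent with the `StatusCode.UNAVAILABLE` error above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability,
    not whether a gRPC service is actually answering on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("localhost", 4000)` (the port here is a placeholder) returns False when nothing is listening on that port.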

AXue

09/21/2020, 7:31 PM
@alex I’m on 0.9.8. @daniel Yes, the instance is deployed on our own server.

daniel

09/21/2020, 7:31 PM
I see - and the server is currently running?

AXue

09/21/2020, 7:31 PM
yes
I can ssh with no problem

daniel

09/21/2020, 7:32 PM
Where exactly are you seeing the error? On specific runs / the runs page / all newly created runs?

AXue

09/21/2020, 7:33 PM
On the runs page of a specific pipeline
I initiated a few runs of that pipeline earlier today
Other pipelines seem to be fine

daniel

09/21/2020, 7:35 PM
I see. Is it possible that it's checking can_terminate on an older run that ran against a previous server that's no longer running? For example, on another port. (This would still be an error on our end to fix, just making sure I understand the problem)

AXue

09/21/2020, 7:35 PM
The Overview page of the same pipeline raises the same error.
I would doubt it 👆 I don’t think the server config has been changed lately at all 🤔

daniel

09/21/2020, 7:39 PM
got it - ok, thanks for the information, I'll take a look and see if I can figure out what's going on

AXue

09/21/2020, 7:39 PM
Thank you a lot! I’ll double check with my teammate on the server details to see if it’s something we can fix on our end

daniel

09/21/2020, 7:42 PM
One thing that would be very useful for debugging would be if there's any way you'd be able to send the result of SELECT * FROM run_tags in your run storage table - but if that's confidential or anything like that, no problem

AXue

09/21/2020, 7:45 PM
Yeah, sorry I don’t think I would be able to do that 😞.

daniel

09/21/2020, 7:52 PM
No problem. So you're able to create a new run right now against your server? Just not on that specific pipeline. The error really does look like a connection issue, so I'm confused. One thing that I'm going to do is make it so that a connection issue only affects the specific run though - it definitely shouldn't be keeping you from loading the runs page or the overview page
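The idea of confining a connection failure to the one affected run can be sketched roughly as follows. This is not Dagster's actual fix; `safe_can_terminate` and `can_terminate_by_run` are hypothetical names, and the built-in `ConnectionError` stands in for the gRPC error in the stack trace (with grpcio, the exception to catch would be `grpc.RpcError`).

```python
from typing import Callable, Dict, Iterable


def safe_can_terminate(check: Callable[[str], bool], run_id: str) -> bool:
    """Ask `check` whether a run can be terminated, treating an unreachable server as 'no'.

    `check` is any callable that may raise ConnectionError (an assumption
    standing in for the gRPC call in the stack trace).
    """
    try:
        return check(run_id)
    except ConnectionError:
        # The server that owns this run cannot be reached; report False
        # for this run only instead of failing the whole page query.
        return False


def can_terminate_by_run(
    check: Callable[[str], bool], run_ids: Iterable[str]
) -> Dict[str, bool]:
    """Resolve the flag independently for each run."""
    return {run_id: safe_can_terminate(check, run_id) for run_id in run_ids}
```

With this shape, one stuck run whose server is gone yields False for that run, while the remaining runs on the page still resolve normally.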

AXue

09/21/2020, 8:00 PM
Yes, I could initiate runs of other pipelines on the same server as normal. Please let me get back to you later to see if something’s wrong with the connection.
Hi @daniel, it turned out that the rabbitmq and flower ports on the server have been updated. I don’t think my pipeline is using these resources, though. But just wondering if there is a way to force terminate the older run that got stuck?
If I understand correctly, that one stuck run is blocking me from loading the whole “runs” page?

daniel

09/21/2020, 9:41 PM
Yeah, that one stuck run is failing the page currently - that's a bug on our side, I just put out a fix for it and it'll be fixed in our next release on the 24th. to unblock you until that happens, do you happen to know the ID of the bad run?
if you're able to query the runs table, it would be something like SELECT run_id FROM runs WHERE run_status = 'STARTED' (while no other runs are running)
👍 1
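If the run storage happens to be a SQLite file, the query above could be run from Python along these lines. This is a sketch only: `find_runs_with_status` is a hypothetical helper, and the table and column names (`runs`, `run_id`, `run_status`) are copied from the message above, which says "something like", so the real schema may differ.

```python
import sqlite3
from typing import List


def find_runs_with_status(conn: sqlite3.Connection, status: str) -> List[str]:
    """Return the run_ids whose status column equals `status`.

    Table and column names follow the query quoted in the thread and are
    assumptions; adjust them to the actual run storage schema.
    """
    rows = conn.execute(
        "SELECT run_id FROM runs WHERE run_status = ? ORDER BY run_id",
        (status,),  # parameter binding avoids quoting problems
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `find_runs_with_status(conn, "STARTED")` while no runs are genuinely in progress should leave only the stuck run's ID.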

AXue

09/21/2020, 9:46 PM
Thanks, let me give it a try