
Charles Lariviere

07/20/2021, 8:53 PM
We re-deployed Dagster from Kubernetes to AWS ECS (with the same Postgres database) and are now getting unexpected GraphQL errors when looking at the “Runs” tab. Could that be related to the fact that both deployments were running in parallel with the same database for a few hours? (See thread for error message)
Operation name: RunsRootQuery

Message: list index out of range

Path: ["pipelineRunsOrError","results",1,"canTerminate"]

Locations: [{"line":36,"column":3}]

Stack Trace:
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executor.py", line 452, in resolve_or_error
    return executor.execute(resolve_fn, source, info, **args)
  File "/usr/local/lib/python3.8/site-packages/graphql/execution/executors/sync.py", line 16, in execute
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dagster_graphql/schema/pipelines/pipeline.py", line 274, in resolve_canTerminate
    return graphene_info.context.instance.run_coordinator.can_cancel_run(self.run_id)
  File "/usr/local/lib/python3.8/site-packages/dagster/core/run_coordinator/queued_run_coordinator.py", line 107, in can_cancel_run
    return self._instance.run_launcher.can_terminate(run_id)
  File "/usr/local/lib/python3.8/site-packages/dagster_aws/ecs/launcher.py", line 148, in can_terminate
    status = self.ecs.describe_tasks(tasks=[arn], cluster=cluster)["tasks"][0]["lastStatus"]
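The IndexError in the last frame fires when describe_tasks returns an empty tasks list, so the [0] index has nothing to grab. A minimal sketch of a guard against that response shape (hypothetical helper and simulated responses, not the actual Dagster fix):

```python
def can_terminate_safely(describe_response):
    # describe_tasks responses have the shape {"tasks": [...], "failures": [...]};
    # the "tasks" list can be empty when ECS no longer knows the task ARN.
    tasks = describe_response.get("tasks", [])
    if not tasks:
        return False  # nothing we can terminate
    # Only tasks that are still starting or running can be stopped.
    return tasks[0].get("lastStatus") in ("PROVISIONING", "PENDING", "RUNNING")

# Simulated stand-ins for boto3's ecs.describe_tasks output:
healthy = {"tasks": [{"lastStatus": "RUNNING"}], "failures": []}
missing = {"tasks": [], "failures": [{"reason": "MISSING"}]}
```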

jordan

07/20/2021, 9:01 PM
Hi @Charles Lariviere - I assume this is the same as https://github.com/dagster-io/dagster/issues/4377?

Charles Lariviere

07/20/2021, 9:03 PM
Hmm, it sounds similar, but in our case this is only happening on the All Runs, Queued, and In Progress tabs of the “Runs” section in Dagit. Nowhere else so far!
I thought it might be related to the fact that we booted up the Postgres database for our ECS deployment from a snapshot of the database we used on the Kubernetes deployment. We’re trying to boot a new one from a newer snapshot 🤞

jordan

07/20/2021, 9:05 PM
Ah, interesting. I thought Matthias might also be part of your organization. Still, good data point - now that it’s happened twice 😅 Let me dig in a bit.
🙏 1
What version of Dagster?

Charles Lariviere

07/20/2021, 9:05 PM
0.11.15
👍 1
I’ll also clarify that both deployments were not running on the same database — that was my mistake. We actually spin up a new one from a snapshot for the ECS deployment.

jordan

07/20/2021, 9:11 PM
👍 my hope is that the database cutover is a red herring - but we’ll see
👍 1
What does your dagit task’s TaskRole look like? And perhaps a silly question, but your dagit task is running in the same ECS cluster as all of your other tasks, right?

Charles Lariviere

07/20/2021, 9:34 PM
What does your dagit task’s TaskRole look like?
Our TaskRole is currently none — should it be set to something?
And perhaps a silly question, but your dagit task is running in the same ECS cluster as all of your other tasks, right?
That’s right! 😄

jordan

07/20/2021, 9:38 PM
Perfect - that’s what I suspected. I’m about to have a change up for review that should fix this for this Thursday’s 0.12.3 release. In short, dagit needs permission to describe ECS tasks 😄. If you’re using the reference deployment for ECS (or your own deployment modeled off of it), we weren’t including those permissions. I’ll tag you once I have a change up.
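For context, a TaskRole policy granting that permission might look roughly like this (a sketch; the exact statement shipped in the reference deployment may differ, and you may want to scope Resource more tightly):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ecs:DescribeTasks"],
      "Resource": "*"
    }
  ]
}
```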

Charles Lariviere

07/20/2021, 9:39 PM
Awesome — thank you for the quick resolution! 🙏

jordan

07/20/2021, 9:53 PM
https://dagster.phacility.com/D8977 and https://dagster.phacility.com/D8978 I think those two should fix things. I haven’t quite reasoned through why self.ecs.describe_tasks(tasks=[arn], cluster=cluster) is coming back with an empty tasks key sometimes - I suspect if we were logging the entire response, we’d see something in the failures key indicating why. Look for these in the 0.12.3 release Thursday and keep us posted on how things are going with the transition to ECS!
🙌 1
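The logging jordan describes could be sketched like this (hypothetical helper and sample response; in practice the dict would come from boto3's ecs.describe_tasks call):

```python
def tasks_or_log_failures(response):
    # describe_tasks puts ARNs it couldn't resolve into a "failures" list
    # instead of raising; print those so an empty "tasks" list is explained.
    for failure in response.get("failures", []):
        print("describe_tasks failure:", failure.get("arn"), failure.get("reason"))
    return response.get("tasks", [])

# A response shape where ECS no longer knows the task (hypothetical values):
sample = {"tasks": [], "failures": [{"arn": "arn:aws:ecs:task/abc", "reason": "MISSING"}]}
```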