
Ben Wilson

04/11/2023, 4:38 PM
When I try to cancel a pipeline, I'm running into an error that looks like:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
  File "/usr/local/lib/python3.10/site-packages/dagster_graphql/implementation/execution/__init__.py", line 108, in terminate_pipeline_execution
    instance.run_coordinator.cancel_run(run_id)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/run_coordinator/queued_run_coordinator.py", line 252, in cancel_run
    return self._instance.run_launcher.terminate(run_id)
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 675, in run_launcher
    launcher = cast(InstanceRef, self._ref).run_launcher
  File "/usr/local/lib/python3.10/site-packages/dagster/_core/instance/ref.py", line 491, in run_launcher
    return self.run_launcher_data.rehydrate() if self.run_launcher_data else None
  File "/usr/local/lib/python3.10/site-packages/dagster/_serdes/config_class.py", line 99, in rehydrate
    return klass.from_config_value(self, check.not_none(result.value))
  File "/usr/local/lib/python3.10/site-packages/dagster_aws/ecs/launcher.py", line 311, in from_config_value
    return EcsRunLauncher(inst_data=inst_data, **config_value)
  File "/usr/local/lib/python3.10/site-packages/dagster_aws/ecs/launcher.py", line 127, in __init__
    task_definition = self.ecs.describe_task_definition(taskDefinition=self.task_definition)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 943, in _make_api_call
    http, parsed_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 966, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 119, in make_request
    return self._send_request(request_dict, operation_model)
  File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 198, in _send_request
    request = self.create_request(request_dict, operation_model)
  File "/usr/local/lib/python3.10/site-packages/botocore/endpoint.py", line 134, in create_request
    self._event_emitter.emit(
  File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 412, in emit
    return self._emitter.emit(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 256, in emit
    return self._emit(event_name, kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 105, in handler
    return self.sign(operation_name, request)
  File "/usr/local/lib/python3.10/site-packages/botocore/signers.py", line 189, in sign
    auth.add_auth(request)
  File "/usr/local/lib/python3.10/site-packages/botocore/auth.py", line 418, in add_auth
    raise NoCredentialsError()
but I'm having trouble narrowing down exactly where to look. I understand that it's an issue with AWS credentials, but I'm not sure exactly what Dagster is trying to do here. Can anyone offer any insight? Thanks very much in advance!

Bernardo Cortez

04/11/2023, 4:46 PM
which dagster version are you using?

Ben Wilson

04/11/2023, 4:47 PM
Hi @Bernardo Cortez I'm running 1.1.21

Bernardo Cortez

04/11/2023, 4:53 PM
Are you using the default step launcher?

Ben Wilson

04/11/2023, 4:53 PM
yes i am

Bernardo Cortez

04/11/2023, 6:03 PM
I had a similar problem that was solved by this PR: https://github.com/dagster-io/dagster/pull/11421

Tim Castillo

04/11/2023, 7:39 PM
Hi! Is the pipeline that you're canceling using any AWS resources?

Ben Wilson

04/11/2023, 7:41 PM
Hi @Tim Castillo, the DAGs themselves are running on AWS ECS and writing files to a few different S3 buckets. When I let the pipeline run through, I am able to see the files in those S3 buckets.

Tim Castillo

04/11/2023, 7:42 PM
Thanks for the response! Sounds like two possible vectors:
• the ECS instances (likely)
• the connections that write to the S3 buckets (less likely)
Let me see what I can dig up about this.

Ben Wilson

04/11/2023, 7:43 PM
ok great! Thanks very much for the help
Hi @Tim Castillo, I haven't been able to get to the bottom of this issue. Wondering if you were able to find anything?

Tim Castillo

04/20/2023, 5:50 PM
Let me follow up with the team to see if they can solve this!

Ben Wilson

04/20/2023, 5:51 PM
Thank you!
Hi @Tim Castillo I was wondering if you were able to find anything out here?
Hi @Tim Castillo just checking in on the above to see if you were able to uncover anything...I'm totally stumped

Tim Castillo

05/03/2023, 3:26 PM
Hi again! Sorry for not replying last time, following up with the team again.

Ben Wilson

05/03/2023, 3:28 PM
No problem @Tim Castillo I appreciate the help!

johann

05/03/2023, 3:36 PM
@Ben Wilson how do you have dagster deployed and how are you providing aws credentials for it to launch the ECS tasks in the first place?
The first thing that comes to mind is that runs are launched by the daemon process, while cancellations take place in the dagit process. Maybe only the daemon has credentials.
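One rough way to compare the two processes is to run the same check inside both the daemon container and the dagit container. The sketch below is standard-library Python only and looks for a few environment variables that botocore's default credential chain is known to consult; the helper name `credential_sources` and the labels are made up for illustration, and the list is not exhaustive (a shared credentials file, for example, would not show up here), so an empty result is a hint rather than proof.

```python
import os

# Environment variables consulted by botocore's default credential chain.
# This list is an illustrative subset, not the full chain.
CREDENTIAL_HINTS = {
    "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "ECS task role",
    "AWS_CONTAINER_CREDENTIALS_FULL_URI": "container credentials endpoint",
    "AWS_ACCESS_KEY_ID": "static access keys",
    "AWS_WEB_IDENTITY_TOKEN_FILE": "web identity token",
    "AWS_PROFILE": "named profile",
}


def credential_sources(env=None):
    """Return labels for the credential hints present in the environment.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    return [label for var, label in CREDENTIAL_HINTS.items() if env.get(var)]


if __name__ == "__main__":
    found = credential_sources()
    if found:
        print("Possible credential sources:", ", ".join(found))
    else:
        print("No credential hints found in the environment.")
```

On ECS, a container whose task has a task role normally gets `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` injected, so if the daemon container reports "ECS task role" and the dagit container reports nothing, that would be consistent with the theory above.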

Ben Wilson

05/03/2023, 4:27 PM
Ah okay, thanks @johann, I can check on that. I am deploying Dagster on ECS/Fargate with roles defined in Terraform, but I didn't appreciate that separation of duties.
Very helpful. I am seeing that I have a task role defined on the daemon with the following permissions, but no task role for the task running dagit. Is there a baseline set of permissions I should use for the dagit instance?
j

johann

05/03/2023, 6:08 PM
I don’t think we have a list compiled; you could file an issue for that. I think most users don’t run into it since they use the docker compose solution: https://github.com/dagster-io/dagster/tree/1.3.2/examples/deploy_ecs. I’d imagine that this particular action needs StopTask; see https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticcontainerservice.html

Ben Wilson

05/12/2023, 5:09 PM
Posting back in case this is helpful to someone else. Adding the permissions
"ecs:RunTask",
"ecs:StopTask",
"ecs:DescribeTaskDefinition",
"ecs:DescribeTasks",
"ecs:ListTasks"
to the dagit task role seems to address the issue and allows the dagit process to cancel the run. Thanks very much, @johann and @Tim Castillo, for giving me some helpful guidance along the way!
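For anyone who wants those permissions as a policy document, here is a minimal sketch in standard-library Python that assembles them into IAM policy JSON. The five actions are the ones listed above; the function name `build_dagit_ecs_policy` and the `"*"` resource are assumptions for illustration only, and in a real deployment the resource would likely be narrowed to specific clusters and task definitions.

```python
import json

# Actions reported in this thread as sufficient for the dagit task role.
DAGIT_ECS_ACTIONS = [
    "ecs:RunTask",
    "ecs:StopTask",
    "ecs:DescribeTaskDefinition",
    "ecs:DescribeTasks",
    "ecs:ListTasks",
]


def build_dagit_ecs_policy(resource="*"):
    """Build an IAM policy document granting the ECS actions above.

    The "*" default for `resource` is an assumption made for brevity,
    not something stated in the thread.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(DAGIT_ECS_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print JSON that could be pasted into an IAM policy definition.
    print(json.dumps(build_dagit_ecs_policy(), indent=2))
```

The resulting JSON could then be attached to the task role used by the dagit task, for example through the same Terraform configuration that defines the daemon's role.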