# deployment-ecs
I've been working on refining the scope of the IAM policy for the Dagster tasks in ECS. I'm currently encountering this when attempting to launch a job:
```
botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the DescribeTaskDefinition operation: User: arn:aws:sts::<ACCT>:assumed-role/<ROLE> is not authorized to perform: ecs:DescribeTaskDefinition on resource: * because no identity-based policy allows the ecs:DescribeTaskDefinition action
```
This is after restricting access from all resources down to the task definition for the ephemeral jobs, with the task definition to launch manually specified in my config. Has anyone had any luck pulling the scope in? Is there a reason Dagster should need access to all resources that I'm not seeing? I'm able to proceed for the moment by opening access up again, but DevOps here is less than thrilled.
FYI for anyone trying to rein in the IAM policy on their task definitions: AWS doesn't currently allow resource-level restrictions on ecs:DescribeTaskDefinition requests. Thanks @daniel for the help 🙂
Hey Brendan, do you have a stack trace for the boto failure?
and what version of dagster is this?
Sure thing!
```
File "/opt/venv/lib/python3.10/site-packages/dagster_graphql/implementation/utils.py", line 125, in _fn
    return fn(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 29, in launch_pipeline_reexecution
    return _launch_pipeline_execution(graphene_info, execution_params, is_reexecuted=True)
  File "/opt/venv/lib/python3.10/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 72, in _launch_pipeline_execution
    run = do_launch(graphene_info, execution_params, is_reexecuted)
  File "/opt/venv/lib/python3.10/site-packages/dagster_graphql/implementation/execution/launch_execution.py", line 56, in do_launch
    return graphene_info.context.instance.submit_run(
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 1913, in submit_run
    submitted_run = self._run_coordinator.submit_run(
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/run_coordinator/default_run_coordinator.py", line 34, in submit_run
    self._instance.launch_run(pipeline_run.run_id, context.workspace)
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 1966, in launch_run
    self.run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/instance/__init__.py", line 659, in run_launcher
    launcher = cast(InstanceRef, self._ref).run_launcher
  File "/opt/venv/lib/python3.10/site-packages/dagster/_core/instance/ref.py", line 491, in run_launcher
    return self.run_launcher_data.rehydrate() if self.run_launcher_data else None
  File "/opt/venv/lib/python3.10/site-packages/dagster/_serdes/config_class.py", line 101, in rehydrate
    return klass.from_config_value(self, check.not_none(result.value))
  File "/opt/venv/lib/python3.10/site-packages/dagster_aws/ecs/launcher.py", line 277, in from_config_value
    return EcsRunLauncher(inst_data=inst_data, **config_value)
  File "/opt/venv/lib/python3.10/site-packages/dagster_aws/ecs/launcher.py", line 125, in __init__
    task_definition = self.ecs.describe_task_definition(taskDefinition=self.task_definition)
  File "/opt/venv/lib/python3.10/site-packages/botocore/client.py", line 508, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/venv/lib/python3.10/site-packages/botocore/client.py", line 915, in _make_api_call
    raise error_class(parsed_response, operation_name)
```
And this is on
Ah so it looks like it's describing the task definition that you explicitly set in the config
if you allow just that one does it still complain?
Yep, that's what I'd tried, just a sec - I'll drop the policy statement here
```
statement {
  sid       = "TaskDefinitionPermissions"
  actions   = ["ecs:DescribeTaskDefinition"]
  resources = ["arn:aws:ecs:us-west-2:<ACCT>:task-definition/dev-dagster-job:*"]
}
```
and what's the task definition that you set in the run launcher config?
I'm defining it via an ENV var right now,
```
"value": "arn:aws:ecs:us-west-2:<ACCT>:task-definition/dev-dagster-job:15"
```
Here's the actual policy from IAM (instead of terraform):
```
{
  "Action": "ecs:DescribeTaskDefinition",
  "Effect": "Allow",
  "Resource": "arn:aws:ecs:us-west-2:<ACCT>:task-definition/dev-dagster-job:*",
  "Sid": "TaskDefinitionPermissions"
}
```
hmmm maybe you can sanity check in python leaving dagster out of the mix to start? If you have the IAM set up the way you want i'd expect this to not complain:
```python
import boto3

# same call the EcsRunLauncher makes, against the task definition
# from the run launcher config
ecs = boto3.client("ecs")
ecs.describe_task_definition(
    taskDefinition="arn:aws:ecs:us-west-2:<ACCT>:task-definition/dev-dagster-job:15"
)
```
I don't have a great way to test it, but I can inject that into my Dagster deployment and log it...will drop the results here when I get them 🙂
It is odd that the error message says " on resource: *"...
you can see right from the stack trace that it's being called on a specific task_definition string
```
task_definition = self.ecs.describe_task_definition(taskDefinition=self.task_definition)
```
might just be a weird boto3 error message, wouldn't be the first time
I see it in CloudTrail too
I'd missed the call in the stack trace facepalm
maybe a buggy error message then
you know what, i think this might just be an AWS restriction
"Task definition IAM policies do not support resource-level permissions,"
that said it's not clear that dagster actually 100% needs to call that API method if you're already passing in an arn
That was my first thought, was a bit surprised though not exactly troubled
I may be able to scope this down with a condition, will let you know
i think it's mainly to translate it to an ARN if it's a short name
and double-check that it has the right container that dagster is expecting
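A rough sketch of that resolution step (hypothetical helper, not the actual `EcsRunLauncher` code), showing why the launcher ends up calling the API even when the config value is already an ARN:

```python
def resolve_task_definition_arn(ecs_client, task_definition):
    """Turn a short family name (e.g. "dev-dagster-job" or
    "dev-dagster-job:15") into the full task definition ARN.

    Hypothetical helper: DescribeTaskDefinition is the only way to get
    the canonical ARN and container details back from a short name, so
    the permission is needed regardless of how the value is written.
    """
    response = ecs_client.describe_task_definition(taskDefinition=task_definition)
    return response["taskDefinition"]["taskDefinitionArn"]
```

Anything exposing a `describe_task_definition` method works here, e.g. a `boto3.client("ecs")`.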
Looks like there aren't any condition keys we can use on this one either, so no real way to rein it in. Found that on this page in the docs: https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticcontainerservice.html
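For anyone scoping this later, a sketch of the compromise discussed above: leave ecs:DescribeTaskDefinition on Resource "*" (since AWS ignores resource-level scoping for it) while keeping the remaining task actions pinned to specific ARNs. The Sids, the extra action (RunTask), and the ARN here are illustrative, not the exact policy from this thread:

```python
import json

# Illustrative split policy: DescribeTaskDefinition gets "*" because it
# doesn't support resource-level permissions; RunTask (for example) can
# still be restricted to the dagster job task definition.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DescribeTaskDefinitionUnscoped",
            "Effect": "Allow",
            "Action": "ecs:DescribeTaskDefinition",
            "Resource": "*",
        },
        {
            "Sid": "ScopedTaskActions",
            "Effect": "Allow",
            "Action": ["ecs:RunTask"],
            "Resource": "arn:aws:ecs:us-west-2:<ACCT>:task-definition/dev-dagster-job:*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```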