Ben Jordan
04/06/2022, 5:57 PM
botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the RunTask operation: ECS was unable to assume the role 'arn:aws:iam::<aws_account_id>:role/portal-DaemonTaskRole-1SLWIFK9HLFR0'
which seems as though it wants to pass a role that doesn't exist in IAM (there's a different DaemonTaskRole there now) - is this correct? Should this new role be created for the run or is there a cached role_id somewhere? Thanks for all your help!
johann
04/07/2022, 1:58 PM
Ben Jordan
04/07/2022, 1:58 PM
johann
04/07/2022, 2:08 PM
- Effect: "Allow"
  Action:
    - "iam:PassRole"
on dagit and the daemon so that the run tasks that we spin up will have the same role
role/portal-DaemonTaskRole is what you specified for the daemon?
Ben Jordan
04/07/2022, 2:10 PM
johann
04/07/2022, 2:11 PM
Ben Jordan
04/07/2022, 2:17 PM
docker-compose.yml files are extremely similar, any idea where to start?
johann
04/07/2022, 2:22 PM
Ben Jordan
04/07/2022, 2:26 PM
docker context as my code, and got the same error
docker context and the incorrect role ARN is the same, so it does not appear to be related to the context. Next I will delete the ECR repositories and recreate to see if that works
EcsRunLauncher get its secrets from?
johann
04/07/2022, 6:05 PM
Ben Jordan
04/07/2022, 6:06 PM
docker compose as in the example), this child TaskDefinition is still active.
As the roles are provisioned when the instance is built, a subsequent rebuild creates a new set of roles with new ARNs. When a new run is launched, the orphaned TaskDefinition still carries the old ARN, so the Task it generates fails with the error I mentioned previously: botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the RunTask operation: ECS was unable to assume the role
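One way to confirm this kind of mismatch is to compare the role ARNs recorded in a task definition against the roles that currently exist. The following is only a minimal sketch: it assumes the task definition is shaped like the `taskDefinition` object returned by boto3's `describe_task_definition` (with `taskRoleArn` and `executionRoleArn` keys). The helper name `stale_role_fields`, the account id, and the role names in the sample data are hypothetical.

```python
from typing import Any, Iterable, List, Mapping

# Keys of an ECS task definition description that hold role ARNs.
ROLE_FIELDS = ("taskRoleArn", "executionRoleArn")


def stale_role_fields(task_definition: Mapping[str, Any],
                      existing_role_arns: Iterable[str]) -> List[str]:
    """Return the role fields whose ARN is not among the existing roles."""
    existing = set(existing_role_arns)
    return [
        field
        for field in ROLE_FIELDS
        if field in task_definition and task_definition[field] not in existing
    ]


# Hypothetical data: the task definition still points at the old task role,
# while a rebuild has created a role with a different suffix.
orphaned = {
    "family": "user_code",
    "taskRoleArn": "arn:aws:iam::123456789012:role/portal-DaemonTaskRole-OLD",
    "executionRoleArn": "arn:aws:iam::123456789012:role/portal-ExecRole-NEW",
}
current_roles = [
    "arn:aws:iam::123456789012:role/portal-DaemonTaskRole-NEW",
    "arn:aws:iam::123456789012:role/portal-ExecRole-NEW",
]

print(stale_role_fields(orphaned, current_roles))  # ['taskRoleArn']
```

In practice the two inputs would come from the ECS and IAM APIs; they are hard-coded here so the comparison itself is easy to follow.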
I was able to get a new run to complete by deregistering the orphaned TaskDefinition. After this, a new run still uses this TaskDefinition, but with a new revision that has the current, correct ARNs.
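That manual cleanup could also be scripted. This is a rough sketch rather than anything the deployment example provides: it assumes a client object that behaves like `boto3.client("ecs")`, using its `list_task_definitions` and `deregister_task_definition` calls, and the helper name `deregister_family` is hypothetical. The family name is supplied by the caller.

```python
from typing import Any, Dict, List


def deregister_family(ecs_client: Any, family: str) -> List[str]:
    """Deregister every ACTIVE revision of one task definition family.

    `ecs_client` is assumed to behave like boto3.client("ecs").
    Returns the task definition ARNs that were deregistered.
    """
    arns: List[str] = []
    kwargs: Dict[str, str] = {"familyPrefix": family, "status": "ACTIVE"}

    # Collect all revisions first, following nextToken across pages,
    # so that deregistering does not interfere with pagination.
    while True:
        page = ecs_client.list_task_definitions(**kwargs)
        arns.extend(page.get("taskDefinitionArns", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    for arn in arns:
        ecs_client.deregister_task_definition(taskDefinition=arn)
    return arns
```

Passing the client in as an argument keeps the helper easy to try against a stub before pointing it at a real account, where deregistering the wrong family would be disruptive.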
It seems to me that docker compose down should deregister this TaskDefinition, as it does for the other services (dagit, daemon...)
• docker compose up
• Launch a run
• docker compose down
• check ECS TaskDefinition is still active: user_code
• docker compose up
• compare ARNs in IAM vs the TaskDefinition above
• Launch a run - it fails with the botocore error
• Deregister the TaskDefinition
• Launch a run - check the TaskDefinition to see the ARN has updated (you can compare to previous revisions as well)
• Task now completes as the correct ARN is passed to the Task
johann
04/08/2022, 3:25 PM
Ben Jordan
04/08/2022, 3:25 PM