# deployment-ecs

Francois-DE

02/28/2022, 10:35 AM
Good day. I'm currently deploying Dagster with the ECS run launcher and setting the task resource tags on the relevant jobs. However, these tags only seem to be reflected within the Dagit UI; they don't affect the actual task (container) created by the ECS run launcher. Here are the tags being assigned in a job:
@job(
    tags={
        "ecs/cpu": "1024",
        "ecs/memory": "2048",
    }
)
I've attached a screenshot of the Dagit UI reflecting the resources and also what the task (container) is currently showing/using. Any idea what else I can try to get the tags applied to the tasks?
@Randy Coburn

jordan

02/28/2022, 3:00 PM
What version of dagster are you running?

Randy Coburn

02/28/2022, 3:41 PM
0.14.1

jordan

02/28/2022, 4:44 PM
I think this is possibly a limitation of the AWS UI when setting memory/cpu via container overrides? I just created an ECS cluster on Dagster 0.14.1. When I launch a run with the same tags you’ve provided, the AWS UI indeed doesn’t show the larger memory/cpu allocation. But when I describe the task via the AWS API, I see the larger memory/cpu allocation. Can you check:
import boto3
ecs = boto3.client("ecs")
task = ecs.describe_tasks(tasks=[TASK_ID], cluster=CLUSTER_ID)["tasks"][0]
print(task["memory"], task["cpu"])
Does that show the cpu and memory you’d expect?

Randy Coburn

02/28/2022, 5:26 PM
The values that you are requesting are correct. However, the containers in the task still show the incorrect values.
I'll make a slimmed-down version of the response to show you what I mean

jordan

02/28/2022, 5:29 PM
What about the container overrides? That’s what we’re setting - I suspect the container values might reflect what’s in the task definition.
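One way to compare the two at a glance is a small extension of the describe_tasks snippet above (TASK_ID and CLUSTER_ID are placeholders, as before):

import boto3

ecs = boto3.client("ecs")
task = ecs.describe_tasks(tasks=[TASK_ID], cluster=CLUSTER_ID)["tasks"][0]
print(task["overrides"])      # task-level overrides, including containerOverrides
print(task["containers"][0])  # what the container itself reports (cpu, memory, exit reason)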

Randy Coburn

02/28/2022, 5:31 PM
This is the slimmed down version of the full response:
{
  'containers': [
    {
      'exitCode': 0,
      'reason': 'OutOfMemoryError: Container killed due to memory usage',
      'cpu': '256',
      'memory': '512'
    }
  ],
  'cpu': '1024',
  'memory': '4096',
  'overrides': {
    'containerOverrides': [
      {
        'name': 'code-container',
      }
    ],
    'cpu': '1024',
    'memory': '4096'
  },
  'ephemeralStorage': {
    'sizeInGiB': 20
  }
}
What you are telling me is true. The overrides are there; however, the container itself appears to have the wrong allocations. We tried setting the memory to 10 GB to get this task to run, but it still failed due to out-of-memory exceptions.

jordan

02/28/2022, 5:38 PM
what’s the name of the container that got killed? also code-container?
aka is it mapping the override to the right name?
oh wait i see now - the overrides aren’t in the individual container

Randy Coburn

02/28/2022, 5:39 PM
yea
I think what we see in the AWS Console is the containers on the tasks.

jordan

02/28/2022, 5:42 PM
https://github.com/dagster-io/dagster/blob/6ac63490cb4ae68143c4bfb00a049431ec600756/python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py#L158-L159 my guess is that this splatted **overrides needs to move up a line - i’ll test real quickly and get a fix PRed
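Schematically (this is an illustration based on the describe_tasks output above, not the launcher source itself), the cpu/memory currently land at the task level, next to containerOverrides, rather than inside the container override:

# what the current behaviour appears to produce: cpu/memory splatted at the task level
overrides = {
    "containerOverrides": [{"name": "code-container"}],
    "cpu": "1024",     # task-level override (a string, per the TaskOverride API)
    "memory": "4096",  # task-level override
}

# what the ContainerOverride API also supports: per-container cpu/memory (integers)
overrides = {
    "containerOverrides": [
        {"name": "code-container", "cpu": 1024, "memory": 4096},
    ],
}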

Randy Coburn

02/28/2022, 5:43 PM
No, I think it needs to move inside the container overrides

jordan

02/28/2022, 5:44 PM
yeah that’s what i mean by moving it up a line - into the containerOverrides value instead of splatting the items at the same level

Randy Coburn

02/28/2022, 5:44 PM
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerOverride.html This document seems to suggest that the ContainerOverride also has a CPU and Memory field.
Yea, that 🙂
I'll admit this thing is mighty confusing
➕ 1

jordan

02/28/2022, 5:56 PM
https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/
There are two general rules of thumb with containers:
• unless otherwise restricted and capped, a container that gets started on a given host (operating system) gets access to all the CPU and memory capacity available on that host.
• unless otherwise protected and guaranteed, all containers running on a given host (operating system) share CPU, memory, and other resources in the same way that other processes running on that host share those resources.
My read of this is if you have a task definition with no constraints, it’ll get access to all of the memory/cpu available to the task. If your task definition has constraints, it’ll observe those constraints. Are you providing a custom task definition? I suspect what’s happening here is that the memory/cpu overrides work with the dagster-generated task definitions but not with custom ones. Either way, I’m going to open a PR that applies the overrides to both the task itself and to the individual container that the run launches inside of.
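As a sketch of the custom-task-definition case (the family, image, and values below are placeholders, not Dagster's generated definition): a container-level memory value in a task definition is a hard limit, so the container can be OOM-killed at that limit even when the task-level memory, or a task-level override, is much larger.

import boto3

ecs = boto3.client("ecs")
ecs.register_task_definition(
    family="my-code-task",                # placeholder family name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",                           # task-level CPU
    memory="4096",                        # task-level memory
    containerDefinitions=[
        {
            "name": "code-container",
            "image": IMAGE_URI,           # placeholder image URI
            "cpu": 256,
            "memory": 512,                # hard limit: the container is killed past 512 MiB
        }
    ],
)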

Randy Coburn

02/28/2022, 6:03 PM
Right, I see it. So the task itself has a limit that we have set, in our case 4096. But the container has a HARD limit of 512, so it would never be able to take advantage of the memory in the task space.
I think you may actually need those overrides in BOTH places:
in the task, to expand the space available, and in the container, to allow the container to use all the available resources in the task.

jordan

02/28/2022, 6:08 PM
yeah - i’m going to set it in both places
👍 1
https://github.com/dagster-io/dagster/pull/6836 thanks for bringing this to our attention! - we’ll get this released with Thursday’s 0.14.3 release.
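For reference, a minimal sketch of setting the overrides in both places with run_task (this assumes a boto3 client and placeholder cluster/task-definition/subnet names; it is not the code from the PR):

import boto3

ecs = boto3.client("ecs")
ecs.run_task(
    cluster=CLUSTER_ID,              # placeholder
    taskDefinition=TASK_DEFINITION,  # placeholder
    launchType="FARGATE",
    networkConfiguration={"awsvpcConfiguration": {"subnets": SUBNET_IDS}},  # placeholder
    overrides={
        "containerOverrides": [
            {
                "name": "code-container",  # the container name from the task definition
                "cpu": 1024,               # integers at the container level
                "memory": 4096,
            }
        ],
        "cpu": "1024",     # strings at the task level
        "memory": "4096",
    },
)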

Randy Coburn

02/28/2022, 6:25 PM
wooo
How long does a release normally take, so I can back-burner this for a while?

jordan

02/28/2022, 6:27 PM
we release every Thursday - usually mid-afternoon Pacific time. you can follow #dagster-releases to see when it goes live.

Randy Coburn

02/28/2022, 6:33 PM
👍 Thank you