# deployment-ecs
f
Good day. I'm currently deploying Dagster with the ECS launcher and the task resource tags on my relevant jobs. However, these tags only seem to reflect within the Dagit UI; they don't affect the actual task (container) created by the ECS run launcher. Here are the tags being assigned in a job:
@job(
    tags={
        "ecs/cpu": "1024",
        "ecs/memory": "2048",
    }
)
I've attached a screenshot of the Dagit UI reflecting the resources and also what the task (container) is currently showing/using. Any idea what else I can try to get the tags applied to the tasks?
@Randy Coburn
j
What version of dagster are you running?
r
0.14.1
j
I think this is possibly a limitation of the AWS UI when setting memory/cpu via container overrides? I just created an ECS cluster on Dagster 0.14.1. When I launch a run with the same tags you’ve provided, the AWS UI indeed doesn’t show the larger memory/cpu allocation. But when I describe the task via the AWS API, I see the larger memory/cpu allocation. Can you check:
import boto3
ecs = boto3.client("ecs")
task = ecs.describe_tasks(tasks=[TASK_ID], cluster=CLUSTER_ID)["tasks"][0]
print(task["memory"], task["cpu"])
Does that show the cpu and memory you’d expect?
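To spot the mismatch that comes up later in this thread, it can help to compare the task-level values against the container-level values in the same response. A minimal sketch, using a sample dict shaped like a `describe_tasks` entry (`report_allocations` is an illustrative helper, not part of Dagster or boto3):

```python
def report_allocations(task):
    """Summarize task-level vs container-level cpu/memory from one
    entry of an ECS describe_tasks response."""
    return {
        "task": (task.get("cpu"), task.get("memory")),
        "containers": {
            c.get("name", "?"): (c.get("cpu"), c.get("memory"))
            for c in task.get("containers", [])
        },
    }

# Sample shaped like the thread's response; with a live cluster you would
# pass ecs.describe_tasks(tasks=[...], cluster=...)["tasks"][0] instead.
sample = {
    "cpu": "1024",
    "memory": "4096",
    "containers": [{"name": "code-container", "cpu": "256", "memory": "512"}],
}
print(report_allocations(sample))
```

If the container tuple is smaller than the task tuple, the overrides are only being applied at the task level.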
r
The values that you are requesting are correct. However the containers in the task still show the incorrect values.
I'll make a slimmed-down version of the response to show you what I mean.
j
What about the container overrides? That’s what we’re setting - I suspect the container values might reflect what’s in the task definition.
r
This is the slimmed down version of the full response:
{
  'containers': [
    {
      'exitCode': 0,
      'reason': 'OutOfMemoryError: Container killed due to memory usage',
      'cpu': '256',
      'memory': '512'
    }
  ],
  'cpu': '1024',
  'memory': '4096',
  'overrides': {
    'containerOverrides': [
      {
        'name': 'code-container',
      }
    ],
    'cpu': '1024',
    'memory': '4096'
  },
  'ephemeralStorage': {
    'sizeInGiB': 20
  }
}
What you are telling me is true. The overrides are there; however, the container itself appears to have the wrong allocations. We tried setting the memory to 10 GB to get this task to run, but it still failed due to out-of-memory exceptions.
j
what’s the name of the container that got killed? also
code-container
?
aka is it mapping the override to the right name?
oh wait i see now - the overrides aren’t in the individual container
r
yea
I think what we see in the AWS Console is the containers on the tasks.
j
https://github.com/dagster-io/dagster/blob/6ac63490cb4ae68143c4bfb00a049431ec600756/python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py#L158-L159 my guess is that this splatted
**overrides
needs to move up a line - i’ll test real quickly and get a fix PRed
r
No, I think it needs to move inside the container overrides
j
yeah that’s what i mean by moving it up a line - into the containerOverrides value, instead of splatting the items at the same level
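The shape of the bug can be sketched with plain dicts (illustrative only, not the actual launcher code; "run" is a hypothetical container name, and the real ECS ContainerOverride API expects integer cpu/memory rather than the string values mirrored here from the job tags):

```python
# Hypothetical values built from the ecs/cpu and ecs/memory job tags:
overrides = {"cpu": "1024", "memory": "2048"}

# Buggy shape: splatting next to containerOverrides only raises the
# task-level limits, leaving the container at its task-definition values.
buggy = {"containerOverrides": [{"name": "run"}], **overrides}

# Moved inside the per-container entry, the same values apply to the
# container itself as well.
moved = {"containerOverrides": [{"name": "run", **overrides}]}
```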
r
https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerOverride.html This document seems to suggest that the ContainerOverride also has a CPU and Memory field.
Yea, that 🙂
I'll admit this thing is mighty confusing
👍 1
j
https://aws.amazon.com/blogs/containers/how-amazon-ecs-manages-cpu-and-memory-resources/
There are two general rules of thumb with containers:
• unless otherwise restricted and capped, a container that gets started on a given host (operating system) gets access to all the CPU and memory capacity available on that host.
• unless otherwise protected and guaranteed, all containers running on a given host (operating system) share CPU, memory, and other resources in the same way that other processes running on that host share those resources.
My read of this is that if you have a task definition with no constraints, it’ll get access to all of the memory/cpu available to the task. If your task definition has constraints, it’ll observe those constraints. Are you providing a custom task definition? I suspect what’s happening here is that the memory/cpu overrides work with the dagster-generated task definitions but not with custom ones. Either way, I’m going to open a PR that applies the overrides to both the task itself and to the individual container that the run launches inside of.
r
Right, I see it. So the task itself has a limit that we have set, in our case 4096. But the container has a HARD limit of 512, so it would never be able to take advantage of the memory in the task space.
I think you may actually need those overrides in BOTH places.
in the task to expand the space available and in the container to allow the container to use all available resource in the task.
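Putting both conclusions together, an overrides payload for run_task that raises the limits in both places would look roughly like this. A sketch under assumptions, not Dagster's actual fix: `build_overrides` and the container name are illustrative, and the string/integer split reflects the ECS API (TaskOverride takes string cpu/memory, ContainerOverride takes integers):

```python
def build_overrides(container_name, cpu, memory):
    """Sketch of an ECS run_task overrides payload that raises the
    cpu/memory limits at both the task and the container level."""
    return {
        # Task-level limits: strings, per the ECS TaskOverride API.
        "cpu": str(cpu),
        "memory": str(memory),
        "containerOverrides": [
            {
                "name": container_name,
                # Container-level limits: integers, per ContainerOverride.
                "cpu": cpu,
                "memory": memory,
            }
        ],
    }

# e.g. ecs.run_task(..., overrides=build_overrides("code-container", 1024, 4096))
```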
j
yeah - i’m going to set it in both places
👍 1
https://github.com/dagster-io/dagster/pull/6836 thanks for bringing this to our attention! - we’ll get this released with Thursday’s 0.14.3 release.
r
wooo
How long does a release normally take so I can back burner this for a while?
j
we release every Thursday - usually mid-afternoon Pacific time. you can follow #dagster-releases to see when it goes live.
r
👍 Thank you