# deployment-ecs
b
When specifying the `task_definition_arn` on `DAGSTER_CONTAINER_CONTEXT`, do I need to specify the task revision? Currently hitting `botocore.errorfactory.ClientException: An error occurred (ClientException) when calling the RunTask operation: TaskDefinition not found.` without the task rev.
Also, any way to see / add logs to check what task def name was used?
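For reference, a minimal sketch of what that payload could look like with and without an explicit revision, built in Python only to keep the shell quoting out of the picture (the account ID and the `:7` revision are made up):

```python
import json

# Illustration only: made-up account ID and revision number.
arn = "arn:aws:ecs:us-west-2:111111111111:task-definition/qa-ml-workflows-runs"

without_revision = {"ecs": {"task_definition_arn": arn, "container_name": "qa-ml-workflows-runs"}}
with_revision = {"ecs": {"task_definition_arn": arn + ":7", "container_name": "qa-ml-workflows-runs"}}

print(json.dumps(without_revision))  # DAGSTER_CONTAINER_CONTEXT without a revision
print(json.dumps(with_revision))     # DAGSTER_CONTAINER_CONTEXT pinned to revision 7
```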
j
Could you share the full stack trace for `TaskDefinition not found.`? Wondering if it’s in `launch_run`
b
It is
```
botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the RunTask operation: TaskDefinition not found.
  File "/usr/local/lib/python3.10/site-packages/dagster/_daemon/run_coordinator/queued_run_coordinator_daemon.py", line 335, in _dequeue_run
    instance.run_launcher.launch_run(LaunchRunContext(dagster_run=run, workspace=workspace))
  File "/usr/local/lib/python3.10/site-packages/dagster_aws/ecs/launcher.py", line 394, in launch_run
    response = self.ecs.run_task(**run_task_kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.10/site-packages/ddtrace/contrib/botocore/patch.py", line 377, in patched_api_call
    result = original_func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
```
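One way to narrow this down outside of Dagster is to ask ECS directly whether it can resolve the exact ARN string being passed. A sketch with boto3; the region and ARN below are placeholders, and it assumes the same credentials the daemon uses:

```python
import boto3

ecs = boto3.client("ecs", region_name="us-west-2")

# Placeholder ARN: substitute the exact string handed to run_task.
arn = "arn:aws:ecs:us-west-2:111111111111:task-definition/qa-ml-workflows-runs"

# If this call fails, ECS itself cannot resolve the string, so the value is
# the problem rather than anything Dagster-side.
td = ecs.describe_task_definition(taskDefinition=arn)["taskDefinition"]
print(td["taskDefinitionArn"], td["revision"], td["status"])
```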
j
And what dagster version?
b
Latest!
Lemme confirm
```
dagit = "1.1.21" # "1.0.12"
dagster = "1.1.21" # "1.0.12" # "0.15.5"
dagster-aws = "0.17.21" # "0.16.12"
dagster-docker = "0.17.21" # "0.16.12"
dagster-mysql = "0.17.21" # "0.16.12"
```
j
And how are you deployed? Are you using https://github.com/dagster-io/dagster/tree/1.2.1/examples/deploy_ecs or something else?
b
So we have a mix of Terraformed ECS config and deploys through CircleCI.
We had been using `0.15.9` before with one code location and its custom task def, but then upgraded to `1.1.21` to use two code locations, each with its own custom task def.
So I’m trying to get this setup working.
j
Could you share what you’re passing for container context? (with anything sensitive removed)
b
yep
```
ENV TASK_DEF_ARN=arn:aws:ecs:us-west-2:<redacted>:task-definition/$ENVIRONMENT-ml-workflows-runs
ENV CONTAINER_NAME=$ENVIRONMENT-ml-workflows-runs
ENV DAGSTER_CONTAINER_CONTEXT='{"ecs":{"task_definition_arn":"'$TASK_DEF_ARN'","container_name":"'$CONTAINER_NAME'"}}'
EXPOSE 4000
```
[Screenshot: Screen Shot 2023-03-13 at 17.35.27.png]
(we print `DagsterInstance.get().info_dict()` on startup) ⬆️
I tried adding the task def revision too, but it looks as though it didn’t have any effect.
We also have this policy allowing the Dagster daemon to describe task defs:
```
# Account wide settings. These resources cannot be filtered.
{
  Action = [
    "ec2:DescribeNetworkInterfaces",
    "ecs:ListAccountSettings",
    "ecs:DescribeTaskDefinition",
    "ecs:RegisterTaskDefinition",
    "secretsmanager:ListSecrets"
  ],
  Resource = [
    "*"
  ],
  Effect = "Allow"
},
```
Any way I can increase logs to have more information here?
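A generic, non-Dagster workaround would be to turn up botocore's own logging wherever the daemon runs, which at least surfaces the parameters of each `RunTask` call. A sketch, assuming you can add this to the daemon's entrypoint:

```python
import logging
import boto3

# Sketch: log every AWS API call (including RunTask parameters) to stderr.
# Very verbose, so only enable it temporarily while debugging.
boto3.set_stream_logger("botocore", logging.DEBUG)
```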
j
Not currently, unfortunately. So you have a `qa-ml-workflows-runs` task def specified in your run launcher, and in the container context of one gRPC server you’re overriding that with another?
What happens if you don’t pass the container context (and therefore use the task def configured on the run launcher)?
b
I think I’ve removed the one in the run_launcher
And left just the container context
The issue here is that the run launcher gives one single task def, and we would like each code repository to have its own task def
If I successfully remove the one from the run_launcher, it won’t show here, right? https://dagster.slack.com/archives/C014UDS8LAV/p1678739857226159?thread_ts=1678727160.406249&cid=C014UDS8LAV
j
That’s correct. I was just wondering if setting the task def via the run launcher was working?
b
Oh yes!
It works - we’ve been using it forever and the upgrade didn’t change that
j
Got it. I’m curious whether, if you set the new task def (the one you’re trying to use in the container context) on the run launcher, it will work, or whether it will fail with the task def not found. That would isolate whether it’s something about the new task def vs. something about the container context.
b
Oooh gotcha! I just tried that and it looks like it’s working.
[Screenshot: Screen Shot 2023-03-14 at 17.48.57.png]
It failed for some other unrelated reason but it grabbed the task def!
This is just the `run_launcher` setting, without the context override.
j
Strange. So the same `task_definition_arn` works when set in the run launcher, but not in container context?
b
Okay, so I think I found the problem - the questions and walkthrough were super helpful, btw. I checked in another env (other than the one I’ve been messing around in), and it looks like using a regular Docker `ENV` instead of `ONBUILD ENV` might have caused the issue.
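A quick way to confirm that theory at container startup would be to check that the variable was actually expanded at build time and still parses as JSON; a hypothetical check, not something Dagster does itself:

```python
import json
import os

# Hypothetical startup check: fail fast if DAGSTER_CONTAINER_CONTEXT still
# contains an unexpanded "$ENVIRONMENT" or is not valid JSON.
raw = os.environ.get("DAGSTER_CONTAINER_CONTEXT", "")
assert "$" not in raw, f"unexpanded variable in container context: {raw!r}"
ctx = json.loads(raw)
print("task def:", ctx["ecs"]["task_definition_arn"])
```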
I am also wondering if we are using the task def for the wrong purpose, because it looks like we just want to override the container image / make sure it’s grabbing `DAGSTER_CURRENT_IMAGE` from the current repo.
Can I tweak the start timeout from 180s to another value? 🤔
a
Hi Bianca,
`ENV DAGSTER_GRPC_TIMEOUT_SECONDS=300`
This environment variable should do this if I am not mistaken! I also use this variable to increase the schedule evaluation period :)