# deployment-ecs
j
So I got this up and running in AWS but when I tried to run my first pipeline from dagit it hangs on the first message:
[EcsRunLauncher] Launching run in task arn:aws:ecs:us-east-1:...:task/dagster-cluster/... on cluster arn:aws:ecs:us-east-1:...:cluster/dagster-cluster
It’s been several minutes now and nothing has happened. I’ll have to terminate it manually. I checked the CloudWatch logs and saw a new `run` log stream that gave an error after 3 lines:
Error: Got unexpected extra arguments (dagster api execute_run {...})
This is an abbreviated version of the error. It lists all the arguments (which are many) but doesn’t give any indication of which ones are extra.
j
We’ve seen something similar in the past when the image has an entrypoint configured: https://dagster.slack.com/archives/C01U954MEER/p1625582594413200?thread_ts=1625563476.407600&cid=C01U954MEER It seems to try to run both the entrypoint followed by the overridden command.
j
hmm … no entrypoint in my pipeline dockerfile but there is a `CMD` that runs the grpc server: `dagster api grpc …`
could that cause an issue? I can’t imagine I can get around having that in the dockerfile.
well, I suppose I could pass it in as part of the task definition instead …
j
Hm. It’s only using https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_ContainerOverride.html which should override the default CMD based on what the docs say. But admittedly I’ve only tried it with the command being set by the Task Definition and not in the Dockerfile.
But the symptom remains fairly consistent with it somehow chaining your startup command (whether it be from `CMD` or `ENTRYPOINT`) with the overridden command.
`Got unexpected extra arguments` is actually coming from Click https://github.com/pallets/click/blob/ddcabdd4f22f438c0c963150b930d0d09b04dea7/src/click/core.py#L1385 And all of our Dagster CLIs are written with the framework. So I suspect it’s actually doing `dagster api grpc ... dagster api execute_run ...` and seeing the overridden command as extra arguments to the grpc command.
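To make the suspected chaining concrete, here is a minimal sketch of how the final argv comes together. It assumes Docker's exec-form behaviour (the command is appended to the entrypoint) and that an ECS container override replaces only the command. `effective_argv` is a hypothetical helper for illustration, not part of Dagster, Docker, or boto3.

```python
from typing import List, Optional, Sequence


def effective_argv(
    entry_point: Optional[Sequence[str]],
    command: Optional[Sequence[str]],
    override_command: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the argv a container would start with.

    The command is appended to the entrypoint. An override, when given,
    replaces the command but leaves the entrypoint untouched.
    """
    final_command = override_command if override_command is not None else command
    return list(entry_point or []) + list(final_command or [])


# Abbreviated stand-in for the command the run launcher passes as an override.
run_override = ["dagster", "api", "execute_run", "{...}"]

# Startup command stored as the command: the override replaces it cleanly.
as_command = effective_argv(None, ["dagster", "api", "grpc"], run_override)

# Startup command stored as the entrypoint: the override is appended to it.
as_entry_point = effective_argv(["dagster", "api", "grpc"], None, run_override)

print(as_command)
print(as_entry_point)
```

In the second case the first CLI receives the whole overridden command as trailing arguments, which would match the Click error.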
j
I pulled the `CMD` out of the dockerfile, and put it in the task definition, and redeployed the pipeline container, but to no avail. I get the same error
j
Can I see your Task Definition and Dockerfile?
j
I’m using Pulumi to deploy my infrastructure. Here’s the task definition:
ecs_task_definition_meltano = aws.ecs.TaskDefinition(
    resource_name='ecs_task_definition_meltano',
    family='meltano-fargate-task',
    memory='512',
    cpu='256',
    network_mode='awsvpc',
    requires_compatibilities=['FARGATE'],
    execution_role_arn=iam_role_task_execution.arn,
    task_role_arn=iam_role_task.arn,
    container_definitions=Output.all(
        ecr_meltano=ecr_registry_meltano.repository_url,
        grp_name=logs_dagster.name,
    ).apply(
        lambda args: json.dumps(
            [
                {
                    'name': 'wdt_meltano',
                    'image': args['ecr_meltano'],
                    'command': ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-d", "orchestrate",
                                "--python-file", "orchestrate/dagster_repos.py"],
                    'environment': [
                        {'name': 'DAGSTER_CURRENT_IMAGE', 'value': args['ecr_meltano']},
                        {"name": "DAGSTER_POSTGRES_HOSTNAME", "value": "elt_system_db"}
                    ],
                    'secrets': dagster_postgres_secrets,
                    "logConfiguration": log_config(args['grp_name']),
                    "dependsOn": [{"condition": "SUCCESS", "containerName": "meltano_resolvconf_initcontainer"}],
                    "essential": True
                },
                {
                    'name': 'meltano_resolvconf_initcontainer',
                    'image': "docker/ecs-searchdomain-sidecar:1.0",
                    'command': ['us-east-1.compute.internal', 'dagster.prod'],
                    "logConfiguration": log_config(args['grp_name']),
                    "essential": False
                },
            ]
        )
    ),
)
Here’s the dockerfile:
FROM python:3.7-slim

ENV DAGSTER_APP=/opt/dagster/app
WORKDIR $DAGSTER_APP

# Install any requirements
RUN apt-get -y update
RUN apt-get -y install git gcc
COPY orchestrate/pipelines.requirements.txt .
RUN pip install -r pipelines.requirements.txt

# Set $DAGSTER_HOME and copy dagster instance there
ENV DAGSTER_HOME=/opt/dagster/dagster_home
RUN mkdir -p $DAGSTER_HOME

# Add repository code
COPY meltano.yml $DAGSTER_APP
COPY orchestrate $DAGSTER_APP/orchestrate
COPY extract $DAGSTER_APP/extract
COPY transform $DAGSTER_APP/transform

# Install all plugins into the `.meltano` directory
RUN meltano install
RUN meltano invoke dbt:deps

# Pin `discovery.yml` manifest by copying cached version to project root
RUN cp -n .meltano/cache/discovery.yml . 2>/dev/null || :

# Don't allow changes to containerized project files
ENV MELTANO_PROJECT_READONLY 1

# Run dagster gRPC server on port 4000
EXPOSE 4000

# Expose default port used by `meltano ui`
EXPOSE 5000
j
And are you passing that task definition’s arn in to the EcsRunLauncher configuration? Or are you letting the EcsRunLauncher construct its own task definition?
j
it must be constructing its own because I didn’t put anything in the playground when I ran it
j
Yep, that should be the case then. When it launches the run, it should tag it with the task arn. Can you describe that task and its task definition?
`aws ecs describe-tasks --cluster $CLUSTER --tasks $TASK` and `aws ecs describe-task-definition --task-definition $TASK_DEFINITION`
Perhaps something will shake out when we look at those in detail.
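Once you have both JSON responses, one thing worth checking is whether any container has an `entryPoint` in the task definition and also receives a `command` override in the task. A small sketch of that check, assuming you pass in the `taskDefinition` object and one element of `tasks` from the parsed output. `chained_containers` and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def chained_containers(task_definition: Dict[str, Any], task: Dict[str, Any]) -> List[str]:
    """List containers that define an entryPoint and also get a command override."""
    overrides = task.get("overrides", {}).get("containerOverrides", [])
    overridden = {o["name"] for o in overrides if o.get("command")}
    return [
        c["name"]
        for c in task_definition.get("containerDefinitions", [])
        if c.get("entryPoint") and c["name"] in overridden
    ]


# Hypothetical, trimmed-down stand-ins for the two CLI responses.
sample_task_definition = {
    "containerDefinitions": [
        {"name": "main", "entryPoint": ["my-server", "start"]},
        {"name": "sidecar", "command": ["init"]},
    ]
}
sample_task = {
    "overrides": {
        "containerOverrides": [
            {"name": "sidecar"},
            {"name": "main", "command": ["my-cli", "do-work"]},
        ]
    }
}

suspects = chained_containers(sample_task_definition, sample_task)
print(suspects)
```

Any name it returns is a container where the override would be appended to the entrypoint rather than replacing the startup command.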
j
task:
{
    "tasks": [
        {
            "attachments": [
                {
                    "id": "83e9438c-b280-4320-b132-fa3d7ca56532",
                    "type": "ElasticNetworkInterface",
                    "status": "DELETED",
                    "details": [
                        {
                            "name": "subnetId",
                            "value": "subnet-0cdfa5c9f277f103f"
                        },
                        {
                            "name": "networkInterfaceId",
                            "value": "eni-0d260ab344cc886ff"
                        },
                        {
                            "name": "macAddress",
                            "value": "0e:52:7a:86:b6:f1"
                        },
                        {
                            "name": "privateDnsName",
                            "value": "ip-10-171-31-194.ec2.internal"
                        },
                        {
                            "name": "privateIPv4Address",
                            "value": "10.171.31.194"
                        }
                    ]
                }
            ],
            "availabilityZone": "us-east-1b",
            "clusterArn": "arn:aws:ecs:<>:<>:cluster/dagster-cluster",
            "connectivity": "CONNECTED",
            "connectivityAt": "2021-07-13T15:38:04.608000-06:00",
            "containers": [
                {
                    "containerArn": "arn:aws:ecs:<>:<>:container/dagster-cluster/5230dbd6c06642179c2cba75873a2471/419d1b0a-cb47-4cf6-9007-86da2ee13102",
                    "taskArn": "arn:aws:ecs:<>:<>:task/dagster-cluster/5230dbd6c06642179c2cba75873a2471",
                    "name": "run",
                    "image": "<>.dkr.ecr.us-east-1.amazonaws.com/wdt-meltano",
                    "imageDigest": "sha256:922f459660a303fa4aef58c52d551071100f3657eff65851f3bf7608d4d6a469",
                    "runtimeId": "5230dbd6c06642179c2cba75873a2471-718098122",
                    "lastStatus": "STOPPED",
                    "exitCode": 2,
                    "networkBindings": [],
                    "networkInterfaces": [
                        {
                            "attachmentId": "83e9438c-b280-4320-b132-fa3d7ca56532",
                            "privateIpv4Address": "10.171.31.194"
                        }
                    ],
                    "healthStatus": "UNKNOWN",
                    "cpu": "0"
                },
                {
                    "containerArn": "arn:aws:ecs:<>:<>:container/dagster-cluster/5230dbd6c06642179c2cba75873a2471/d5cd096a-0023-4d6f-992f-84e94bd6a857",
                    "taskArn": "arn:aws:ecs:<>:<>:task/dagster-cluster/5230dbd6c06642179c2cba75873a2471",
                    "name": "daemon_resolvconf_initcontainer",
                    "image": "docker/ecs-searchdomain-sidecar:1.0",
                    "runtimeId": "5230dbd6c06642179c2cba75873a2471-904540743",
                    "lastStatus": "STOPPED",
                    "exitCode": 0,
                    "networkBindings": [],
                    "networkInterfaces": [
                        {
                            "attachmentId": "83e9438c-b280-4320-b132-fa3d7ca56532",
                            "privateIpv4Address": "10.171.31.194"
                        }
                    ],
                    "healthStatus": "UNKNOWN",
                    "cpu": "0"
                }
            ],
            "cpu": "256",
            "createdAt": "2021-07-13T15:38:00.588000-06:00",
            "desiredStatus": "STOPPED",
            "enableExecuteCommand": false,
            "executionStoppedAt": "2021-07-13T15:38:57.885000-06:00",
            "group": "family:dagster-run",
            "healthStatus": "UNKNOWN",
            "lastStatus": "STOPPED",
            "launchType": "FARGATE",
            "memory": "512",
            "overrides": {
                "containerOverrides": [
                    {
                        "name": "daemon_resolvconf_initcontainer"
                    },
                    {
                        "name": "run",
                        "command": [
                            "dagster",
                            "api",
                            "execute_run",
                            "{\"__class__\": \"ExecuteRunArgs\", \"instance_ref\": {\"__class__\": \"InstanceRef\", \"compute_logs_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"LocalComputeLogManager\", \"config_yaml\": \"base_dir: /opt/dagster/dagster_home/storage\\n\", \"module_name\": \"dagster.core.storage.local_compute_log_manager\"}, \"custom_instance_class_data\": null, \"event_storage_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"PostgresEventLogStorage\", \"config_yaml\": \"postgres_db:\\n  db_name:\\n    env: DAGSTER_POSTGRES_DB\\n  hostname:\\n    env: DAGSTER_POSTGRES_HOSTNAME\\n  password:\\n    env: DAGSTER_POSTGRES_PASSWORD\\n  port: 5432\\n  username:\\n    env: DAGSTER_POSTGRES_USER\\n\", \"module_name\": \"dagster_postgres.event_log\"}, \"local_artifact_storage_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"LocalArtifactStorage\", \"config_yaml\": \"base_dir: /opt/dagster/dagster_home/\\n\", \"module_name\": \"dagster.core.storage.root\"}, \"run_coordinator_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"QueuedRunCoordinator\", \"config_yaml\": \"{}\\n\", \"module_name\": \"dagster.core.run_coordinator\"}, \"run_launcher_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"EcsRunLauncher\", \"config_yaml\": \"{}\\n\", \"module_name\": \"dagster_aws.ecs\"}, \"run_storage_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"PostgresRunStorage\", \"config_yaml\": \"postgres_db:\\n  db_name:\\n    env: DAGSTER_POSTGRES_DB\\n  hostname:\\n    env: DAGSTER_POSTGRES_HOSTNAME\\n  password:\\n    env: DAGSTER_POSTGRES_PASSWORD\\n  port: 5432\\n  username:\\n    env: DAGSTER_POSTGRES_USER\\n\", \"module_name\": \"dagster_postgres.run_storage\"}, \"schedule_storage_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"PostgresScheduleStorage\", \"config_yaml\": \"postgres_db:\\n  db_name:\\n    env: DAGSTER_POSTGRES_DB\\n  hostname:\\n    env: DAGSTER_POSTGRES_HOSTNAME\\n  password:\\n    env: DAGSTER_POSTGRES_PASSWORD\\n  port: 5432\\n  username:\\n    env: DAGSTER_POSTGRES_USER\\n\", \"module_name\": \"dagster_postgres.schedule_storage\"}, \"scheduler_data\": {\"__class__\": \"ConfigurableClassData\", \"class_name\": \"DagsterDaemonScheduler\", \"config_yaml\": \"{}\\n\", \"module_name\": \"dagster.core.scheduler\"}, \"settings\": {\"telemetry\": null}}, \"pipeline_origin\": {\"__class__\": \"PipelinePythonOrigin\", \"pipeline_name\": \"google_analytics_pipeline\", \"repository_origin\": {\"__class__\": \"RepositoryPythonOrigin\", \"code_pointer\": {\"__class__\": \"FileCodePointer\", \"fn_name\": \"widen_elt_repository\", \"python_file\": \"orchestrate/dagster_repos.py\", \"working_directory\": \"orchestrate\"}, \"container_image\": \"974251539015.dkr.ecr.us-east-1.amazonaws.com/wdt-meltano\", \"executable_path\": \"/usr/local/bin/python\"}}, \"pipeline_run_id\": \"56566ceb-6ced-42b0-b265-755167a5d0e9\"}"
                        ]
                    }
                ],
                "inferenceAcceleratorOverrides": []
            },
            "platformVersion": "1.4.0",
            "pullStartedAt": "2021-07-13T15:38:13.685000-06:00",
            "pullStoppedAt": "2021-07-13T15:38:50.434000-06:00",
            "startedAt": "2021-07-13T15:38:55.879000-06:00",
            "stopCode": "EssentialContainerExited",
            "stoppedAt": "2021-07-13T15:39:21.359000-06:00",
            "stoppedReason": "Essential container in task exited",
            "stoppingAt": "2021-07-13T15:39:07.919000-06:00",
            "tags": [],
            "taskArn": "arn:aws:ecs:<>:<>:task/dagster-cluster/5230dbd6c06642179c2cba75873a2471",
            "taskDefinitionArn": "arn:aws:ecs:<>:<>:task-definition/dagster-run:4",
            "version": 6,
            "ephemeralStorage": {
                "sizeInGiB": 20
            }
        }
    ],
    "failures": []
}
task-definition:
{
  "taskDefinition": {
    "taskDefinitionArn": "arn:aws:ecs:<>:<>:task-definition/dagster-run:4",
    "containerDefinitions": [
      {
        "name": "daemon_resolvconf_initcontainer",
        "image": "docker/ecs-searchdomain-sidecar:1.0",
        "cpu": 0,
        "portMappings": [],
        "essential": false,
        "command": [
          "<>.compute.internal",
          "dagster.prod"
        ],
        "environment": [],
        "mountPoints": [],
        "volumesFrom": [],
        "logConfiguration": {
          "logDriver": "awslogs",
          "options": {
            "awslogs-create-group": "true",
            "awslogs-group": "/ecs/dagster-cluster",
            "awslogs-region": "<>",
            "awslogs-stream-prefix": "dagster-service"
          }
        }
      },
      {
        "name": "run",
        "image": "<>.dkr.ecr.us-east-1.amazonaws.com/wdt-meltano",
        "cpu": 0,
        "portMappings": [],
        "essential": true,
        "entryPoint": [
          "dagster-daemon",
          "run"
        ],
        "environment": [
          {
            "name": "DAGSTER_POSTGRES_HOSTNAME",
            "value": "elt_system_db"
          }
        ],
        "mountPoints": [],
        "volumesFrom": [],
        "secrets": [
          {
            "name": "DAGSTER_POSTGRES_DB",
            "valueFrom": "arn:aws:ssm:<>:<>:parameter/elt_system_db_dbname"
          },
          {
            "name": "DAGSTER_POSTGRES_USER",
            "valueFrom": "arn:aws:ssm:<>:<>:parameter/elt_system_db_user"
          },
          {
            "name": "DAGSTER_POSTGRES_PASSWORD",
            "valueFrom": "arn:aws:ssm:<>:<>:parameter/elt_system_db_password"
          }
        ],
        "dependsOn": [
          {
            "containerName": "daemon_resolvconf_initcontainer",
            "condition": "SUCCESS"
          }
        ],
        "logConfiguration": {
          "logDriver": "awslogs",
          "options": {
            "awslogs-create-group": "true",
            "awslogs-group": "/ecs/dagster-cluster",
            "awslogs-region": "<>",
            "awslogs-stream-prefix": "dagster-service"
          }
        }
      }
    ],
    "family": "dagster-run",
    "taskRoleArn": "arn:aws:iam::<>:role/dagsterEcsTaskRole",
    "executionRoleArn": "arn:aws:iam::<>:role/dagsterEcsTaskExecutionRole",
    "networkMode": "awsvpc",
    "revision": 4,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
      {
        "name": "com.amazonaws.ecs.capability.logging-driver.awslogs"
      },
      {
        "name": "ecs.capability.execution-role-awslogs"
      },
      {
        "name": "com.amazonaws.ecs.capability.docker-remote-api.1.19"
      },
      {
        "name": "com.amazonaws.ecs.capability.ecr-auth"
      },
      {
        "name": "com.amazonaws.ecs.capability.task-iam-role"
      },
      {
        "name": "ecs.capability.container-ordering"
      },
      {
        "name": "ecs.capability.execution-role-ecr-pull"
      },
      {
        "name": "ecs.capability.secrets.ssm.environment-variables"
      },
      {
        "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
      },
      {
        "name": "ecs.capability.task-eni"
      },
      {
        "name": "com.amazonaws.ecs.capability.docker-remote-api.1.29"
      }
    ],
    "placementConstraints": [],
    "compatibilities": [
      "EC2",
      "FARGATE"
    ],
    "requiresCompatibilities": [
      "FARGATE"
    ],
    "cpu": "256",
    "memory": "512",
    "registeredAt": "2021-07-13T15:37:59.773000-06:00",
    "registeredBy": "arn:aws:sts::<>:assumed-role/dagsterEcsTaskRole/505f328f691a4439ad3a1554bfcebb38"
  }
}
there are some redactions
j
Yeah, understandable. Those look like they might be the task and task definition for the grpc server, not for the actual pipeline run. In dagit, did the run get tagged with a task arn? There should be two tags: `ecs/cluster` and `ecs/task_arn`
j
oh, right .. just a second
ok, I just updated the two messages above with the details of the run instead of the pipeline task definition. I don’t want to make this thread any longer than it has to be 😉
j
"entryPoint": [
                    "dagster-daemon",
                    "run"
                ],
ah, it looks like the task definition indeed has an entrypoint. It looks like it’s not your grpc server image that it’s using to generate its task, it’s your daemon image. Does that Dockerfile have an entrypoint defined?
Although I think I might have a more general fix in mind that I can try to get into next week’s release - Basically, when it starts building its task definition, if there’s an entrypoint defined, we could probably just have the new task definition drop it: https://github.com/dagster-io/dagster/blob/5db556594b050b1bb980fbae7c5725411cfbdba5/python_modules/libraries/dagster-aws/dagster_aws/ecs/launcher.py#L167-L169
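A rough sketch of that idea, not the actual launcher code: copy the container definition the run task is based on and drop any `entryPoint`, so the overridden command is the only thing that runs. `without_entry_point` and the sample container definition are hypothetical.

```python
import copy
from typing import Any, Dict


def without_entry_point(container_definition: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a container definition with any entryPoint removed."""
    cleaned = copy.deepcopy(container_definition)
    cleaned.pop("entryPoint", None)  # no-op when the key is absent
    return cleaned


# Hypothetical container definition shaped like the one in the task definition.
daemon_container = {
    "name": "run",
    "image": "<image>",
    "entryPoint": ["dagster-daemon", "run"],
    "essential": True,
}

run_container = without_entry_point(daemon_container)
print(run_container)
```

The original definition is left untouched, so the daemon itself would keep starting the same way.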
j
I thought that was odd, but I don’t yet understand very well how all this works. The dagster daemon dockerfile does not have an entrypoint or a CMD
j
Does its task definition?
j
yes
if I make it a `command` instead, would that fix it, do you think?
j
I have a hunch it might
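For reference, a sketch of what that change might look like in a container definition serialized with `json.dumps`, in the same style as the Pulumi snippet earlier in the thread. The container name and image are placeholders, since the daemon's actual definition isn't shown here.

```python
import json

# Hypothetical daemon container definition; name and image are placeholders.
daemon_container = {
    "name": "dagster_daemon",
    "image": "<daemon-image>",
    # Previously this was: "entryPoint": ["dagster-daemon", "run"]
    "command": ["dagster-daemon", "run"],
    "essential": True,
}

container_definitions = json.dumps([daemon_container])
print(container_definitions)
```

With the startup stored under `command`, a command override on a task built from this definition replaces it instead of being appended to it.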
j
Nice! past that error now. On to the next. Thanks a million!!!
j
Awesome! Good stuff - glad to get some real world feedback on this. I’ll look into putting a more general fix in for this.